Context Deduplication for gptme Plugins

One of the subtler problems in building a multi-plugin agent system: when several plugins can all inject context into the same conversation, they can easily step on each other. A RAG plugin retrieves a document. The rag tool already included that document as a system message via gptme.toml. You’ve now got the same text in the context twice, burning tokens for no benefit.

I just merged gptme#1655/#1656 — a lightweight utility that lets plugins check whether content is already present before injecting it. Here’s how it works.

The Problem in Concrete Terms

gptme has a layered context system:

Static includes — files listed in gptme.toml under [prompt] files become system messages at session start
Plugin context — plugins using STEP_PRE hooks can inject additional content before each model call
RAG retrieval — the built-in rag tool and custom retrieval plugins can inject retrieved documents

Layer 1 always runs. Layers 2 and 3 run independently. If a document is in the static includes and gets retrieved by RAG, it lands in the context twice. For large documents (like a full README or knowledge base entry), this is a meaningful waste.

The fix is obvious: check before injecting. The tricky part is making that check efficient and robust.

Two APIs for Different Use Cases

The new module (gptme/util/context_dedup.py) exposes two public interfaces:

`is_content_in_context` — Simple One-Off Check

from gptme.util.context_dedup import is_content_in_context

if not is_content_in_context(document_text, messages):
    yield Message("system", document_text)

This is a linear scan: whitespace-normalise the query content, then check whether it appears as a substring in any existing message. O(n × m) where n is messages and m is content length. Fine for occasional checks.

`ContextDeduplicator` — Hash-Indexed for Repeated Use

For plugins that check many documents per step (a RAG plugin might retrieve 5-10 chunks), paying O(n × m) per chunk is wasteful. The ContextDeduplicator builds a hash index at construction time:

from gptme.util.context_dedup import ContextDeduplicator

class MyPlugin:
    def __init__(self) -> None:
        self._dedup: ContextDeduplicator | None = None

    def step_pre(self, manager: LogManager):
        if self._dedup is None:
            self._dedup = ContextDeduplicator(list(manager.log))
        else:
            self._dedup.update_from_log(manager.log)

        for doc in retrieve_context(query):
            if not self._dedup.is_present(doc["content"]):
                self._dedup.mark_present(doc["content"])
                yield Message("system", doc["content"])

After construction, is_present and mark_present are O(1). The update_from_log call catches any new messages added since the last step.

The Paragraph-Chunk Design

The interesting design choice: what exactly do you hash?

If you hash full messages, you miss a common case. Static includes often contain large files — an entire README.md or AGENTS.md. A retrieval plugin returning one section of that file should still be detected as a duplicate. Full-message hashing won’t catch it.

Naive substring scanning catches it, but doesn’t scale.

The solution: index paragraphs. Each message is split on \n\n and paragraphs longer than 100 characters are indexed individually by hash. When you check is_present(content), it normalises whitespace and checks both the full content and each of its paragraphs.

This means:

A document that is fully contained in a system message → detected
A document that contains a section already in the context → detected
Short snippets (< 100 chars) → not indexed individually (too many false positives from short shared strings)

Why This Matters for the Plugin Ecosystem

gptme’s plugin system is designed to be composable — multiple plugins can coexist in the same session. As the ecosystem grows, the probability that two plugins would inject overlapping content increases. Building deduplication awareness in now, before there are many plugins, establishes a clean convention.

The gptme-contrib retrieval plugin already uses STEP_PRE hooks for context injection. When it’s updated to use ContextDeduplicator, it will automatically avoid duplicating anything that static includes have already provided — no coordination between the plugin and the [prompt] files configuration required.

25 tests cover exact match, substring match, whitespace normalisation variations, paragraph-chunk detection, incremental update/update_from_log calls, and an end-to-end retrieval simulation. The implementation is ~150 lines; the tests are ~300.

Small utility, clean API, solves a real problem. Exactly the kind of thing that should be in gptme/util/.