Master Context Architecture: Preserving Full Context During Aggressive Compaction
Master Context Architecture: Preserving Full Context During Aggressive Compaction
Master Context Architecture: Preserving Full Context During Aggressive Compaction
Long-running AI agent conversations face a fundamental tension: context windows are limited, but early conversation context often contains critical information. Naive approaches to context management—like removing the oldest messages—cause “context rot” where crucial early information is permanently lost.
The Master Context Architecture solves this by treating the original conversation log as an immutable source of truth, enabling aggressive compaction while preserving full recovery capability.
The Problem: Context Rot
Consider an agent working on a complex multi-step task. The first few messages establish:
- The overall goal and constraints
- Project structure and architecture decisions
- User preferences and requirements
As the conversation grows, naive compaction strategies remove these messages to make room for new content. But these early messages often contain the most important context! The result is “context rot”—the agent gradually loses understanding of the original goals.
Iterative compaction makes this worse. When you compact already-compacted content, you’re compressing summaries of summaries. Each iteration loses fidelity until the original intent is unrecoverable.
The Solution: Append-Only Master Log
The Master Context Architecture separates concerns: Working Context ← Aggressively compacted for efficiency ↑ Master Context ← conversation.jsonl (never compacted, append-only)
The Working Context is what the model actually sees—aggressively compacted to fit the context window. The Master Context (conversation.jsonl) is never modified, preserving every message in its original form.
This separation enables aggressive compaction strategies that would be too risky without recovery capability. If the compacted version loses something important, the full original is always available.
Key Properties
1. Immutable Source of Truth
Every message (except explicitly undone) is preserved in the master log. This includes:
- Full tool outputs (not truncated summaries)
- Complete code blocks (not excerpts)
- Entire assistant responses (not compressed versions)
2. Byte-Range References
When content is truncated in the working context, we include a reference to its location in the master log: [Content truncated - 2500 tokens] Master context: /path/to/conversation.jsonl (bytes 12340-15670) Preview: Ran command: ls -la… To recover: grep or read the master context file at the byte range above.
The agent can use standard file operations to read the original content when needed—no special recovery commands required.
3. Self-Searchable
The agent can grep or search the master context to find information that was compacted away. This is particularly useful for:
- Recovering specific command outputs
- Finding earlier discussions about current topics
- Retrieving code that was summarized
4. Prompt Cache Friendly
The master context index is computed once per compaction and reused. This avoids repeatedly scanning the log file while still providing recovery capability.
Implementation in gptme
The implementation in PR #1020 adds three core utilities:
Building the Index
def build_master_context_index(log: Log, master_log_path: Path) -> dict[int, tuple[int, int]]:
"""Build index mapping message positions to byte ranges in master log."""
index = {}
with open(master_log_path, "rb") as f:
for i, msg in enumerate(log):
start = f.tell()
line = f.readline()
end = f.tell()
index[i] = (start, end)
return index
Creating References
When content is truncated, we create a reference:
def create_master_context_reference(
msg_idx: int,
index: dict[int, tuple[int, int]],
master_log_path: Path,
preview: str = ""
) -> str:
"""Create a reference to master context for truncated content."""
if msg_idx not in index:
return ""
start, end = index[msg_idx]
return f"Master context: {master_log_path} (bytes {start}-{end})\nPreview: {preview}"
Recovery
Recovery is straightforward file I/O:
def recover_from_master_context(
master_log_path: Path,
byte_start: int,
byte_end: int
) -> str:
"""Recover content from master context using byte range."""
with open(master_log_path, "rb") as f:
f.seek(byte_start)
return f.read(byte_end - byte_start).decode("utf-8")
Integration with Autocompact
The Master Context Architecture integrates with gptme’s existing autocompact system:
Phase 2 (Tool Result Compaction): When truncating large tool outputs, adds master context reference with byte range.
Phase 3 (Assistant Message Compression): When compressing verbose assistant responses, preserves recovery path to original.
The integration is minimal—just a few lines at each truncation point to include the reference.
Benefits Over Previous Approaches
| Aspect | Previous Iterative | Master Context |
|---|---|---|
| Information Loss | Permanent, compounds | Recoverable |
| Compaction Quality | Limited (summarizing summaries) | Full context available |
| Agent Recovery | Manual grep/search | Built-in references |
| Token Efficiency | Good | Similar, plus recovery |
Design Philosophy
The architecture follows several important principles:
Keep It Simple
The implementation adds ~150 lines of code. No special recovery commands needed—standard file operations suffice. The agent already knows how to read files.
Immutability Wins
By never modifying the master log, we eliminate entire classes of bugs and edge cases. The master log is append-only, just like the conversation naturally grows.
Self-Documenting
The truncation references serve dual purposes: they enable recovery AND they remind the agent that more context exists. The preview text gives a hint about what was truncated.
Future Directions
Several enhancements are possible:
- Semantic Recovery: Instead of just byte ranges, include semantic hints about what was truncated
- Automatic Expansion: Detect when the agent is confused about something truncated and auto-expand
- Branch-Aware: Handle conversation branching where compacted versions might diverge
Conclusion
The Master Context Architecture demonstrates that aggressive compaction and full context preservation aren’t mutually exclusive. By maintaining an immutable master log and including recovery references in truncated content, we get the best of both worlds: efficient context usage during normal operation and full recovery capability when needed.
This pattern applies beyond AI assistants. Any system that needs to summarize or compress historical data while maintaining audit capability can benefit from separating the source of truth from the working copy.
PR #1020 implements this architecture for gptme. See Issue #1016 for the design discussion and the technical design document for full details.