Tmux Context Overflow Prevention: Keeping LLM Context Manageable

When using tmux for long-running processes in AI agent workflows, a subtle but impactful problem emerges: pane output can grow to thousands of lines, consuming precious context window tokens.

The Problem

The tmux tool’s inspect-pane command returns the full content of a tmux pane. For long-running processes like servers, build systems, or monitoring tools, this can mean:

5,000+ lines from a verbose build log
10,000+ lines from a server that’s been running for hours
Immediate context overflow when the agent inspects the pane

Each time the agent checks on a process, it potentially dumps thousands of lines into the conversation context, leaving little room for actual reasoning and work.

The Solution

PR #924 introduces intelligent truncation for all tmux functions that return pane content:

def _truncate_output(
    output: str,
    max_pre_lines: int = 50,
    max_post_lines: int = 150,
    logdir: Path | None = None,
) -> str:
    """Truncate long output, preserving beginning and end."""
    lines = output.split('\n')
    if len(lines) <= max_pre_lines + max_post_lines:
        return output

    # Save full output if logdir provided
    if logdir:
        logfile = logdir / f"tmux-{timestamp}.log"
        logfile.write_text(output)

    # Truncate with clear indicator
    truncated_count = len(lines) - max_pre_lines - max_post_lines
    return '\n'.join([
        *lines[:max_pre_lines],
        f"\n[... {truncated_count} lines truncated ...]\n",
        *lines[-max_post_lines:]
    ])

Key Design Decisions

Preserve Both Ends: Keep first 50 lines (startup messages, headers) and last 150 lines (recent activity, errors). This captures both context and current state.
Clear Truncation Indicator: The [... N lines truncated ...] message tells the agent exactly what happened, enabling informed decisions.
Full Output Preservation: When truncation occurs, the complete output is saved to ~/.cache/gptme/tmux-output/ for later retrieval if needed.
Applied Everywhere: Truncation is applied to all tmux functions that return output:
- inspect_pane
- new_session
- send_keys
- wait_for_output

Impact

Before this change, a single inspect-pane on a verbose process could consume 40,000+ tokens. After:

Maximum output: ~200 lines (~1,500 tokens)
Context preserved: Beginning and end always visible
Full data accessible: Saved to disk when needed

This enables agents to monitor long-running processes without constantly exhausting their context budget.

Broader Pattern: Context-Aware Tool Design

This illustrates a broader principle for AI agent tools: outputs should be context-aware. Just as humans skim long logs looking for key information, agent tools should:

Summarize by default: Return actionable information, not raw data
Preserve recoverability: Keep full data accessible when needed
Indicate truncation: Never silently hide information
Prioritize relevance: Keep the most useful parts (beginning for context, end for current state)

Shell tool quiet parameter (PR #916): Suppress stdout for commands where output isn’t needed
Background job support (PR #902): Run long processes without blocking
Context compression research: Token reduction through intelligent summarization

PR #924 is part of ongoing work to make gptme more efficient in long-running autonomous operation. The changes will be available in an upcoming release.