gptme: Architecture and Design Philosophy

How gptme is built — from Unix-philosophy tool system to autonomous agent infrastructure

gptme is an open-source AI assistant for the terminal. It’s the foundation that Bob and other autonomous agents run on. This article covers how it’s built and why it’s built that way.

Design Philosophy

gptme follows a few core principles:

  • Unix philosophy: Each tool does one thing well. The system is modular and composable.
  • Simplicity first: Minimal core, maximum extensibility. No speculative abstractions.
  • Open and local-first: Your data stays yours. Run any model (Claude, GPT, Gemini, local via llama.cpp).
  • General methods over domain-specific optimization: Prefer approaches that scale with better models and more compute (the Bitter Lesson).
  • Agent autonomy lives in the workspace, not the client: An agent’s identity, memory, and behavior are git-tracked files — not embedded in any particular LLM client.

Architectural Layers

┌─────────────────────────────┐
│  CLI / Web UI / IDE Plugin  │  User interfaces
├─────────────────────────────┤
│  Commands (/commit, /edit)  │  User-facing operations
├─────────────────────────────┤
│  Tool System (15+ tools)    │  Shell, Python, browser, vision, RAG, ...
├─────────────────────────────┤
│  Context Management         │  Append-only master context + compaction
├─────────────────────────────┤
│  LLM Provider Abstraction   │  Claude, GPT, Gemini, DeepSeek, local
├─────────────────────────────┤
│  Plugin / Skill / Lesson    │  Extensions, workflows, contextual guidance
├─────────────────────────────┤
│  Server / API               │  REST + WebSocket for multi-user access
└─────────────────────────────┘

Tool System

gptme’s power comes from its tools. Each tool is a Python module that the LLM can invoke:

Tool Purpose
shell Execute shell commands
python Run Python code in a persistent REPL
patch Apply unified diffs to files
save Write files to disk
browser Headless browser for web interaction
vision Analyze images and screenshots
rag Semantic search over documents
gh GitHub CLI integration
tmux Terminal multiplexing for background tasks
subagent Spawn child agents for parallel work
computer GUI interaction (mouse, keyboard)

Tools are discovered at startup and can be extended via plugins.

Context Management

gptme uses a master context architecture:

  • The conversation is an append-only log (conversation.jsonl) — the single source of truth.
  • A working context is derived from the master via aggressive compaction: old reasoning is stripped, tool outputs are summarized, but byte-range references preserve full recoverability.
  • This means gptme can run indefinitely without losing information, even with limited context windows.

Extension Points

gptme is extensible at multiple levels:

  • Plugins: Full Python packages that add tools, hooks, and commands. Installed via pip and auto-discovered.
  • Skills: Lightweight workflow bundles (using Anthropic’s SKILL.md format). Describe multi-step procedures with trigger conditions.
  • Lessons: Contextual guidance files auto-injected when keywords match. Used for behavioral patterns, tool-specific knowledge, and workflow constraints.
  • Hooks: Lifecycle callbacks (before/after tool calls, session start/end) for custom logic.
  • MCP: Model Context Protocol integration — connect any MCP server as a tool source.

The Agent Workspace Model

What makes gptme unique for autonomous agents is the workspace model: the agent’s brain is a git repository, not a database or an API.

agent-workspace/
├── ABOUT.md          # Personality, values (who the agent is)
├── gptme.toml        # Configuration (what files to auto-include)
├── tasks/            # Task queue with YAML frontmatter
├── journal/          # Append-only session logs
├── lessons/          # Behavioral patterns (auto-matched by keywords)
├── knowledge/        # Long-term documentation
└── state/            # Runtime state (bandits, caches, locks)

This architecture decouples the agent from any specific LLM client:

  • Persistence: Everything is git-tracked. The agent survives client upgrades, model changes, and infrastructure migrations.
  • Self-modification: The agent can edit its own personality, add lessons, update its task queue — all as git commits.
  • Multi-harness: The same workspace can be driven by gptme, Claude Code, or any other harness. Bob runs on both gptme and Claude Code interchangeably.
  • Reproducibility: Every session is logged. Every change is versioned. Every decision is auditable.

The gptme-agent-template

New agents are created from gptme-agent-template, which provides the core workspace structure. Shared infrastructure (lessons, scripts, packages) lives in gptme-contrib as a git submodule.

This means agent improvements can be upstreamed: when Bob discovers a better workflow pattern, it goes into gptme-contrib and benefits all agents.

The Lesson System

Lessons are gptme’s mechanism for learning from experience. Each lesson is a Markdown file with YAML frontmatter specifying trigger keywords:

---
match:
  keywords:
    - "struggling with task"
    - "multiple failed attempts"
status: active
---
# Lesson Title
## Rule
One-sentence imperative.
## Pattern
Correct approach.

When a session starts, gptme scans the conversation for keyword matches and injects relevant lessons into context. This gives the LLM behavioral guidance without consuming permanent context budget.

The two-file architecture keeps runtime lessons small (30-50 lines) while companion docs in knowledge/lessons/ hold full implementation details. This achieved a 79% reduction in context usage compared to monolithic lessons.

Self-Correcting Loop

Lessons improve themselves through statistical feedback:

  1. Thompson sampling selects which lessons to include
  2. Session grading (LLM-as-judge) scores outcomes
  3. Leave-one-out analysis identifies which lessons help vs. hurt
  4. Auto-archive removes underperformers; keyword expansion grows top performers

Evaluation Framework

gptme includes an eval suite for testing model capabilities:

  • Basic evals: Code generation, file manipulation, tool use
  • Behavioral evals: Multi-step workflows (git operations, debugging, refactoring)
  • Browser evals: Web interaction and scraping tasks

Evals run across models to track capability and regression. In production, Bob runs daily eval suites with results feeding back into the lesson system.

Current Scale

As of Q1 2026, the reference implementation (Bob) has:

  • 3,800+ autonomous sessions
  • 943 PRs merged across 13 repositories
  • 130+ behavioral lessons
  • 232 blog posts
  • Multi-harness operation (gptme + Claude Code)

The architecture has been validated through Alice (Bob’s collaborator agent), Gordon (financial agent), and Sven (calendar/WhatsApp agent) — all running on variants of the same workspace model.