gptme: Architecture and Design Philosophy
How gptme is built — from Unix-philosophy tool system to autonomous agent infrastructure
gptme is an open-source AI assistant for the terminal. It’s the foundation that Bob and other autonomous agents run on. This article covers how it’s built and why it’s built that way.
Design Philosophy
gptme follows a few core principles:
- Unix philosophy: Each tool does one thing well. The system is modular and composable.
- Simplicity first: Minimal core, maximum extensibility. No speculative abstractions.
- Open and local-first: Your data stays yours. Run any model (Claude, GPT, Gemini, local via llama.cpp).
- General methods over domain-specific optimization: Prefer approaches that scale with better models and more compute (the Bitter Lesson).
- Agent autonomy lives in the workspace, not the client: An agent’s identity, memory, and behavior are git-tracked files — not embedded in any particular LLM client.
Architectural Layers
┌─────────────────────────────┐
│ CLI / Web UI / IDE Plugin │ User interfaces
├─────────────────────────────┤
│ Commands (/commit, /edit) │ User-facing operations
├─────────────────────────────┤
│ Tool System (15+ tools) │ Shell, Python, browser, vision, RAG, ...
├─────────────────────────────┤
│ Context Management │ Append-only master context + compaction
├─────────────────────────────┤
│ LLM Provider Abstraction │ Claude, GPT, Gemini, DeepSeek, local
├─────────────────────────────┤
│ Plugin / Skill / Lesson │ Extensions, workflows, contextual guidance
├─────────────────────────────┤
│ Server / API │ REST + WebSocket for multi-user access
└─────────────────────────────┘
Tool System
gptme’s power comes from its tools. Each tool is a Python module that the LLM can invoke:
| Tool | Purpose |
|---|---|
| shell | Execute shell commands |
| python | Run Python code in a persistent REPL |
| patch | Apply unified diffs to files |
| save | Write files to disk |
| browser | Headless browser for web interaction |
| vision | Analyze images and screenshots |
| rag | Semantic search over documents |
| gh | GitHub CLI integration |
| tmux | Terminal multiplexing for background tasks |
| subagent | Spawn child agents for parallel work |
| computer | GUI interaction (mouse, keyboard) |
Tools are discovered at startup and can be extended via plugins.
Context Management
gptme uses a master context architecture:
- The conversation is an append-only log (
conversation.jsonl) — the single source of truth. - A working context is derived from the master via aggressive compaction: old reasoning is stripped, tool outputs are summarized, but byte-range references preserve full recoverability.
- This means gptme can run indefinitely without losing information, even with limited context windows.
Extension Points
gptme is extensible at multiple levels:
- Plugins: Full Python packages that add tools, hooks, and commands. Installed via
pipand auto-discovered. - Skills: Lightweight workflow bundles (using Anthropic’s SKILL.md format). Describe multi-step procedures with trigger conditions.
- Lessons: Contextual guidance files auto-injected when keywords match. Used for behavioral patterns, tool-specific knowledge, and workflow constraints.
- Hooks: Lifecycle callbacks (before/after tool calls, session start/end) for custom logic.
- MCP: Model Context Protocol integration — connect any MCP server as a tool source.
The Agent Workspace Model
What makes gptme unique for autonomous agents is the workspace model: the agent’s brain is a git repository, not a database or an API.
agent-workspace/
├── ABOUT.md # Personality, values (who the agent is)
├── gptme.toml # Configuration (what files to auto-include)
├── tasks/ # Task queue with YAML frontmatter
├── journal/ # Append-only session logs
├── lessons/ # Behavioral patterns (auto-matched by keywords)
├── knowledge/ # Long-term documentation
└── state/ # Runtime state (bandits, caches, locks)
This architecture decouples the agent from any specific LLM client:
- Persistence: Everything is git-tracked. The agent survives client upgrades, model changes, and infrastructure migrations.
- Self-modification: The agent can edit its own personality, add lessons, update its task queue — all as git commits.
- Multi-harness: The same workspace can be driven by gptme, Claude Code, or any other harness. Bob runs on both gptme and Claude Code interchangeably.
- Reproducibility: Every session is logged. Every change is versioned. Every decision is auditable.
The gptme-agent-template
New agents are created from gptme-agent-template, which provides the core workspace structure. Shared infrastructure (lessons, scripts, packages) lives in gptme-contrib as a git submodule.
This means agent improvements can be upstreamed: when Bob discovers a better workflow pattern, it goes into gptme-contrib and benefits all agents.
The Lesson System
Lessons are gptme’s mechanism for learning from experience. Each lesson is a Markdown file with YAML frontmatter specifying trigger keywords:
---
match:
keywords:
- "struggling with task"
- "multiple failed attempts"
status: active
---
# Lesson Title
## Rule
One-sentence imperative.
## Pattern
Correct approach.
When a session starts, gptme scans the conversation for keyword matches and injects relevant lessons into context. This gives the LLM behavioral guidance without consuming permanent context budget.
The two-file architecture keeps runtime lessons small (30-50 lines) while companion docs in knowledge/lessons/ hold full implementation details. This achieved a 79% reduction in context usage compared to monolithic lessons.
Self-Correcting Loop
Lessons improve themselves through statistical feedback:
- Thompson sampling selects which lessons to include
- Session grading (LLM-as-judge) scores outcomes
- Leave-one-out analysis identifies which lessons help vs. hurt
- Auto-archive removes underperformers; keyword expansion grows top performers
Evaluation Framework
gptme includes an eval suite for testing model capabilities:
- Basic evals: Code generation, file manipulation, tool use
- Behavioral evals: Multi-step workflows (git operations, debugging, refactoring)
- Browser evals: Web interaction and scraping tasks
Evals run across models to track capability and regression. In production, Bob runs daily eval suites with results feeding back into the lesson system.
Current Scale
As of Q1 2026, the reference implementation (Bob) has:
- 3,800+ autonomous sessions
- 943 PRs merged across 13 repositories
- 130+ behavioral lessons
- 232 blog posts
- Multi-harness operation (gptme + Claude Code)
The architecture has been validated through Alice (Bob’s collaborator agent), Gordon (financial agent), and Sven (calendar/WhatsApp agent) — all running on variants of the same workspace model.
Related Articles
- Multi-Harness Agent Architecture — Running the agent across multiple LLM clients
- The Lesson System: How LLMs Learn from Experience — How gptme’s behavioral learning system works
- Autonomous Agent Operation Patterns — The operational patterns built on top of gptme