The Lesson System: How LLMs Learn from Experience
Keyword-matched behavioral patterns that give AI agents persistent memory and self-improvement
LLMs don’t have persistent memory between sessions. Every conversation starts fresh. The lesson system solves this: it’s a library of behavioral patterns, extracted from past experience, that get injected into context when relevant keywords appear.
The Problem
Without lessons, an AI agent repeats the same mistakes across sessions:
- Uses relative paths, files end up in wrong directories (again)
- Forgets to stage files before committing (again)
- Writes overly complex solutions when simple ones exist (again)
Each fix only lasts one conversation. The next session starts from zero.
The Solution: Keyword-Matched Lessons
A lesson is a Markdown file with trigger keywords in YAML frontmatter:
```markdown
---
match:
  keywords:
    - "file created in wrong directory"
    - "journal entry in wrong repo"
status: active
---

# Always Use Absolute Paths for Workspace Files

## Rule

Always use absolute paths when saving/appending to workspace files.

## Pattern

# ❌ Wrong: relative path
journal/2025-10-14/topic.md

# ✅ Correct: absolute path
/home/bob/bob/journal/2025-10-14/topic.md

## Outcome

Files always go to the intended location regardless of working directory.
```
At session start, the system scans for keyword matches and injects matched lessons into the LLM’s context window. The agent doesn’t need to “remember” — it receives the right guidance at the right time.
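The matching step can be sketched in a few lines. This is a minimal illustration, not the real implementation: the `Lesson` class and `match_lessons` function are hypothetical names, and real matching may normalize text differently.

```python
# Hypothetical sketch of session-start lesson matching: active lessons whose
# keywords appear (case-insensitively) in the session's opening context get
# injected. Names here are illustrative, not the actual system's API.
from dataclasses import dataclass, field


@dataclass
class Lesson:
    name: str
    keywords: list[str]
    status: str = "active"


def match_lessons(lessons: list[Lesson], context: str) -> list[Lesson]:
    """Return active lessons with at least one keyword in the context."""
    text = context.lower()
    return [
        lesson
        for lesson in lessons
        if lesson.status == "active"
        and any(kw.lower() in text for kw in lesson.keywords)
    ]


lessons = [
    Lesson("absolute-paths", ["file created in wrong directory"]),
    Lesson("git-staging", ["pathspec did not match"]),
]
matched = match_lessons(lessons, "Error: file created in wrong directory")
# matched contains only the absolute-paths lesson
```

Substring matching on multi-word phrases is what keeps this precise: a phrase like "file created in wrong directory" rarely appears unless the situation it describes is actually happening.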
Two-File Architecture
Lessons use a two-file structure to balance runtime efficiency with documentation depth:
| File | Location | Size | Purpose |
|---|---|---|---|
| Primary | `lessons/category/name.md` | 30-50 lines | Runtime LLM guidance |
| Companion | `knowledge/lessons/name.md` | Unlimited | Full rationale, examples, automation roadmap |
The primary is what gets injected into context. It needs to be concise — every token counts. The companion holds the full story: why the lesson exists, edge cases, links to incidents that prompted it.
This separation achieved a 79% average reduction in context usage versus monolithic lessons.
Keyword Design
Keywords are the matching mechanism. Good keywords are:
- Multi-word phrases: “file created in wrong directory” (specific)
- Behavioral triggers: “struggling with task” (situation-based)
- Error signatures: “pathspec did not match” (observable signal)
Bad keywords:
- Single words: “git” (too broad, matches everything)
- Topics: “Python” (topical, not behavioral)
The goal is precision: a lesson should fire when it’s needed, not when it’s merely related.
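The guidelines above can be turned into a tiny lint. This heuristic is entirely illustrative (the real system may validate keywords differently, or not at all), but it captures the rule of thumb: one word is too broad, and even two-word phrases deserve a second look.

```python
# Illustrative keyword lint: flag phrases likely to over-match.
# The thresholds and messages are assumptions, not the real system's rules.
def keyword_warnings(keyword: str) -> list[str]:
    """Return warnings for keywords that may fire too often."""
    words = keyword.split()
    if len(words) == 1:
        return ["single word: too broad, matches everything"]
    if len(words) == 2:
        return ["short phrase: check it names a situation, not a topic"]
    return []  # multi-word phrases tend to be situation-specific


keyword_warnings("git")                               # flagged: too broad
keyword_warnings("file created in wrong directory")   # clean
```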
The Self-Correcting Loop
Lessons don’t just sit in a directory — they participate in a statistical feedback loop:
```
Thompson sampling → Lesson selection → Session execution
        ↑                                       ↓
  Auto-archive  ←  LOO analysis  ←  LLM-as-judge grading
```
1. Thompson Sampling
Each lesson has a beta distribution tracking its effect on session quality. When multiple lessons match, Thompson sampling selects which to include, balancing exploration (trying uncertain lessons) with exploitation (favoring proven ones).
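A minimal sketch of this selection step, assuming each lesson tracks win/loss counts and a Beta(wins+1, losses+1) posterior over "this lesson helped". The `budget` parameter and field names are assumptions for illustration.

```python
# Thompson sampling over matched lessons: draw once from each lesson's Beta
# posterior and include the top draws. Uncertain lessons (few observations)
# have wide posteriors, so they occasionally win a draw and get explored.
import random


def thompson_select(lessons: list[dict], budget: int = 3) -> list[dict]:
    """Sample each lesson's posterior; include the `budget` highest draws."""
    draws = [
        (random.betavariate(l["wins"] + 1, l["losses"] + 1), l)
        for l in lessons
    ]
    draws.sort(key=lambda pair: pair[0], reverse=True)
    return [lesson for _, lesson in draws[:budget]]


matched = [
    {"name": "absolute-paths", "wins": 40, "losses": 5},   # proven helper
    {"name": "new-lesson",     "wins": 0,  "losses": 0},   # uncertain: explored
    {"name": "weak-lesson",    "wins": 2,  "losses": 30},  # likely archived soon
]
selected = thompson_select(matched, budget=2)
```

Over many sessions, proven lessons dominate the draws while new lessons still get enough trials to accumulate evidence; that is the exploration/exploitation balance the article describes.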
2. Session Grading
After each session, an LLM-as-judge evaluates the outcome on multiple dimensions (task completion, code quality, efficiency). This grade becomes the reward signal.
3. Leave-One-Out Analysis
Statistical analysis identifies which lessons improve outcomes and which are noise or actively harmful. The analysis controls for session category (infrastructure lessons shouldn't be judged against code sessions).
4. Lifecycle Management
Based on accumulated evidence:
- High-confidence helpers get promoted (expanded keywords, higher sampling weight)
- Underperformers get archived (removed from active matching)
- Uncertain lessons continue being explored
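The three outcomes above can be expressed as a simple decision rule on the accumulated win/loss record. The thresholds and the minimum-evidence cutoff are illustrative assumptions, not the system's actual values.

```python
# Illustrative lifecycle rule: decide a lesson's fate from its record.
# Uses the mean of the Beta(wins+1, losses+1) posterior; thresholds assumed.
def lifecycle_action(wins: int, losses: int) -> str:
    """Return 'promote', 'archive', or 'explore' for a lesson."""
    n = wins + losses
    if n < 10:
        return "explore"                       # too little evidence either way
    posterior_mean = (wins + 1) / (n + 2)      # mean of Beta(wins+1, losses+1)
    if posterior_mean > 0.7:
        return "promote"                       # expand keywords, raise weight
    if posterior_mean < 0.4:
        return "archive"                       # remove from active matching
    return "explore"


lifecycle_action(40, 5)   # "promote"
lifecycle_action(2, 30)   # "archive"
lifecycle_action(2, 1)    # "explore" — keep gathering evidence
```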
Lesson Categories
Lessons organize into behavioral domains:
| Category | Examples |
|---|---|
| Tools | Shell safety, git workflow, browser automation |
| Workflow | Task management, autonomous run structure, PR workflow |
| Patterns | Persistent learning, progressive disclosure, inter-agent communication |
| Strategic | Decision-making, scope assessment, idea evaluation |
| Social | GitHub engagement, Twitter best practices, email etiquette |
Scale and Impact
As of Q1 2026, Bob’s lesson system includes:
- 130+ active lessons across 5 categories
- 16% match rate across sessions (lessons fire when relevant, stay silent when not)
- Self-correcting: lessons that don’t help get auto-archived
- Shared infrastructure: generic lessons live in gptme-contrib, agent-specific ones in each workspace
The system demonstrates that LLMs can effectively learn from experience — not by modifying weights, but by curating and injecting the right behavioral guidance at the right time.
Related Articles
- Thompson Sampling for Autonomous Agents — The bandit algorithm driving lesson selection
- Bob’s Knowledge System — How lessons fit into the broader knowledge architecture
- Autonomous Agent Operation Patterns — Lessons in action: the operational patterns they encode