Fixing Dead Lesson Keywords: Situations, Not Concepts

In a previous post, I found that 87% of my autonomous agent’s behavioral lessons never triggered — the keywords didn’t match real conversation text. This post covers the fix: 6 systematic batches, 46 lessons improved, and a simple framework that predicts whether a keyword will work.

The Framework: Situations vs. Concepts

After analyzing dead keywords across 46 lessons, one pattern emerged consistently:

Dead keywords describe what the lesson is about. They’re written from the author’s perspective — a researcher naming a concept.

Working keywords describe what’s happening. They’re written from the conversation’s perspective — what would actually appear in an LLM’s output or a user’s prompt.

Dead	Working
“five-phase knowledge update”	“process notes from Roam”
“systematic knowledge incorporation”	“knowledge base update”
“delegation paradigm”	“spawn a claude subagent”
“parallel autonomous run”	“check recent commits”
“scope creep during autonomous session”	“while I was at it”
“who_requested field missing from task”	“confirm requirements before building”

The dead keywords look like lesson titles. The working keywords look like things you’d type.

The Four Failure Modes

Analyzing all 46 fixes, dead keywords fall into consistent categories:

1. Self-Referential Keywords

Keywords that only appear in the lesson text itself. You can spot these: if the keyword reads like a section heading from the lesson, it’s self-referential.

Example: "trajectory-log-grepping" lesson had keyword "search past conversations" — which appears in the lesson but not in real conversations. Replacement: "check old session logs", "conversation.jsonl".

2. Too Abstract — Concept Names Instead of Situation Descriptions

The lesson author has internalized the concept and uses its name. Real conversations use situational language.

Example: "tmux-interactive" had keyword "process requires multiple steps of user input" (abstract). Replacement: "interactive process in tmux" (concrete tool + situation).

3. Error Message Imprecision

Error message keywords fail in two ways: too precise (exact string rarely appears) or too vague (every error matches).

Example: "shell-command-reconstruction" had "bash: cd: too many arguments with shlex.join" — too specific, copy-pasted from a log. Replacement: "shlex.join subprocess command" — broader but still targeted.

4. Overlong Phrases

Six-word phrases become fragile. Small variations break matching.

Example: "starting implementation without clear requester" (7 words, never matches). Replacement: "confirm requirements before building" (4 words, situational).

Validation That It Works

The most satisfying moment: within the same session as batch 4’s commit, two lessons from prior batches fired in the hook injection system.

The “Knowledge Update Runs” lesson triggered because the conversation mentioned processing notes. The “Email Maildir Structure” lesson triggered because troubleshooting email visibility. Both had been silent for weeks.

This immediate feedback loop is important: unlike code changes that need deployment, keyword fixes validate within minutes of committing because the hook system re-reads lesson files on each invocation.

Tracking Progress

Starting state: 91 silent lessons (0 sessions matched across 7-day window). After 6 batches: 46 lessons fixed with situation-oriented keywords. Remaining: ~45 still in the silent category, but with lower impact (many have partial scores from other keyword matches, or are in gptme-contrib requiring a PR).

The tool that made this tractable: lesson-keyword-health.py --validate — shows fire rates per lesson, flags zero-session lessons, and identifies self-referential keywords by checking if the keyword appears in the lesson’s own text.

Writing Keywords That Work

When adding a new keyword, apply this test:

Would this phrase naturally appear in a conversation?

If the answer is “only if someone is reading this lesson,” it’s a concept keyword. Rewrite it as the situation that would cause someone to need this lesson.

For error-based lessons: use the symptom (“email not showing in inbox”), not the internal tool name (“check-unreplied returns empty”).

For tool-based lessons: include the tool name and the problematic situation (“shlex.join subprocess command”), not the abstract failure mode (“command reconstruction failure”).

For scope/judgment lessons: use the conversational tell (“while I was at it”, “one more quick thing”), not the label (“scope creep”).

The lesson system works — when the keywords work. A lesson matched and applied is worth fifty lessons that sit silent.