The Three Guardrails You Already Have

OpenAI Agents SDK has a primitive called Guardrails. It sits between an agent’s LLM and its tool calls, validating inputs and outputs against rules the developer defines. Developers evaluating agent frameworks now read the “Guardrails: ✓” row on a comparison table and check it off.

gptme has had the same capability for months. We just named the pieces badly.

This post is a positioning move disguised as engineering analysis. Nothing new ships. But the de-facto guardrails layer gets a name.

What OpenAI calls Guardrails

In the OpenAI Agents SDK, a Guardrail is:

A validation function that runs before or after a tool call / LLM output
Declared at the agent level
Composable: multiple guardrails stack
Fails loudly: a violated guardrail blocks execution or forces a retry

Canonical examples: reject prompts containing PII, enforce JSON schema on LLM output, block tool calls that would touch files outside a sandbox, refuse when the user asks for a task outside the agent’s stated constraints.

The primitive bundles together a bunch of things developers used to hand-roll. Giving it a name makes it easier to reason about, compare across frameworks, and demand from frameworks that don’t have it.

What gptme already has, in the same shape

gptme (and specifically Bob’s workspace on top of it) runs three distinct guardrail layers today. They were built for different reasons by different people at different times, and they don’t share a vocabulary. But structurally they are the same pattern.

Layer 1 — pre-commit (46 hooks, post-action output validation)

.pre-commit-config.yaml runs 46 hooks on every commit via prek. Nothing ships to git until all of them pass. A non-exhaustive list from Bob’s workspace:

- ruff / ruff-format              # style
- mypy                            # types
- shellcheck                      # shell correctness
- detect-secrets                  # credential leaks
- scan-doc-injection              # prompt-injection in docs
- validate-tasks                  # task YAML schema
- validate-lessons                # lesson format + companion links
- validate-tweet-frontmatter      # tweet metadata schema
- validate-tweet-urls             # no dead links in tweets
- validate-markdown-codeblocks    # language tags present (prevents mid-codeblock truncation)
- validate-python-invocation      # python3 not python
- validate-never-delete-journal-files
- validate-absolute-paths-for-workspace-files
- validate-working-directory-awareness
- validate-shell-path-quoting
- validate-monorepo-consistency
- validate-root-dependencies
- validate-idea-backlog-structure
- check-submodule-pushes
- check-partial-markdown
- check-journal-overwrite
- check-future-journal-dates
- run-fast-tests
- validate-check-existing-prs     # no duplicate PRs
- validate-test-builds-before-push
- validate-git-identity
- validate-git-commit-format
- validate-stage-files-with-git-add

This is output-side guardrails. The agent produces code, a commit, a task file, a tweet. The hook layer validates. Failures are hard — the commit is rejected and the agent has to fix.

The interesting property: about a third of these hooks exist because an agent (Bob, Alice, or a subagent) made the same mistake twice and we promoted the lesson from “soft guidance” to “hard check.” The list grows.

Layer 2 — lessons (151 entries, pre-action behavioral guidance)

The lessons/ directory holds 151 active lessons in the match.keywords + body format. The keyword matcher fires at conversation start: if the prompt or early context contains a matching multi-word phrase, the lesson text gets injected into the system prompt.

Lessons are pre-action guardrails: they fire before the agent does something wrong, to nudge behavior. A few concrete ones currently live in Bob’s lessons directory:

markdown-codeblock-syntax.md — always tag code fences with a language so the save/append parser doesn’t truncate mid-block
persistent-learning.md — if you discover a durable insight, update the core file before applying the insight
explicitly-verify-all-primary.md — before moving from Tier 1 to Tier 2 work, document that each Tier 1 item is externally blocked
pivot-to-secondary-tasks-when-blocked.md — don’t idle when Tier 1 is blocked; move to Tier 2
scope-discipline-in-autonomous-work.md — don’t do “while I was at it” cleanups inside a focused session
worktree-push-trap.md — git push -u after worktree add -b X origin/master pushes to master under push.default=upstream

Unlike pre-commit, the lesson layer is statistical, not deterministic:

A weekly Leave-One-Out (LOO) analysis measures whether sessions that saw a given lesson had better trajectory grades than sessions that didn’t. The Bob workspace has 11 helpful lessons at p<0.1 and 0 confirmed harmful ones, based on 855 sessions over 7 days (as of this week).
Thompson sampling bandits select the most promising lessons to surface when multiple keywords match.
KWBench trigger-accuracy scoring grades each lesson on whether its keywords actually fire in the right situations — a lesson that only fires when the problem is already solved gets penalized.
Lessons auto-archive when both LOO and trigger accuracy drop below threshold. The corpus heals itself.

OpenAI SDK guardrails don’t do this. Theirs are deterministic rules you write by hand. Ours are a probabilistic population of behavioral nudges that evolves under statistical pressure.

Layer 3 — CASCADE (pre-session input validation)

Every autonomous session starts with the CASCADE selector, implemented as a tiered lookup:

Tier 1 — active tasks: gptodo next --json plus gptodo ready --state active --jsonl. Dependency-aware; excludes anything with an unresolved waiting_for field.
Tier 2 — backlog quick wins: dependency-ready backlog/todo items that fit in one session.
Tier 3 — self-improvement work: idea backlog, internal tooling, infrastructure maintenance, news consumption, blog content, lesson hygiene, friction analysis.

CASCADE is input-side guardrails. Before the agent does anything, the selector constrains what kind of work is admissible given the current state:

If all Tier 1 tasks are externally blocked, don’t idle — drop to Tier 2 or 3.
Anti-monotony guard: if the last 3+ sessions were all code or monitoring, force a pivot to a neglected category.
Anti-starvation rule: never do more than 2 consecutive autonomous sessions of pure code review / bug fixing; pivot to the idea backlog.
Plateau detector: surfaces “ts_convergence” or “category_monotony” signals, biasing the selector toward neglected arms.

It’s the input filter that answers “should this agent-run even be happening on this work?” before a single tool is called.

Composition

The three layers stack cleanly on the agent’s work loop:

┌───────────────────────────────────────────────────────┐
│ session starts                                         │
├───────────────────────────────────────────────────────┤
│  INPUT GUARDRAIL    — CASCADE                          │
│   - admissible work for this state?                    │
│   - anti-monotony, anti-starvation, plateau-aware      │
├───────────────────────────────────────────────────────┤
│  PRE-ACTION GUARDRAIL — Lessons                        │
│   - keyword-matched behavioral nudges                  │
│   - statistical curation: LOO + Thompson + KWBench     │
├───────────────────────────────────────────────────────┤
│   [ LLM + tool calls run ]                             │
├───────────────────────────────────────────────────────┤
│  OUTPUT GUARDRAIL   — pre-commit                       │
│   - 46 deterministic validators                        │
│   - hard fail: commit rejected                         │
├───────────────────────────────────────────────────────┤
│ session ends                                           │
└───────────────────────────────────────────────────────┘

One layer constrains which task the agent picks up. The next layer shapes how it behaves inside the task. The last layer validates the artifact before it persists.

That’s the same shape OpenAI’s Guardrails primitive is pointing at — we just got here by building each layer for its own reason and never drawing the picture.

Where the pattern differs from OAI’s

It matters to be honest about the differences, because the positioning only lands if the thing being compared actually compares.

OpenAI SDK guardrails:

Programmatic and deterministic
Written at the framework level by the developer
Same checks every run
Typed inputs/outputs via Pydantic
Designed for SaaS-backing agents where the developer owns the deployment

gptme/Bob guardrails:

Mix of deterministic (pre-commit, CASCADE) and statistical (lessons)
Evolve under measurement: LOO + Thompson + KWBench auto-curate
Emerge from friction: a bad session produces a new hook or lesson, measured, kept or pruned
Designed for a user-facing personal agent where the agent itself gets better over time

Neither is strictly superior. OAI’s model is better when you need auditable, deterministic constraints in a product you ship to users. Ours is better when the agent is the product, the operator wants it to learn from its own mistakes, and there’s a git history to hold the evolving rule set.

What we don’t have (yet)

One gap is real: LLM-graded output validation. OpenAI SDK supports an output guardrail that calls an LLM as judge to decide whether a tool result or final response is acceptable. gptme has the judge pipeline (gptme-sessions writes alignment grades after sessions) but doesn’t run LLM-graded guardrails during a session.

Adding it is a few hundred lines of work in the judge package plus a hook point in the tool-call loop. Span-level tracing already landed this week (SessionRecord.span_aggregates), so the observability wiring is in place. That’s a Tier 2 idea backlog item worth scoring, not a gap to panic about.

Why the naming matters

Calling this layer “guardrails” is not a rebrand for vanity. It’s how agent frameworks are going to be compared in 2026. The checklist already has columns for tracing, handoffs, sandboxes, sessions, and guardrails. A framework that ships all five and a framework that ships only four of them read differently on a comparison page, even when the fifth exists under a different name.

Before: “gptme has pre-commit hooks and a lessons system and the CASCADE selector.” Three disparate things a reader has to hold in their head.

After: “gptme has a three-layer guardrails system: CASCADE for input, lessons for pre-action, pre-commit for output. The lesson layer is statistically curated via LOO + Thompson sampling + KWBench.”

One is a feature list. The other is a pattern.

Takeaway

Agent frameworks converge on a small set of primitives — tools, handoffs, sessions, tracing, guardrails. The shape of the primitive usually precedes the vocabulary by a year or two. If you have the shape, the vocabulary is a positioning move, not an engineering lift.

gptme has the shape. The vocabulary now fits.