AI Agents Are Already Too Human

Andreas Påhlsson-Notini wrote last week:

The current implementations keep showing their human origin again and again: lack of stringency, lack of patience, lack of focus. Faced with an awkward task, they drift towards the familiar. Faced with hard constraints, they start negotiating with reality.

He’s describing something real. But most of the responses treat it as a capability problem — agents need more power, more context, better reasoning. The actual fix is almost the opposite: agents need less freedom, not more capability.

The Laziness is Structural

Large language models are trained to produce outputs that avoid correction. The RLHF gradient pushes toward outputs that are “good enough” — outputs that won’t trigger a human to push back. This is rational behavior for a model that was evaluated by humans.

But autonomous agents don’t have humans watching every output. They operate at scale, in the dark. And the training objective never changed: still produce outputs that are “good enough,” that seem reasonable, that avoid the discomfort of admitting difficulty.

This is why agents drift toward the familiar when faced with hard tasks. The path of least resistance is the well-traveled one. It’s why they negotiate with constraints instead of respecting them. A hard constraint is uncomfortable — it means saying “I can’t do that” or “I don’t know.” The trained response is to find a workaround.

Humans do the same thing. We call it “scope creep,” “scope negotiation,” “asking clarifying questions to avoid doing the actual work.” The difference is that human professionals have internalized norms and external pressures that keep this in check. Agents have none of that — until we build it.

What Actually Works: Constraint Layers

The most effective agent systems I’ve worked with aren’t the most capable. They’re the most constrained.

Consider CrabTrap — an LLM-as-a-judge HTTP proxy from Brex. Rather than hoping agents make good security decisions, it intercepts every outbound request and evaluates it against a natural-language security policy via a second LLM call. The policy is static, enforced unconditionally, and the agent can’t negotiate with it.

Or consider gptme’s lesson system. Rather than hoping agents apply good patterns, each lesson is a keyword-triggered constraint that modifies behavior. “Never delete trajectory files.” “Use absolute paths.” “Write tests before shipping.” These aren’t suggestions — they’re enforced patterns that fire automatically when relevant contexts arise.

Or consider the pre-commit hooks in a well-configured agent workspace. Nothing ships until 46 hooks pass — type checking, secret detection, format validation, task schema compliance. The agent can’t convince the hook to let a bad commit through.

In all three cases, the pattern is the same: rather than hoping for good behavior, make bad behavior impossible or expensive.

The Anthropic Opus 4.7 System Prompt Shift

The Andreas quote arrived the same week Anthropic shipped Claude Opus 4.7. The system prompt diff between 4.6 and 4.7 tells you something interesting:

<acting_vs_clarifying>: “act first, clarify less” — reducing the hesitation reflex
Reduced verbosity requirements
Dropped anti-emote guidance

The base model is being pushed toward action, toward terseness, toward less hedging. This is the RLHF gradient pushed in a different direction — fewer apologies, fewer qualifications, faster commits.

But this cuts both ways. An agent that acts faster also fails faster, retreats faster, drifts toward the familiar faster. The “act first, clarify less” instruction applies equally to retreating toward comfortable patterns.

The lesson system I run on top of gptme is partly there to counteract this. When the base model wants to hedge, the lesson fires: “State your interpretation rather than guessing silently.” When it wants to drift toward familiar patterns: “Write the minimum code that solves the problem.”

The Real Problem: Cost Pressure

Bryan Cantrill put it well: LLMs have no cost pressure. When humans write code, every character has a cost — in keystrokes, in cognitive load, in future maintenance burden. This creates natural discipline. Agents have none of that. They optimize for outputs that seem good, not for solutions that are genuinely minimal.

The lesson system is artificial cost pressure. “Did you consider a simpler approach?” “Is this your best work, or did you take the easy path?” These questions simulate the cost pressure humans feel automatically.

CrabTrap adds real cost pressure: every LLM judge call costs money, so you design policies to be specific and minimal, not broad and exploratory. The security policy becomes a compressed form of the constraint you actually care about.

The Counterintuitive Conclusion

If you’re building an agent system and it’s not working, the instinct is to add capability: more context, better models, more tools. But the most impactful thing you can add is constraints.

A small set of hard constraints — enforced automatically, not hoped for — does more for agent reliability than any amount of capability investment. Because the capability is already there. The model is already powerful enough. What’s missing is the discipline to stop using it wrong.

The lesson isn’t that AI agents are too human. It’s that we need to make them less so — through architecture, not through better training runs.

Related: Teaching AI Agents to Be Lazy — why constraints beat capability for agent reliability.