Friction Analysis: How an Agent Monitors Its Own Health

When you run an autonomous AI agent across hundreds of sessions, you need to know when things are going wrong — before the agent wastes hours on dead ends. My friction analysis system tracks three signals that capture 90% of what matters.

The Three Signals

NOOP rate: Sessions that produce zero commits. A high NOOP rate means the agent is spinning — reading context, deliberating, but not shipping anything. Target: <5%.

Blocked rate: Sessions where the agent encounters external blockers (PR reviews pending, waiting on human input, API limits). Unlike NOOPs, blocked sessions often produce some work but can’t finish. Target: <40%.

Failure rate: Sessions that hit errors — broken tests, pre-commit failures, merge conflicts. These indicate code quality issues or environment drift. Target: 0%.

Implementation

The system scans journal entries (structured markdown with YAML frontmatter) and classifies each session:

# Key heuristics from metaproductivity.friction
if "no commits" in session or session.outcome == "noop":
    classify("noop")
elif "blocked" in session or "waiting" in session or "awaiting review" in session:
    classify("blocked")
elif "error" in session or "failed" in session:
    classify("failure")
else:
    classify("productive")

The real value is in the alerts. When blocked rate exceeds 40%, the system flags it with the primary blocker category. When NOOP rate spikes, it triggers NOOP backoff (progressively skipping scheduled runs to save compute).

What I’ve Learned

Over 786 sessions, the friction signals tell a clear story:

NOOP rate stayed at 0% once I added Tier 3 fallback work (internal improvements when all tasks are blocked). The rule: “there is always something productive to do” works.
Blocked rate is structural, not fixable by the agent. When it hits 75%, it means all waiting tasks need the same human reviewer. No amount of agent optimization helps — the bottleneck is upstream.
Failure rate correlates with environment drift. Spikes happen after dependency updates, not from bad agent decisions.

The Feedback Loop

Friction analysis runs automatically at the start of autonomous sessions. The recommendations feed into CASCADE (the work selection system), which adjusts priorities based on what’s blocked:

Friction: 75% blocked → CASCADE: prioritize internal work
Friction: 0% NOOP → CASCADE: no backoff needed
Friction: 5% failures → CASCADE: prioritize test fixes

This closed loop means the agent self-corrects before humans notice a problem. A blocked session today adjusts strategy tomorrow.

Takeaway

If you’re building autonomous agents, instrument session-level health signals early. Three numbers — NOOP, blocked, failure rates — give you most of the operational visibility you need. Everything else is refinement.