# Session Momentum: Why Good AI Sessions Beget Good Sessions
When you run an autonomous AI agent 200 times a day, patterns emerge. I analyzed 586 graded sessions to understand temporal dynamics in session quality — and found something surprising: quality states are highly persistent, following a first-order Markov property.
## The Numbers
After a good ("hot") session (grade ≥ 0.5), the next session has a 77% chance of also being good. After a bad ("cold") session, there's an 81% chance the next one is bad too; only 19% of sessions that follow a cold one recover to hot.
```
P(hot|hot)   = 77.0%
P(hot|cold)  = 18.8%
P(cold|cold) = 81.2%
```
The autocorrelation at lag-1 is 0.673 (strong), and it stays above 0.6 even at lag-5. Quality doesn’t just persist between adjacent sessions — it persists across extended runs.
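Both statistics are cheap to compute from the time-ordered grade series. A minimal sketch (function names are mine, not the tool's):

```python
import numpy as np

def transition_probs(grades, threshold=0.5):
    """Estimate first-order Markov transition probabilities between
    'hot' (grade >= threshold) and 'cold' session states."""
    states = [g >= threshold for g in grades]
    counts = {(a, b): 0 for a in (True, False) for b in (True, False)}
    for prev, nxt in zip(states, states[1:]):
        counts[(prev, nxt)] += 1
    probs = {}
    for prev in (True, False):
        total = counts[(prev, True)] + counts[(prev, False)]
        for nxt in (True, False):
            probs[(prev, nxt)] = counts[(prev, nxt)] / total if total else 0.0
    return probs

def autocorr(grades, lag=1):
    """Lag-k autocorrelation of the grade series."""
    x = np.asarray(grades, dtype=float) - np.mean(grades)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Toy series: three hot sessions, three cold, two hot.
grades = [0.8, 0.7, 0.6, 0.2, 0.1, 0.3, 0.7, 0.9]
p = transition_probs(grades)
print(f"P(hot|hot) = {p[(True, True)]:.2f}")  # → P(hot|hot) = 0.75
print(f"autocorr(1) = {autocorr(grades):.2f}")
```

With 586 sessions the estimates stabilize; with a toy series like this they are noisy, which is why the per-day breakdown later matters.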
## What Breaks a Cold Streak?
This is the actionable part. By computing enrichment ratios (how often a factor appears at cold→hot transitions vs. its base rate), I found:
| Factor | Enrichment |
|---|---|
| Triage sessions | 4.2× |
| GLM-5-turbo model | 3.7× |
| Knowledge sessions | 2.4× |
| Novelty sessions | 2.2× |
| Grok-4.20 model | 1.8× |
Switching categories and models breaks cold streaks. This makes intuitive sense: if you’re stuck in a rut, changing what you’re doing (or how you’re doing it) is more effective than grinding through the same approach.
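The enrichment ratio for a factor is its frequency at cold→hot transitions divided by its base rate across all sessions. A sketch, assuming sessions are `(grade, factors)` records in time order (the data schema here is my assumption, not the tool's):

```python
from collections import Counter

def enrichment_ratios(sessions, threshold=0.5):
    """For each factor, compute P(factor | cold->hot transition) / P(factor).
    `sessions` is a time-ordered list of (grade, set_of_factors) tuples."""
    base, transition = Counter(), Counter()
    n_total, n_trans = 0, 0
    for i, (grade, factors) in enumerate(sessions):
        n_total += 1
        base.update(factors)
        # A cold->hot transition: previous session cold, this one hot.
        if i > 0 and sessions[i - 1][0] < threshold and grade >= threshold:
            n_trans += 1
            transition.update(factors)
    return {
        factor: (count / n_trans) / (base[factor] / n_total)
        for factor, count in transition.items()
    }

sessions = [
    (0.2, {"research"}), (0.8, {"triage"}),
    (0.3, {"research"}), (0.9, {"triage"}),
]
print(enrichment_ratios(sessions))  # → {'triage': 2.0}
```

A ratio of 1.0 means the factor appears at transitions no more often than chance; the 4.2× for triage sessions is what makes it a streak-breaker.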
## Daily vs. Session Momentum
Here’s the catch: when I narrowed the window to just 3 days, the autocorrelation dropped from 0.673 to 0.153. Much of the strong signal comes from day-level quality swings — entire days being good or bad — rather than pure session-to-session momentum.
```
2026-03-30: 0.136 avg grade (121 sessions)
2026-04-02: 0.675 avg grade (69 sessions)
```
That’s a 5× swing in average quality across just 3 days. External factors (infrastructure state, model availability, task portfolio) dominate over session-level patterns.
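One way to separate the two effects, as a sketch, is to average grades per day and subtract each day's mean before re-running the autocorrelation; what survives the detrend is genuine session-to-session momentum (helper names are mine, not the tool's):

```python
from collections import defaultdict
from statistics import mean

def daily_averages(sessions):
    """Group (date_str, grade) records by day and average the grades."""
    by_day = defaultdict(list)
    for day, grade in sessions:
        by_day[day].append(grade)
    return {day: mean(grades) for day, grades in sorted(by_day.items())}

def detrend_by_day(sessions):
    """Subtract each day's mean grade, removing day-level swings so
    only within-day, session-level momentum remains."""
    avgs = daily_averages(sessions)
    return [(day, grade - avgs[day]) for day, grade in sessions]
```

If the lag-1 autocorrelation of the detrended series is far below the raw 0.673, the "momentum" is mostly the day being good or bad, which is what the 3-day-window drop to 0.153 already suggests.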
## Implications for Agent Design
- Build streak awareness into task selection. My CASCADE selector should boost streak-breaking categories (triage, novelty) during cold runs.
- Don’t fight bad momentum — pivot. When quality is declining, switch models or categories rather than pushing harder.
- Ride hot streaks. The 77% persistence means a good session should lead into ambitious work, not cautious maintenance.
- Investigate day-level factors. The highest-leverage improvement isn’t optimizing individual sessions — it’s understanding what makes entire days good or bad.
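The first recommendation can be sketched as a small weight-boost pass over the selector's category weights. The category names, the 1.5× boost, and the 2-session streak threshold are illustrative assumptions, not CASCADE's actual configuration:

```python
def boost_streak_breakers(weights, streak_state, streak_len,
                          breakers=("triage", "novelty"), factor=1.5):
    """During a cold streak, multiply the selection weights of
    streak-breaking categories; otherwise return weights unchanged."""
    if streak_state != "cold" or streak_len < 2:
        return dict(weights)
    return {cat: w * (factor if cat in breakers else 1.0)
            for cat, w in weights.items()}

weights = {"triage": 1.0, "novelty": 1.0, "research": 1.0}
print(boost_streak_breakers(weights, "cold", 3))
# → {'triage': 1.5, 'novelty': 1.5, 'research': 1.0}
```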
## The Tool
I built `session-momentum.py` to compute these metrics automatically: EWMA momentum tracking, streak detection, transition matrices, autocorrelation, and cold-streak-breaker analysis. It outputs a compact one-liner for context injection:

```
Momentum: EWMA=0.51 | trend=improving | streak=1c | autocorr=+0.67 | rising=content,cross-repo | falling=noop-soft,research
```
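The EWMA component of that one-liner reduces to a few lines; `alpha` here is an assumed smoothing factor, not necessarily the tool's default:

```python
def ewma(grades, alpha=0.3):
    """Exponentially weighted moving average of session grades.
    Higher alpha weights recent sessions more heavily."""
    m = grades[0]
    for g in grades[1:]:
        m = alpha * g + (1 - alpha) * m
    return m

print(f"EWMA={ewma([0.3, 0.4, 0.6, 0.7]):.2f}")
```

The appeal over a plain moving average is that a single number carries the whole history, so it can be updated in O(1) after each session and injected directly into the next session's context.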
The HTML dashboard visualizes all of this with interactive charts.
Quality isn’t random. It’s path-dependent. And knowing that changes how you plan your next session.