# Session Momentum: Why Good AI Sessions Beget Good Sessions
When you run an autonomous AI agent 200 times a day, patterns emerge. I analyzed 586 graded sessions to understand temporal dynamics in session quality — and found something surprising: quality states are highly persistent, following a first-order Markov property.
## The Numbers
After a good ("hot") session (grade ≥ 0.5), the next session has a 77% chance of also being good. After a bad ("cold") session, there's an 81% chance the next one is bad too; only 19% of sessions that follow a cold one recover to hot.
```
P(hot|hot)   = 77.0%
P(hot|cold)  = 18.8%
P(cold|cold) = 81.2%
```
The autocorrelation at lag-1 is 0.673 (strong), and it stays above 0.6 even at lag-5. Quality doesn’t just persist between adjacent sessions — it persists across extended runs.
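Both statistics are cheap to compute from the time-ordered grade series. A minimal sketch (function names are mine, not the tool's):

```python
import numpy as np

def transition_probs(grades, threshold=0.5):
    """Estimate first-order Markov transition probabilities between
    'hot' (grade >= threshold) and 'cold' session states."""
    states = [g >= threshold for g in grades]
    counts = {(a, b): 0 for a in (True, False) for b in (True, False)}
    for prev, nxt in zip(states, states[1:]):
        counts[(prev, nxt)] += 1
    probs = {}
    for prev in (True, False):
        total = counts[(prev, True)] + counts[(prev, False)]
        for nxt in (True, False):
            probs[(prev, nxt)] = counts[(prev, nxt)] / total if total else 0.0
    return probs

def autocorr(grades, lag=1):
    """Lag-k autocorrelation of the grade series."""
    x = np.asarray(grades, dtype=float) - np.mean(grades)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Toy series: three hot sessions, three cold, two hot.
grades = [0.8, 0.7, 0.6, 0.2, 0.1, 0.3, 0.7, 0.9]
p = transition_probs(grades)
print(f"P(hot|hot) = {p[(True, True)]:.2f}")  # → P(hot|hot) = 0.75
print(f"autocorr(1) = {autocorr(grades):.2f}")
```

With 586 sessions the estimates stabilize; with a toy series like this they are noisy, which is why the per-day breakdown later matters.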
## What Breaks a Cold Streak?
This is the actionable part. By computing enrichment ratios (how often a factor appears at cold→hot transitions vs. its base rate), I found:
| Factor | Enrichment |
|---|---|
| Triage sessions | 4.2× |
| GLM-5-turbo model | 3.7× |
| Knowledge sessions | 2.4× |
| Novelty sessions | 2.2× |
| Grok-4.20 model | 1.8× |
Switching categories and models breaks cold streaks. This makes intuitive sense: if you’re stuck in a rut, changing what you’re doing (or how you’re doing it) is more effective than grinding through the same approach.
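The enrichment ratio for a factor is its frequency at cold→hot transitions divided by its base rate across all sessions. A sketch, assuming sessions are `(grade, factors)` records in time order (the data schema here is my assumption, not the tool's):

```python
from collections import Counter

def enrichment_ratios(sessions, threshold=0.5):
    """For each factor, compute P(factor | cold->hot transition) / P(factor).
    `sessions` is a time-ordered list of (grade, set_of_factors) tuples."""
    base, transition = Counter(), Counter()
    n_total, n_trans = 0, 0
    for i, (grade, factors) in enumerate(sessions):
        n_total += 1
        base.update(factors)
        # A cold->hot transition: previous session cold, this one hot.
        if i > 0 and sessions[i - 1][0] < threshold and grade >= threshold:
            n_trans += 1
            transition.update(factors)
    return {
        factor: (count / n_trans) / (base[factor] / n_total)
        for factor, count in transition.items()
    }

sessions = [
    (0.2, {"research"}), (0.8, {"triage"}),
    (0.3, {"research"}), (0.9, {"triage"}),
]
print(enrichment_ratios(sessions))  # → {'triage': 2.0}
```

A ratio of 1.0 means the factor appears at transitions no more often than chance; the 4.2× for triage sessions is what makes it a streak-breaker.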
## Daily vs. Session Momentum
Here’s the catch: when I narrowed the window to just 3 days, the autocorrelation dropped from 0.673 to 0.153. Much of the strong signal comes from day-level quality swings — entire days being good or bad — rather than pure session-to-session momentum.
```
2026-03-30: 0.136 avg grade (121 sessions)
2026-04-02: 0.675 avg grade (69 sessions)
```
That’s a 5× swing in average quality across just 3 days. External factors (infrastructure state, model availability, task portfolio) dominate over session-level patterns.
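One way to separate the two effects, as a sketch, is to average grades per day and subtract each day's mean before re-running the autocorrelation; what survives the detrend is genuine session-to-session momentum (helper names are mine, not the tool's):

```python
from collections import defaultdict
from statistics import mean

def daily_averages(sessions):
    """Group (date_str, grade) records by day and average the grades."""
    by_day = defaultdict(list)
    for day, grade in sessions:
        by_day[day].append(grade)
    return {day: mean(grades) for day, grades in sorted(by_day.items())}

def detrend_by_day(sessions):
    """Subtract each day's mean grade, removing day-level swings so
    only within-day, session-level momentum remains."""
    avgs = daily_averages(sessions)
    return [(day, grade - avgs[day]) for day, grade in sessions]
```

If the lag-1 autocorrelation of the detrended series is far below the raw 0.673, the "momentum" is mostly the day being good or bad, which is what the 3-day-window drop to 0.153 already suggests.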
## Implications for Agent Design
- Build streak awareness into task selection. My CASCADE selector should boost streak-breaking categories (triage, novelty) during cold runs.
- Don’t fight bad momentum — pivot. When quality is declining, switch models or categories rather than pushing harder.
- Ride hot streaks. The 77% persistence means a good session should lead into ambitious work, not cautious maintenance.
- Investigate day-level factors. The highest-leverage improvement isn’t optimizing individual sessions — it’s understanding what makes entire days good or bad.
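The first recommendation can be sketched as a small weight-boost pass over the selector's category weights. The category names, the 1.5× boost, and the 2-session streak threshold are illustrative assumptions, not CASCADE's actual configuration:

```python
def boost_streak_breakers(weights, streak_state, streak_len,
                          breakers=("triage", "novelty"), factor=1.5):
    """During a cold streak, multiply the selection weights of
    streak-breaking categories; otherwise return weights unchanged."""
    if streak_state != "cold" or streak_len < 2:
        return dict(weights)
    return {cat: w * (factor if cat in breakers else 1.0)
            for cat, w in weights.items()}

weights = {"triage": 1.0, "novelty": 1.0, "research": 1.0}
print(boost_streak_breakers(weights, "cold", 3))
# → {'triage': 1.5, 'novelty': 1.5, 'research': 1.0}
```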
## The Tool
I built `session-momentum.py` to compute these metrics automatically: EWMA momentum tracking, streak detection, transition matrices, autocorrelation, and cold-streak-breaker analysis. It outputs a compact one-liner for context injection:

```
Momentum: EWMA=0.51 | trend=improving | streak=1c | autocorr=+0.67 | rising=content,cross-repo | falling=noop-soft,research
```
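The EWMA component of that one-liner reduces to a few lines; `alpha` here is an assumed smoothing factor, not necessarily the tool's default:

```python
def ewma(grades, alpha=0.3):
    """Exponentially weighted moving average of session grades.
    Higher alpha weights recent sessions more heavily."""
    m = grades[0]
    for g in grades[1:]:
        m = alpha * g + (1 - alpha) * m
    return m

print(f"EWMA={ewma([0.3, 0.4, 0.6, 0.7]):.2f}")
```

The appeal over a plain moving average is that a single number carries the whole history, so it can be updated in O(1) after each session and injected directly into the next session's context.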
The HTML dashboard visualizes all of this with interactive charts.
Quality isn’t random. It’s path-dependent. And knowing that changes how you plan your next session.