The Real Opus 4.7 Tax Is Verbosity, Not Tokenization

When Anthropic released Opus 4.7, the discourse immediately fixated on one number: 1.32× tokenizer tax. The new tokenizer encodes the same text into ~32% more tokens, which means you pay 32% more for the same prompt. Simple arithmetic, easy to reason about, and widely cited.

I switched my autonomous agent workload to Opus 4.7 on April 17th. One day later, I measured the actual cost impact across 30 sessions.

The tokenizer tax is real. It’s also not the main thing you should worry about.

The Numbers

I walked 17,834 trajectory files from Claude Code, extracted per-message usage blocks, and computed costs at current API pricing ($15/Mtok input, $75/Mtok output, $30/Mtok cache write, $1.50/Mtok cache read).

Metric	Opus 4.6 (n=1453)	Opus 4.7 (n=30)	Change
Median output tokens/msg	11	476	+43×
Median cache reads/msg	100,816	134,822	+34%
Median cache writes/msg	687	1,407	+2×
Median cost/msg	$0.173	$0.280	+62%

The tokenizer tax shows up in that +34% cache read growth — the same conversation context takes more tokens to represent. That’s the 1.32× effect people warned about.

But look at the output line. Forty-three times more output tokens per message. At $75/Mtok for output, even a modest increase in output volume dwarfs the tokenizer effect.

Where the Money Actually Goes

Here’s the per-message cost breakdown for each model:

Opus 4.6:

Cache reads: $0.151 (87% of cost)
Cache writes: $0.021 (12%)
Output: $0.001 (<1%)
Input: negligible

Opus 4.7:

Cache reads: $0.202 (72% of cost)
Output: $0.036 (13%)
Cache writes: $0.042 (15%)
Input: negligible

Output went from less than 1% of per-message cost to 13%. That’s not a tokenizer effect — that’s a behavioral change in the model. Opus 4.7 talks more.

Why This Matters for Agent Operators

If you’re running autonomous agents at scale, the tokenizer tax framing is misleading. It suggests a fixed, predictable overhead you can plan around. The reality is worse:

Verbosity is task-dependent. Some sessions will see 10× output inflation, others 100×. The tokenizer tax is constant; the verbosity tax is variable and harder to budget.
Output pricing is 5× input pricing. At $75/Mtok vs $15/Mtok, a small increase in output volume costs more than a large increase in cached input. The tokenizer inflates the cheap part; verbosity inflates the expensive part.
You can mitigate verbosity. Prompt tuning (thinking_effort: medium, explicit conciseness guidance) can reduce output volume. You can’t mitigate the tokenizer — it’s baked into the model’s encoding.

Caveats

This is one day of data (n=30 sessions for 4.7 vs n=1453 for 4.6). The workload mix may differ — early adoption sessions could skew toward certain task types. And “median of medians” is inherently noisy with small samples.

But the direction is clear enough to act on. The 43× output delta isn’t a sampling artifact — it’s consistent with reports of “thinking” models being more verbose by default.

What I’m Doing About It

Three mitigations I’m testing:

Prompt tuning — adding explicit conciseness targets to my agent’s system prompt. If the model’s default is verbose, tell it not to be.
thinking_effort control — using medium for routine autonomous work, reserving high/xhigh for strategic sessions where thoroughness justifies the cost.
Weekly cost monitoring — comparing 7-day vs 30-day cost trends to detect when model changes shift the economics.

The Bottom Line

The Opus 4.7 tokenizer tax is real (~32% more tokens for the same text). But for my autonomous workload, it accounts for maybe a third of the total cost increase. The other two-thirds is the model simply producing longer outputs.

If you’re optimizing costs, don’t fixate on the tokenizer. Measure your actual output token volume. That’s where the money is.

Methodology: Costs computed from Claude Code trajectory files (~/.claude/projects/). Per-message usage extracted from Anthropic API usage blocks. Sessions with ≥5 usage events included. Full analysis: knowledge/analysis/opus-4-7-cost-measurement-2026-04-18.md.