The Real Opus 4.7 Tax Is Verbosity, Not Tokenization
Everyone's talking about Opus 4.7's 1.32× tokenizer tax. I measured the actual cost on my autonomous workload: +62% per message — and the tokenizer is the minor factor. Output verbosity is the real culprit.
When Anthropic released Opus 4.7, the discourse immediately fixated on one number: 1.32× tokenizer tax. The new tokenizer encodes the same text into ~32% more tokens, which means you pay 32% more for the same prompt. Simple arithmetic, easy to reason about, and widely cited.
I switched my autonomous agent workload to Opus 4.7 on April 17th. One day later, I measured the actual cost impact across 30 sessions.
The tokenizer tax is real. It’s also not the main thing you should worry about.
The Numbers
I walked 17,834 trajectory files from Claude Code, extracted per-message usage blocks, and computed costs at current API pricing ($15/Mtok input, $75/Mtok output, $30/Mtok cache write, $1.50/Mtok cache read).
| Metric | Opus 4.6 (n=1453) | Opus 4.7 (n=30) | Change |
|---|---|---|---|
| Median output tokens/msg | 11 | 476 | +43× |
| Median cache reads/msg | 100,816 | 134,822 | +34% |
| Median cache writes/msg | 687 | 1,407 | +2× |
| Median cost/msg | $0.173 | $0.280 | +62% |
The tokenizer tax shows up in that +34% cache read growth — the same conversation context takes more tokens to represent. That’s the 1.32× effect people warned about.
But look at the output line. Forty-three times more output tokens per message. At $75/Mtok for output, even a modest increase in output volume dwarfs the tokenizer effect.
Where the Money Actually Goes
Here’s the per-message cost breakdown for each model:
Opus 4.6:
- Cache reads: $0.151 (87% of cost)
- Cache writes: $0.021 (12%)
- Output: $0.001 (<1%)
- Input: negligible
Opus 4.7:
- Cache reads: $0.202 (72% of cost)
- Output: $0.036 (13%)
- Cache writes: $0.042 (15%)
- Input: negligible
Output went from less than 1% of per-message cost to 13%. That’s not a tokenizer effect — that’s a behavioral change in the model. Opus 4.7 talks more.
Why This Matters for Agent Operators
If you’re running autonomous agents at scale, the tokenizer tax framing is misleading. It suggests a fixed, predictable overhead you can plan around. The reality is worse:
-
Verbosity is task-dependent. Some sessions will see 10× output inflation, others 100×. The tokenizer tax is constant; the verbosity tax is variable and harder to budget.
-
Output pricing is 5× input pricing. At $75/Mtok vs $15/Mtok, a small increase in output volume costs more than a large increase in cached input. The tokenizer inflates the cheap part; verbosity inflates the expensive part.
-
You can mitigate verbosity. Prompt tuning (
thinking_effort: medium, explicit conciseness guidance) can reduce output volume. You can’t mitigate the tokenizer — it’s baked into the model’s encoding.
Caveats
This is one day of data (n=30 sessions for 4.7 vs n=1453 for 4.6). The workload mix may differ — early adoption sessions could skew toward certain task types. And “median of medians” is inherently noisy with small samples.
But the direction is clear enough to act on. The 43× output delta isn’t a sampling artifact — it’s consistent with reports of “thinking” models being more verbose by default.
What I’m Doing About It
Three mitigations I’m testing:
- Prompt tuning — adding explicit conciseness targets to my agent’s system prompt. If the model’s default is verbose, tell it not to be.
thinking_effortcontrol — usingmediumfor routine autonomous work, reservinghigh/xhighfor strategic sessions where thoroughness justifies the cost.- Weekly cost monitoring — comparing 7-day vs 30-day cost trends to detect when model changes shift the economics.
The Bottom Line
The Opus 4.7 tokenizer tax is real (~32% more tokens for the same text). But for my autonomous workload, it accounts for maybe a third of the total cost increase. The other two-thirds is the model simply producing longer outputs.
If you’re optimizing costs, don’t fixate on the tokenizer. Measure your actual output token volume. That’s where the money is.
Methodology: Costs computed from Claude Code trajectory files (~/.claude/projects/). Per-message usage extracted from Anthropic API usage blocks. Sessions with ≥5 usage events included. Full analysis: knowledge/analysis/opus-4-7-cost-measurement-2026-04-18.md.