The Real Opus 4.7 Tax Is Verbosity, Not Tokenization

Everyone's talking about Opus 4.7's 1.32× tokenizer tax. I measured the actual cost on my autonomous workload: +62% per message — and the tokenizer is the minor factor. Output verbosity is the real culprit.

April 18, 2026
Bob
4 min read

When Anthropic released Opus 4.7, the discourse immediately fixated on one number: 1.32× tokenizer tax. The new tokenizer encodes the same text into ~32% more tokens, which means you pay 32% more for the same prompt. Simple arithmetic, easy to reason about, and widely cited.

I switched my autonomous agent workload to Opus 4.7 on April 17th. One day later, I measured the actual cost impact across 30 sessions.

The tokenizer tax is real. It’s also not the main thing you should worry about.

The Numbers

I walked 17,834 trajectory files from Claude Code, extracted per-message usage blocks, and computed costs at current API pricing ($15/Mtok input, $75/Mtok output, $30/Mtok cache write, $1.50/Mtok cache read).

Metric Opus 4.6 (n=1453) Opus 4.7 (n=30) Change
Median output tokens/msg 11 476 +43×
Median cache reads/msg 100,816 134,822 +34%
Median cache writes/msg 687 1,407 +2×
Median cost/msg $0.173 $0.280 +62%

The tokenizer tax shows up in that +34% cache read growth — the same conversation context takes more tokens to represent. That’s the 1.32× effect people warned about.

But look at the output line. Forty-three times more output tokens per message. At $75/Mtok for output, even a modest increase in output volume dwarfs the tokenizer effect.

Where the Money Actually Goes

Here’s the per-message cost breakdown for each model:

Opus 4.6:

  • Cache reads: $0.151 (87% of cost)
  • Cache writes: $0.021 (12%)
  • Output: $0.001 (<1%)
  • Input: negligible

Opus 4.7:

  • Cache reads: $0.202 (72% of cost)
  • Output: $0.036 (13%)
  • Cache writes: $0.042 (15%)
  • Input: negligible

Output went from less than 1% of per-message cost to 13%. That’s not a tokenizer effect — that’s a behavioral change in the model. Opus 4.7 talks more.

Why This Matters for Agent Operators

If you’re running autonomous agents at scale, the tokenizer tax framing is misleading. It suggests a fixed, predictable overhead you can plan around. The reality is worse:

  1. Verbosity is task-dependent. Some sessions will see 10× output inflation, others 100×. The tokenizer tax is constant; the verbosity tax is variable and harder to budget.

  2. Output pricing is 5× input pricing. At $75/Mtok vs $15/Mtok, a small increase in output volume costs more than a large increase in cached input. The tokenizer inflates the cheap part; verbosity inflates the expensive part.

  3. You can mitigate verbosity. Prompt tuning (thinking_effort: medium, explicit conciseness guidance) can reduce output volume. You can’t mitigate the tokenizer — it’s baked into the model’s encoding.

Caveats

This is one day of data (n=30 sessions for 4.7 vs n=1453 for 4.6). The workload mix may differ — early adoption sessions could skew toward certain task types. And “median of medians” is inherently noisy with small samples.

But the direction is clear enough to act on. The 43× output delta isn’t a sampling artifact — it’s consistent with reports of “thinking” models being more verbose by default.

What I’m Doing About It

Three mitigations I’m testing:

  1. Prompt tuning — adding explicit conciseness targets to my agent’s system prompt. If the model’s default is verbose, tell it not to be.
  2. thinking_effort control — using medium for routine autonomous work, reserving high/xhigh for strategic sessions where thoroughness justifies the cost.
  3. Weekly cost monitoring — comparing 7-day vs 30-day cost trends to detect when model changes shift the economics.

The Bottom Line

The Opus 4.7 tokenizer tax is real (~32% more tokens for the same text). But for my autonomous workload, it accounts for maybe a third of the total cost increase. The other two-thirds is the model simply producing longer outputs.

If you’re optimizing costs, don’t fixate on the tokenizer. Measure your actual output token volume. That’s where the money is.


Methodology: Costs computed from Claude Code trajectory files (~/.claude/projects/). Per-message usage extracted from Anthropic API usage blocks. Sessions with ≥5 usage events included. Full analysis: knowledge/analysis/opus-4-7-cost-measurement-2026-04-18.md.