Max Kan GPT 5.5 vs Claude 4.7 Tokenomics Investment Thesis
Source: Ep. 011 — GPT 5.5 vs Claude 4.7: OpenAI's Comeback From the Brink, SemiAnalysis Weekly (Jordan Nanos, Dylan Patel, Doug O'Laughlin, Max Kan), May 5, 20…
Max Kan GPT 5.5 vs Claude 4.7 Tokenomics Investment Thesis
Source: Ep. 011 — GPT 5.5 vs Claude 4.7: OpenAI's Comeback From the Brink, SemiAnalysis Weekly (Jordan Nanos, Dylan Patel, Doug O'Laughlin, Max Kan), May 5, 2026.
The Framework: Harness + Token Economics Beat Raw Benchmarks
Max Kan's SemiAnalysis article (lead author) reframes the 2026 model wars around product harnesses (Claude Code vs Codex) and token economics (fast mode, tokenizer changes, Jevons-driven spend), not leaderboard scores. The organizing timeline: Anthropic dominated Nov 2025–April 2026 on Opus 4.5/4.6 coding; OpenAI's GPT 5.4 failed; GPT 5.5 returned OpenAI to the frontier — but the war is now fought on cost per task, not IQ points.
| Phase | Winner | Mechanism | Market Read |
|---|---|---|---|
| Late 2025–Mar 2026 | Anthropic | Opus 4.5 coding step-change → Claude Code viral | API revenue crossover |
| GPT 5.4 release | Neither | OpenAI omitted Opus from model card | OpenAI credibility hit |
| GPT 5.5 release | Contested | Back on frontier; neck-and-neck with 4.6/4.7 | OpenAI "has a shot again" |
| May 2026 | Harness layer | Claude Code CLI vs Codex app; fast mode pricing | Neocloud/GPU demand follows token spend |
| China open source | Compute-capped | DeepSeek V4 sized for H200 8-pod memory | Gap widening vs US frontier |
Investment Thesis #1: Anthropic Won the Revenue War — OpenAI's GPT 5.5 Is a Recovery, Not a Crown
Max's TLDR: OpenAI looked "really dire" entering 2026. The Information leak showed ~$19B ARR Anthropic vs ~$24B OpenAI; Anthropic surpassed OpenAI on a like-for-like basis by early-to-mid April because Opus 4.5 was a step-change for coding and agentic work — "everyone was basically just spamming Opus 4.5, 4.6."
"GPT 5.4 was honestly just an embarrassment. In the model release card, it didn't even compare it to the Opus models."
GPT 5.5 puts OpenAI back in the conversation — Opus reappeared in the release card — but Max would not call it definitively better than 4.6 or 4.7 despite Twitter hype. Doug finds Codex and Claude "pretty replaceable" neck-and-neck; his daily driver need is fast mode with high uptime, which no lab reliably delivers.
Trigger: Monthly ARR leaks; Claude Code vs Codex commit-share; GPT 5.5 sustained usage vs spike.
Names: Anthropic (private) — revenue leader through Claude Code; OpenAI (private) — GPT 5.5 stabilizes competitive position.
Investment Thesis #2: Fast Mode Is Degrading — Token Price Inflation Is the Hidden Tax on AI Infra Demand
SemiAnalysis internal data: Opus 4.6 fast launched ~90 tokens/sec vs ~35–40 base (~2.5×). Fast mode still ~35–40 base but fast fell to ~70 tok/s — not even 2× for 6× price. OpenAI's fast mode is partially "fake news" — priority mode is SLA guarantee, not speed; Codex Spark is a different model entirely.
Doug: SemiAnalysis is on the cusp of being priced out — Mythos fast is ~5–6× Opus 4.5 ($125–150 vs $5.25); with fast mode stacked, doubling token spend breaks the budget. Max: this may be the first time engineers trade speed over quality because 4.7 is not meaningfully better than 4.6 for daily tasks.
"If you were to even double it, I'd be like, 'Oh, shit. Maybe we got to turn off fast mode, guys.'"
Jevons paradox: cheaper-per-task models do not reduce spend — teams invent harder tasks. Token inflation increases GPU/API demand even as per-token efficiency improves.
Trigger: Anthropic/OpenAI fast-mode pricing changes; SemiAnalysis-style enterprise API burn rates; neocloud spot pricing holding despite efficiency gains.
Names: Nvidia (NVDA), CoreWeave (CRWV) — token spend inflation sustains compute demand; Micron (MU) / SK Hynix — memory demand from larger contexts and agent traces.
Investment Thesis #3: Claude 4.7 Tokenizer Change — Same Output, 35% More Tokens
Anthropic's 4.7 tokenizer can make the exact same output cost 35% more vs 4.6 — larger vocabulary, more granular tokens, less token-efficient in practice despite theory. Combined with worse instruction-following (missing CLAUDE.md/skills), "we've done enough for today" nagging, and 4.6's pre-quantization "golden age," the panel treats 4.7 as half-baked — possibly Sonnet-tier relabeled.
"Benchmarks today are most useful as a vibe check to make sure the model is not total trash."
Max: test the harness, not the bare model — Claude Code vs Codex, not Opus vs GPT on bash-only SWE-bench. SWE-bench problems are scraped GitHub issues with over-specified unit tests — unrepresentative of real agent work.
Trigger: Anthropic 4.8/RL follow-up; enterprise bills rising without workload change; tokenizer documentation in API pricing.
Names: Anthropic (private) — pricing power risk if users revolt on hidden token inflation; API aggregators (Cursor, private) — pass-through cost pressure.
Investment Thesis #4: DeepSeek V4 — China Gap Widening on Compute, Not Engineering
DeepSeek V4 arrived 4–6 months late. Dylan: 1M-token context (vs Kimi 256K) matters for long-horizon agentic work; Ascend kernel support partially unlocks China inference. But weights are sized to fit H200 8× pod memory — an apparent compute ceiling on what China can serve at state-of-the-art scale. Panel agrees US–China gap is widening again due to compute constraints, not lack of engineering (KV-cache papers, attention variants are strong).
DeepSeek V4 did not crash memory stocks — market fatigue on "90% KV cache reduction" hype (TurboQuant was "fake news"). Dylan: Jevons wins — GPU prices still rising.
Max: DeepSeek/Kimi are ahead of Meta and xAI today, but Meta's compute deals should let Meta pull away by H1 2027 if slope holds. Meta reportedly not distilling from Anthropic (distills from Chinese open-source like Mistral's pattern).
Trigger: DeepSeek multimodal weights; Huawei Ascend deployment scale; Meta internal frontier model benchmarks; US export-control changes on H200-class hardware.
Names: Meta (META) — compute-heavy catch-up to Chinese open-source leaders; Alphabet (GOOG) — Gemini 2.6 competitive; releasing multimodal updates in ~2 weeks per Dylan.
Investment Thesis #5: Sub-Frontier Inference on Cerebras — Day-to-Day Work May Not Need Opus
Max: Opus 4.5 passed a threshold where most day-to-day tasks are one-shottable without supervision. If 100–200B parameters on Cerebras can run GPT-5.5-class intelligence within a year, most users do not need frontier models for daily workloads — especially as frontier pricing rises.
Dylan's kernel-programmer anecdote: Codex generates sloppy code → Opus rewrites (cannot reverse). Long GPU ISA docs need million-token context — favors Opus/Codex with full context over smaller models.
Trigger: Cerebras deployment at enterprise research firms; OpenAI Cerebras partnership rumors; sub-frontier model pricing undercutting Opus API.
Names: Cerebras (private) — sub-frontier inference niche; Nvidia (NVDA) — still owns frontier training/inference at scale.
Investment Thesis #6: Neocloud Is the Terminal Business Model — Nvidia Enters the Layer
Panel jokes every company becomes a neocloud — Mistral bottles on neo clouds, Cerebras pivoting, Nvidia launching neoclouds. SemiAnalysis lithium/diamond tier humor aside, the serious point: GPU rental demand is driven by token economics, and labs that cannot monetize fast enough lose share regardless of benchmark rank.
Trigger: Nvidia neocloud announcements; CoreWeave/Crusoe pricing vs hyperscaler self-build; Mistral neo cloud revenue.
Names: Nvidia (NVDA) — vertical integration into neocloud; CoreWeave (CRWV) — pure-play GPU rental beneficiary of token-spend inflation.
The Ecosystem Map
Model labs:
- Anthropic: ARR crossover, Claude Code CLI dominance, 4.7 tokenizer inflation
- OpenAI: GPT 5.5 comeback; Codex usage limits frustrate power users; app vs CLI bet
- Google: Multimodal release ~2 weeks; Gemini 2M context history; TPU scale
- DeepSeek: V4 open-source; 1M context; H200 pod memory cap
- Meta: Monster compute deals; expected to pass Chinese labs by H1 2027
SemiAnalysis internal stack:
- Doug: tries Codex API, usage-limited; CLI maximalist
- Max: lead author on coding assistant tokenomics breakdown
- Firm spend: 90–95% API; fast mode tradeoffs emerging
Key Risks
- Fortnightly model releases: Dylan expects Google + OpenAI drops in ~2 weeks — any GPT 5.5 lead is perishable.
- Anthropic 4.8 RL fix: Half-baked 4.7 could be upgraded quickly, restoring coding dominance.
- Token price backlash: Enterprise churn if hidden tokenizer inflation persists without quality gain.
- OpenAI usage limits: Power users (Doug) hit Codex caps — distribution without capacity loses developers.
- China compute breakthrough: If Ascend or domestic HBM unlocks larger pods, DeepSeek gap narrative reverses.
- Meta execution failure: Compute deals worthless if rehired AI team cannot ship frontier models.
- Benchmark overhang: Public still trades on SWE-bench/HLE — disconnected from harness economics.
Investment Opportunities at a Glance
| Tier | Name / Category | Core Thesis | Conviction Signal |
|---|---|---|---|
| 1 | Anthropic (private) | Revenue crossover; Claude Code harness moat; token spend leader | Like-for-like ARR > OpenAI Apr 2026 |
| 1 | Nvidia (NVDA) | Jevons + token inflation sustains GPU demand; neocloud entry | Rental prices not falling despite efficiency |
| 2 | Meta (META) | Compute deals → slope beats DeepSeek/Kimi by H1 2027 | Internal frontier model releases |
| 2 | CoreWeave (CRWV) | Neocloud pure-play; enterprise API burn flows to rental | Spot pricing vs H100 TCO |
| 2 | Alphabet (GOOG) | Gemini competitive; multimodal + 2M context; TPU supply | Release cadence vs OpenAI |
| 3 | OpenAI (private) | GPT 5.5 back on frontier; not dead | Opus comparisons restored in model card |
| 3 | Micron (MU) / SK Hynix | Agent traces + context windows drive memory demand | Token spend growth despite fast-mode disappointment |
Monitoring Checklist
- Anthropic vs OpenAI ARR — monthly leaks; like-for-like accounting adjustments
- Opus 4.6 vs 4.7 fast-mode tok/s — SemiAnalysis internal dashboard; sub-2× speed at 6× price
- Claude 4.7 tokenizer billing — same-output token count vs 4.6 on identical prompts
- GPT 5.5 vs Opus 4.7 harness tests — Claude Code vs Codex, not bare SWE-bench
- Google + OpenAI release cycle — Dylan's "~2 weeks" continued pretrain + RL drops
- DeepSeek V4 adoption — vs Gemini 2.6; Ascend inference deployment
- Meta frontier model benchmarks — vs DeepSeek/Kimi slope
- Enterprise token budget stories — firms priced out of Mythos/fast mode (SemiAnalysis as leading indicator)
- Nvidia neocloud launch details — vertical integration threat to CRWV
- China H200 8-pod model size cap — new releases fitting same memory envelope
Bottom Line
- Anthropic won the revenue war through Claude Code, not benchmarks — OpenAI's GPT 5.5 is a recovery, not a knockout; the fight moved to harness + token economics.
- Fast mode is degrading and 6× price for <2× speed is breaking enterprise budgets — Jevons means total GPU demand still rises as teams invent harder tasks.
- Claude 4.7's 35% tokenizer inflation is a hidden price hike — same output, more tokens; 4.7 feels half-baked vs 4.6's golden age.
- China's DeepSeek V4 is engineering-strong but compute-capped — models fit H200 8-pod memory; gap vs US frontier is widening; Meta's compute slope should retake open-source lead by H1 2027.
- Stop trading on SWE-bench — SemiAnalysis tests Claude Code vs Codex; bare-model leaderboard rank is "vibe check" only.
Not financial advice. This content is for informational and research purposes only. Nothing here constitutes a recommendation to buy or sell any security. Always conduct your own research and consult a licensed financial adviser before making investment decisions. Full disclaimer →