Max Kan GPT 5.5 vs Claude 4.7 Tokenomics Investment Thesis

Source: Ep. 011 — GPT 5.5 vs Claude 4.7: OpenAI's Comeback From the Brink, SemiAnalysis Weekly (Jordan Nanos, Dylan Patel, Doug O'Laughlin, Max Kan), May 5, 2026.

The Framework: Harness + Token Economics Beat Raw Benchmarks

The SemiAnalysis article, with Max Kan as lead author, reframes the 2026 model wars around product harnesses (Claude Code vs Codex) and token economics (fast mode, tokenizer changes, Jevons-driven spend), not leaderboard scores. The organizing timeline: Anthropic dominated Nov 2025–April 2026 on Opus 4.5/4.6 coding; OpenAI's GPT 5.4 failed; GPT 5.5 returned OpenAI to the frontier, but the war is now fought on cost per task, not IQ points.

Phase	Winner	Mechanism	Market Read
Late 2025–Mar 2026	Anthropic	Opus 4.5 coding step-change → Claude Code viral	API revenue crossover
GPT 5.4 release	Neither	OpenAI omitted Opus from model card	OpenAI credibility hit
GPT 5.5 release	Contested	Back on frontier; neck-and-neck with 4.6/4.7	OpenAI "has a shot again"
May 2026	Harness layer	Claude Code CLI vs Codex app; fast mode pricing	Neocloud/GPU demand follows token spend
China open source	Compute-capped	DeepSeek V4 sized for H200 8-pod memory	Gap widening vs US frontier

Investment Thesis #1: Anthropic Won the Revenue War — OpenAI's GPT 5.5 Is a Recovery, Not a Crown

Max's TLDR: OpenAI looked "really dire" entering 2026. The Information leak showed ~$19B ARR Anthropic vs ~$24B OpenAI; Anthropic surpassed OpenAI on a like-for-like basis by early-to-mid April because Opus 4.5 was a step-change for coding and agentic work — "everyone was basically just spamming Opus 4.5, 4.6."

"GPT 5.4 was honestly just an embarrassment. In the model release card, it didn't even compare it to the Opus models."

GPT 5.5 puts OpenAI back in the conversation (Opus reappeared in the release card), but Max would not call it definitively better than 4.6 or 4.7 despite Twitter hype. Doug finds Codex and Claude "pretty replaceable" and neck-and-neck; what he needs from a daily driver is fast mode with high uptime, which no lab reliably delivers.

Trigger: Monthly ARR leaks; Claude Code vs Codex commit-share; GPT 5.5 sustained usage vs spike.

Names: Anthropic (private) — revenue leader through Claude Code; OpenAI (private) — GPT 5.5 stabilizes competitive position.

Investment Thesis #2: Fast Mode Is Degrading — Token Price Inflation Is the Hidden Tax on AI Infra Demand

SemiAnalysis internal data: Opus 4.6 fast mode launched at ~90 tokens/sec vs ~35–40 for base (~2.5×). Base is still ~35–40 tok/s, but fast mode has fallen to ~70 tok/s: not even 2× the speed for 6× the price. OpenAI's fast mode is partially "fake news": priority mode is an SLA guarantee, not a speed increase, and Codex Spark is a different model entirely.
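The price-versus-speed arithmetic above can be checked with a small sketch. It assumes the midpoint of the quoted ~35–40 tok/s base range (37.5 tok/s), which is an illustrative choice rather than a measured figure, and the function names are hypothetical.

```python
def speedup(fast_tps: float, base_tps: float) -> float:
    """Throughput multiple of fast mode over base mode."""
    return fast_tps / base_tps


def price_per_unit_speedup(price_multiple: float, fast_tps: float, base_tps: float) -> float:
    """Price multiple paid per 1x of throughput (1.0 would mean price scales with speed)."""
    return price_multiple / speedup(fast_tps, base_tps)


# Figures quoted in the text; BASE_TPS is an assumed midpoint of ~35-40 tok/s.
BASE_TPS = 37.5
PRICE_MULTIPLE = 6.0
FAST_TPS = {"launch": 90.0, "current": 70.0}

for label, fast_tps in FAST_TPS.items():
    s = speedup(fast_tps, BASE_TPS)
    p = price_per_unit_speedup(PRICE_MULTIPLE, fast_tps, BASE_TPS)
    print(f"{label}: {s:.2f}x speed, {p:.2f}x price per unit of speedup")
```

Under that assumption the speedup falls from 2.4× at launch to roughly 1.87× now, so the same 6× premium buys proportionally less throughput. Across the full 35–40 tok/s base range, the current speedup sits between 1.75× and 2×.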

Doug: SemiAnalysis is on the cusp of being priced out. Mythos fast is ~5–6× Opus 4.5 ($125–150 vs $5.25); with fast mode stacked, doubling token spend breaks the budget. Max: this may be the first time engineers choose speed over quality, because 4.7 is not meaningfully better than 4.6 for daily tasks.

"If you were to even double it, I'd be like, 'Oh, shit. Maybe we got to turn off fast mode, guys.'"

Jevons paradox: cheaper-per-task models do not reduce spend — teams invent harder tasks. Token inflation increases GPU/API demand even as per-token efficiency improves.
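The Jevons argument reduces to one product: total spend is cost per task times the number of tasks. The sketch below uses purely hypothetical numbers (none come from the episode) to show how spend can rise even when each task gets cheaper.

```python
def total_spend(cost_per_task: float, tasks: int) -> float:
    """Total spend as unit cost times task volume."""
    return cost_per_task * tasks


# Hypothetical scenario: unit cost halves while task volume triples.
before = total_spend(cost_per_task=1.00, tasks=1_000)
after = total_spend(cost_per_task=0.50, tasks=3_000)

print(f"before: {before:.0f}, after: {after:.0f}, change: {after / before:.2f}x")
```

Whenever task volume (or task difficulty, which shows up as more tokens per task) grows faster than unit cost falls, the product goes up, which is the mechanism the panel points to.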

Trigger: Anthropic/OpenAI fast-mode pricing changes; SemiAnalysis-style enterprise API burn rates; neocloud spot pricing holding despite efficiency gains.

Names: Nvidia (NVDA), CoreWeave (CRWV) — token spend inflation sustains compute demand; Micron (MU) / SK Hynix — memory demand from larger contexts and agent traces.

Investment Thesis #3: Claude 4.7 Tokenizer Change — Same Output, 35% More Tokens

Anthropic's 4.7 tokenizer can make the exact same output cost 35% more than on 4.6: it has a larger vocabulary and more granular tokens, and is less token-efficient in practice despite what theory would suggest. Combined with worse instruction-following (missing CLAUDE.md/skills), "we've done enough for today" nagging, and the memory of 4.6's pre-quantization "golden age," the panel treats 4.7 as half-baked, possibly a relabeled Sonnet-tier model.
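A minimal sketch of why a tokenizer change acts as a price change: with the per-token price held fixed, cost scales linearly with token count. The 1.35 factor is the figure quoted above; the price per million tokens and the token count are placeholders chosen for illustration, not real list prices or measurements.

```python
TOKEN_INFLATION = 1.35             # from the text: up to 35% more tokens for the same output
PLACEHOLDER_PRICE_PER_MTOK = 10.0  # hypothetical USD per million tokens, not a real price


def output_cost(tokens: float, price_per_mtok: float) -> float:
    """Cost of a response given its token count and a per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok


old_tokens = 200_000                        # hypothetical token count under the old tokenizer
new_tokens = old_tokens * TOKEN_INFLATION   # same text, re-tokenized

old_cost = output_cost(old_tokens, PLACEHOLDER_PRICE_PER_MTOK)
new_cost = output_cost(new_tokens, PLACEHOLDER_PRICE_PER_MTOK)

print(f"old: ${old_cost:.2f}, new: ${new_cost:.2f}, ratio: {new_cost / old_cost:.2f}x")
```

The ratio is independent of the placeholder price, which is why the change reads as a price hike even when the published per-token rate does not move.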

"Benchmarks today are most useful as a vibe check to make sure the model is not total trash."

Max: test the harness, not the bare model — Claude Code vs Codex, not Opus vs GPT on bash-only SWE-bench. SWE-bench problems are scraped GitHub issues with over-specified unit tests — unrepresentative of real agent work.

Trigger: Anthropic 4.8/RL follow-up; enterprise bills rising without workload change; tokenizer documentation in API pricing.

Names: Anthropic (private) — pricing power risk if users revolt on hidden token inflation; API aggregators (Cursor, private) — pass-through cost pressure.

Investment Thesis #4: DeepSeek V4 — China Gap Widening on Compute, Not Engineering

DeepSeek V4 arrived 4–6 months late. Dylan: 1M-token context (vs Kimi's 256K) matters for long-horizon agentic work, and Ascend kernel support partially unlocks inference in China. But the weights are sized to fit H200 8× pod memory, an apparent compute ceiling on what China can serve at state-of-the-art scale. The panel agrees the US–China gap is widening again because of compute constraints, not a lack of engineering (the KV-cache papers and attention variants are strong).
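The memory ceiling can be made concrete with a back-of-envelope bound. The sketch uses the H200's 141 GB of HBM3e per GPU and eight GPUs per pod; the source does not state DeepSeek V4's parameter count, weight precision, or how much memory is held back for KV cache, so `bytes_per_param` and `reserve_fraction` are hypothetical inputs.

```python
H200_HBM_GB = 141   # HBM3e capacity per H200 GPU
GPUS_PER_POD = 8    # the "8x pod" from the text


def pod_memory_gb(gpus: int = GPUS_PER_POD, hbm_gb: float = H200_HBM_GB) -> float:
    """Aggregate HBM across the pod, in GB."""
    return gpus * hbm_gb


def max_params_billions(bytes_per_param: float, reserve_fraction: float = 0.0) -> float:
    """Upper bound on parameters (in billions) whose weights fit in one pod.

    reserve_fraction is a hypothetical share of memory kept free for
    KV cache and activations; GB divided by bytes/param gives billions.
    """
    usable_gb = pod_memory_gb() * (1.0 - reserve_fraction)
    return usable_gb / bytes_per_param


# Illustrative precisions: 2 bytes (16-bit), 1 byte (8-bit), 0.5 byte (4-bit).
for bytes_per_param in (2.0, 1.0, 0.5):
    bound = max_params_billions(bytes_per_param, reserve_fraction=0.3)
    print(f"{bytes_per_param} bytes/param -> at most ~{bound:.0f}B params")
```

The point is only that a fixed 1,128 GB pod caps model size for any given precision; long contexts raise the KV-cache reserve and push the bound lower.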

DeepSeek V4 did not crash memory stocks — market fatigue on "90% KV cache reduction" hype (TurboQuant was "fake news"). Dylan: Jevons wins — GPU prices still rising.

Max: DeepSeek/Kimi are ahead of Meta and xAI today, but Meta's compute deals should let Meta pull away by H1 2027 if the slope holds. Meta is reportedly not distilling from Anthropic; it distills from Chinese open-source models, similar to Mistral's pattern.

Trigger: DeepSeek multimodal weights; Huawei Ascend deployment scale; Meta internal frontier model benchmarks; US export-control changes on H200-class hardware.

Names: Meta (META) — compute-heavy catch-up to Chinese open-source leaders; Alphabet (GOOG) — Gemini 2.6 competitive; releasing multimodal updates in ~2 weeks per Dylan.

Investment Thesis #5: Sub-Frontier Inference on Cerebras — Day-to-Day Work May Not Need Opus

Max: Opus 4.5 passed a threshold where most day-to-day tasks are one-shottable without supervision. If 100–200B parameters on Cerebras can run GPT-5.5-class intelligence within a year, most users do not need frontier models for daily workloads — especially as frontier pricing rises.

Dylan's kernel-programmer anecdote: Codex generates sloppy code that Opus then rewrites; the reverse does not work. Long GPU ISA docs need million-token context, which favors Opus/Codex with full context over smaller models.

Trigger: Cerebras deployment at enterprise research firms; OpenAI Cerebras partnership rumors; sub-frontier model pricing undercutting Opus API.

Names: Cerebras (private) — sub-frontier inference niche; Nvidia (NVDA) — still owns frontier training/inference at scale.

Investment Thesis #6: Neocloud Is the Terminal Business Model — Nvidia Enters the Layer

Panel jokes every company becomes a neocloud — Mistral bottles on neo clouds, Cerebras pivoting, Nvidia launching neoclouds. SemiAnalysis lithium/diamond tier humor aside, the serious point: GPU rental demand is driven by token economics, and labs that cannot monetize fast enough lose share regardless of benchmark rank.

Trigger: Nvidia neocloud announcements; CoreWeave/Crusoe pricing vs hyperscaler self-build; Mistral neo cloud revenue.

Names: Nvidia (NVDA) — vertical integration into neocloud; CoreWeave (CRWV) — pure-play GPU rental beneficiary of token-spend inflation.

The Ecosystem Map

Model labs:

Anthropic: ARR crossover, Claude Code CLI dominance, 4.7 tokenizer inflation
OpenAI: GPT 5.5 comeback; Codex usage limits frustrate power users; app vs CLI bet
Google: Multimodal release ~2 weeks; Gemini 2M context history; TPU scale
DeepSeek: V4 open-source; 1M context; H200 pod memory cap
Meta: Monster compute deals; expected to pass Chinese labs by H1 2027

SemiAnalysis internal stack:

Doug: tries Codex API, usage-limited; CLI maximalist
Max: lead author on coding assistant tokenomics breakdown
Firm spend: 90–95% API; fast mode tradeoffs emerging

Key Risks

Fortnightly model releases: Dylan expects Google + OpenAI drops in ~2 weeks — any GPT 5.5 lead is perishable.
Anthropic 4.8 RL fix: Half-baked 4.7 could be upgraded quickly, restoring coding dominance.
Token price backlash: Enterprise churn if hidden tokenizer inflation persists without quality gain.
OpenAI usage limits: Power users (Doug) hit Codex caps — distribution without capacity loses developers.
China compute breakthrough: If Ascend or domestic HBM unlocks larger pods, DeepSeek gap narrative reverses.
Meta execution failure: Compute deals worthless if rehired AI team cannot ship frontier models.
Benchmark overhang: Public still trades on SWE-bench/HLE — disconnected from harness economics.

Investment Opportunities at a Glance

Tier	Name / Category	Core Thesis	Conviction Signal
1	Anthropic (private)	Revenue crossover; Claude Code harness moat; token spend leader	Like-for-like ARR > OpenAI Apr 2026
1	Nvidia (NVDA)	Jevons + token inflation sustains GPU demand; neocloud entry	Rental prices not falling despite efficiency
2	Meta (META)	Compute deals → slope beats DeepSeek/Kimi by H1 2027	Internal frontier model releases
2	CoreWeave (CRWV)	Neocloud pure-play; enterprise API burn flows to rental	Spot pricing vs H100 TCO
2	Alphabet (GOOG)	Gemini competitive; multimodal + 2M context; TPU supply	Release cadence vs OpenAI
3	OpenAI (private)	GPT 5.5 back on frontier; not dead	Opus comparisons restored in model card
3	Micron (MU) / SK Hynix	Agent traces + context windows drive memory demand	Token spend growth despite fast-mode disappointment

Bottom Line

Anthropic won the revenue war through Claude Code, not benchmarks — OpenAI's GPT 5.5 is a recovery, not a knockout; the fight moved to harness + token economics.
Fast mode is degrading and 6× price for <2× speed is breaking enterprise budgets — Jevons means total GPU demand still rises as teams invent harder tasks.
Claude 4.7's 35% tokenizer inflation is a hidden price hike — same output, more tokens; 4.7 feels half-baked vs 4.6's golden age.
China's DeepSeek V4 is engineering-strong but compute-capped — models fit H200 8-pod memory; gap vs US frontier is widening; Meta's compute slope should retake open-source lead by H1 2027.
Stop trading on SWE-bench — SemiAnalysis tests Claude Code vs Codex; bare-model leaderboard rank is "vibe check" only.