The AI Token Binge Is Over. Now Comes Rationing • Stephen Van Tran

The party ended when the invoice arrived

The bill came due, and it changed the entire conversation about AI.

For eighteen months, the rule inside ambitious companies was simple: burn tokens, win the future. Engineers were rewarded for feeding everything to the largest, most expensive frontier models, results be damned. That era now has a name and an obituary. Enterprise customers are tightening AI budgets and pivoting from what insiders call “tokenmaxxing” toward hard efficiency, a shift that could throttle the revenue growth of OpenAI and Anthropic (CNBC). The arrows that pointed only up are bending, and the labs that monetized unconstrained consumption are reading the same charts everyone else is.

Consider the canonical defection. Lindy, an AI agent startup, moved 100% of its traffic off Anthropic’s Claude and onto DeepSeek, the Chinese lab whose open-weight models cost a fraction of frontier prices (CNBC). CEO Flo Crivello expects the switch to save his company millions within months (The New Stack). One startup is an anecdote. But Lindy is the kind of AI-native buyer the frontier labs assumed would never leave, and it left completely.

The macro signal is sharper than any single switch. Token prices have collapsed roughly 98% over the past two years, yet the average enterprise AI bill still climbed an estimated 320% in the same window (Great Learning). That paradox is the whole story compressed into two numbers: buyers did not save money when prices fell. They spent more, faster, because consumption outran deflation. Now finance departments have noticed, and the spigot they opened in 2024 is the one they are reaching to close.

Here is the thesis. The AI economy spent two years optimizing for the wrong metric — tokens consumed instead of value created — and the correction has arrived as a budget revolt. The labs built business models on a behavior that was always going to end the moment a CFO ran the math. When buyers stop treating tokens as free and start treating them as inventory, the entire revenue architecture of the frontier — usage-based pricing, leaderboard-driven adoption, growth that compounds with waste — comes under pressure at once.

The stakes reach past two companies. If the marginal token becomes a cost to minimize rather than a virtue to maximize, the cheapest competent model wins more workloads than the smartest one. That inverts the premise of the last two years, when capability commanded a premium and price was an afterthought. It also hands leverage to open-weight challengers like DeepSeek, whose first funding round I covered earlier this month, and to hyperscalers selling efficiency rather than frontier bragging rights. The question for the back half of 2026 is whether the frontier labs can sell value fast enough to outrun the rationing.

The timing makes the revolt acutely dangerous. Both leading labs are racing toward public markets, and a budget correction is the worst possible backdrop for an IPO roadshow. Usage-based revenue is a beautiful thing when consumption compounds and an ugly one when buyers install caps mid-quarter. The narrative a lab wants to sell investors — relentless, near-frictionless expansion — collides with the narrative its largest customers are now writing in their own ledgers. A market that learned to ration in June is a market that prices growth differently in September, and the labs cannot un-ring that bell before they ring the opening one.

There is a deeper irony underneath the correction. The same falling prices that were supposed to make AI cheap for everyone are what made the bills explode, because cheaper tokens invited reckless consumption rather than disciplined use. The industry sold deflation as the path to ubiquity, and ubiquity arrived — but so did a 320% invoice. The lesson buyers are internalizing is that price per token was never the number that mattered. Spend per outcome was. And almost nobody measured it until the credit card statement forced the question.

Follow the bill, find the revolt

The evidence for the turn is not vibes. It is invoices.

Start with the most disciplined data point in the market. Bain & Company surveyed 951 companies with more than $100 million in revenue and found a brutal expectations gap: 37% anticipated cost reductions of 10% to 20% from AI, but 40% saw improvements of 10% or less, and a mere 4% achieved savings above 30% (the decoder). Bain’s own framing is unsentimental: companies are approving the next wave of spending on the strength of returns that have not arrived, a posture the firm calls “a circular bet with a structural leak” (Bain & Company). The savings that justify the budgets are mostly hypothetical.

The autonomy gap explains why. Only 7% of companies run fully autonomous agents in production today, which means the headcount-replacing economics that underwrote many AI business cases simply do not exist yet (the decoder). The CFO approved one set of numbers built on full automation; the organization is living with a system that routes most decisions back to a human queue. When the promised labor arbitrage turns into an expensive copilot that still needs a supervisor, the bill stops looking like an investment and starts looking like a subscription nobody can cancel.

Uber is the clearest case study of consumption gone feral. The company rolled out Claude Code in December 2025, watched usage double within two months, and discovered per-employee monthly costs running between $500 and $2,000 (Great Learning). The fiscal-year budget was exhausted four months in. Uber’s response was the most concrete artifact of the new era: a $1,500 monthly cap per employee per tool, imposed even though 95% of its engineers use AI every month and roughly 10% of company code now comes from autonomous agents. This is not a company retreating from AI. It is a company that loved AI so much it had to put a meter on the door.

The scale of the binge becomes vivid at the largest buyers. Visa consumed 1.9 trillion tokens in March 2026 alone; Disney’s engineers invoke Claude roughly 51,000 times a day and built an adoption dashboard to manage it; Meta ran an internal leaderboard that gamified token usage before quietly shutting it down (SDxCentral). Even Microsoft, whose CEO Satya Nadella admits tokenmaxxing happens internally and is “addictive,” is reportedly canceling most employees’ Claude Code licenses by June 30 in favor of GitHub Copilot CLI (Great Learning). When the company that resells frontier models starts rationing them internally, the message to every other buyer is loud.

The shape of the consumption matters as much as the size. Visa’s 1.9 trillion tokens in a single month is not the signature of a few power users; it is industrial-scale inference woven into daily operations, the kind of volume that turns a rounding-error unit price into a line item the board reviews. Disney’s 51,000 daily Claude invocations tell the same story from the engineering side — AI has stopped being a tool people reach for and become a reflex they fire constantly. That ubiquity is exactly why the bills compounded, and exactly why a blanket cap is a blunt instrument: somewhere inside those trillions of tokens sit the genuinely valuable completions, indistinguishable on the invoice from the wasteful ones. The hard work of the efficiency era is separating the two, and almost no buyer has the telemetry to do it yet.

The supply side smelled blood and moved. DeepSeek slashed API prices by up to 90%, dropping V4-Pro input tokens from $0.145 to $0.036 per million (SDxCentral). Its V4-Pro model — a 1.6-trillion-parameter mixture-of-experts system that activates only 49 billion parameters per pass — scores 80.6% on SWE-bench Verified while pricing output at $0.87 per million tokens, making it roughly 28.7x cheaper than Claude Opus 4.8 and 34.5x cheaper than GPT-5.5 (morphllm). DeepSeek detailed the price-performance gambit when it launched the model in April, pairing rock-bottom pricing with tight integration to Huawei silicon (Fortune). For a buyer staring at a 320% bill increase, a 28x discount is not a feature. It is an escape hatch.
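The price figures in that paragraph can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above. The implied frontier output prices are derived from the quoted multiples and are an assumption of this sketch, not prices reported by any of the cited sources.

```python
# Figures quoted in the text (USD per million tokens).
old_input_price = 0.145
new_input_price = 0.036
deepseek_output_price = 0.87

# "x cheaper" multiples quoted in the text.
quoted_multiples = {"Claude Opus 4.8": 28.7, "GPT-5.5": 34.5}

# Fractional cut on the quoted input-price example.
input_cut = (old_input_price - new_input_price) / old_input_price

# Output prices implied by the multiples; derived here, not quoted by any source.
implied_output_prices = {
    model: deepseek_output_price * multiple
    for model, multiple in quoted_multiples.items()
}

print(f"input price cut: {input_cut:.1%}")
for model, price in implied_output_prices.items():
    print(f"{model}: ~${price:.0f} per million output tokens (implied)")
```

On these inputs the quoted input-price example works out to a cut of about 75%, so the "up to 90%" headline presumably refers to other price tiers. The multiples imply frontier output prices of about $25 and $30 per million tokens.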

The behavioral mechanics deserve a beat, because they explain why the binge ran so hot. Tokenmaxxing was never irrational at the individual level — it was rational behavior under broken incentives. When a manager rewards engineers for AI usage rather than AI results, and when the marginal token feels free at fractions of a cent, the optimal personal strategy is to consume without restraint. Meta’s leaderboard formalized exactly this, turning consumption into a status game until someone tallied the cost and pulled the plug (SDxCentral). The correction is not buyers becoming smarter overnight; it is organizations finally aligning individual incentives with the invoice. Caps and dashboards are the crude first tools of that realignment.

Now the original math, stitched from the two anchor figures. Spend is price times volume, so the change in volume is the change in spend divided by the change in price. If token prices fell ~98% (a roughly 50x reduction) while average bills rose ~320% (to about 4.2x their starting level), then enterprise token volume grew on the order of 210x to produce that spend. In other words, consumption outran deflation by roughly 4x: buyers didn’t pocket the price cuts, they reinvested every cent of savings into more tokens and then some. That is the mechanical definition of a binge, and it is precisely the dynamic a budget cap is designed to break. The analyst class sees it too: D.A. Davidson’s Gil Luria warns that “some of their largest enterprise customers may start limiting their out-of-control token spend” (CNBC). The leak Bain described and the cap Uber imposed are the same event seen from two ends of the income statement.
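For readers who want to rerun that arithmetic, here is a minimal sketch. It assumes the simplest possible model, spend = price per token × token volume, and treats the 98% and 320% figures as exact, which they are not. The variable names are illustrative.

```python
# Back-of-envelope check, assuming spend = price_per_token * token_volume,
# so volume_multiple = spend_multiple / price_multiple.

price_drop = 0.98      # ~98% price decline (figure quoted in the text)
bill_increase = 3.20   # ~320% bill increase (figure quoted in the text)

price_multiple = 1.0 - price_drop      # new price as a fraction of the old price
spend_multiple = 1.0 + bill_increase   # new bill as a multiple of the old bill

deflation_factor = 1.0 / price_multiple             # how many times cheaper a token got
volume_multiple = spend_multiple / price_multiple   # how many times more tokens were bought
overshoot = volume_multiple / deflation_factor      # how far volume outran deflation

print(f"price deflation:  ~{deflation_factor:.0f}x")
print(f"volume growth:    ~{volume_multiple:.0f}x")
print(f"overshoot:        ~{overshoot:.1f}x")
```

The overshoot equals the spend multiple by construction: if volume had grown only as fast as prices fell, the bill would have stayed flat.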

Why the binge could come roaring back

The efficiency narrative is seductive, and it might be half wrong.

Start with the oldest trap in resource economics: Jevons paradox. When something gets cheaper and more efficient, total consumption often rises rather than falls, because efficiency unlocks uses that were previously uneconomic. Weka’s Val Bercovici makes exactly this bet, predicting that even as frontier labs cut prices, “token spending will continue to grow regardless” (SDxCentral). The 210x volume explosion that produced the 320% bill increase is itself the evidence. In this cycle, cheaper tokens have not reduced aggregate spend; they have expanded it. Today’s rationing may be a pause to install meters, not a permanent ceiling on demand.

The quality gap is the second crack in the thesis. DeepSeek is 28x cheaper, but cheapest competent is not the same as best, and the gap between an 80.6% SWE-bench score and a frontier score compounds in production. For workloads where a wrong answer is expensive — legal review, financial modeling, customer-facing agents — the labs will argue that a single frontier completion beats ten cheap ones plus a costly mistake. Anthropic’s enterprise momentum, which I documented when it overtook OpenAI in business adoption, rests precisely on reliability commanding a premium. If that premium holds for the workloads that matter, the defections stay confined to commodity tasks and the high-margin core survives.
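The labs' argument can be framed as an expected-cost comparison: what a task costs in tokens plus what a wrong answer costs, weighted by how often it happens. The sketch below is one way to express that framing. Every number in it is a hypothetical placeholder, not a measured error rate or a real price, and the function names are invented for illustration.

```python
def expected_cost(token_cost: float, error_rate: float, error_cost: float) -> float:
    """Expected spend per task: inference cost plus the expected cost of a wrong answer."""
    return token_cost + error_rate * error_cost


def break_even_error_cost(cheap_cost: float, cheap_err: float,
                          frontier_cost: float, frontier_err: float) -> float:
    """Cost of a mistake above which the pricier, more reliable model is cheaper overall."""
    if cheap_err <= frontier_err:
        raise ValueError("the cheap model must be less reliable for a break-even to exist")
    return (frontier_cost - cheap_cost) / (cheap_err - frontier_err)


# All four numbers are hypothetical placeholders (USD per task, error probability).
CHEAP_COST, CHEAP_ERR = 0.01, 0.10
FRONTIER_COST, FRONTIER_ERR = 0.25, 0.05

threshold = break_even_error_cost(CHEAP_COST, CHEAP_ERR, FRONTIER_COST, FRONTIER_ERR)
print(f"frontier model pays off once a mistake costs more than ${threshold:.2f}")

for mistake_cost in (1.0, 100.0):
    cheap = expected_cost(CHEAP_COST, CHEAP_ERR, mistake_cost)
    frontier = expected_cost(FRONTIER_COST, FRONTIER_ERR, mistake_cost)
    winner = "frontier" if frontier < cheap else "cheap"
    print(f"mistake costs ${mistake_cost:.0f}: {winner} model is cheaper in expectation")
```

With these placeholder inputs the crossover sits at a few dollars per mistake, which is why the same comparison can favor the cheap model for commodity tasks and the frontier model for legal or financial work.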

There is also a switching-cost moat that headlines underrate. Lindy moved 100% of traffic in a clean swap because it is a young, AI-native company with portable infrastructure. A Fortune 500 buyer with thousands of prompts tuned to a specific model, compliance reviews completed, and integrations hardwired cannot pivot to DeepSeek over a weekend — and many will balk at routing sensitive data through a Chinese open-weight model regardless of price. The same Microsoft that is rationing Claude internally is also building its own MAI models and selling model choice as the moat, which suggests the incumbents intend to capture the efficiency shift rather than be killed by it. The pie may reshuffle without shrinking.

The geopolitics complicate the cheap-model thesis further. DeepSeek’s price war is the engine of the efficiency narrative, but its models are trained in China and increasingly integrated with Huawei silicon (Fortune). Many regulated enterprises — banks, healthcare systems, government contractors — face procurement rules, data-residency requirements, and political pressure that make a wholesale migration to a Chinese provider a non-starter regardless of the 28x discount. For those buyers the realistic move is to self-host an open-weight model or push their existing vendor for concessions, not to wire production traffic to Beijing. The efficiency shift is therefore likely to fragment along regulatory lines, with the most price-sensitive, least-regulated buyers defecting fastest and the regulated core staying put and negotiating harder.

Finally, the rationing itself may be self-limiting. Uber’s $1,500 cap and Disney’s dashboard are governance, not abstinence — they meter consumption, they do not end it. The Bain survey’s most telling number cuts against the doom narrative: despite missing their savings targets, 90% of companies plan to raise AI budgets next year (the decoder). That is not the behavior of a market in retreat. It is a market that has decided AI is non-negotiable and is now arguing about how to buy it, not whether. A correction in how tokens are bought is very different from a collapse in how many get bought, and conflating the two is the surest way to misread the next four quarters.

The efficiency era and the operator’s checklist

Where this lands depends on whether the labs can sell outcomes instead of tokens.

The most likely outcome is not a crash but a repricing of how growth gets valued. The labs will keep growing (90% of buyers planning to raise budgets all but guarantees it), but the quality of that growth changes when a meaningful share of it comes from disciplined, ROI-justified contracts rather than unconstrained experimentation. Public markets reward durable, predictable revenue and punish the kind that evaporates when a customer flips a cap. The labs that win the next phase will reframe their pitch around guaranteed outcomes, committed-use discounts, and workload-specific guarantees, trading some headline growth for revenue that survives a CFO’s audit. The tokenmaxxing customer was lucrative and disloyal; the valuemaxxing customer is harder to win and far harder to lose.

The structural shift is real even if the volume keeps climbing: the unit of competition is changing from capability to cost-per-outcome. That favors a barbell market — frontier models for the small set of high-stakes tasks where reliability justifies a 28x premium, and cheap open-weight workhorses for the long tail of commodity inference. The labs that thrive will be the ones that meet buyers at both ends, bundling routing, caching, and smaller distilled models so the customer never has to choose between Claude and DeepSeek because the system chooses per task. Microsoft and Amazon are already pivoting their pitch from frontier bragging rights to inference efficiency, the same gravity I traced in Amazon’s move to sell Trainium chips for cheap inference. The IPO-bound labs will have to prove their revenue survives a buyer base that has learned to count.
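A system that "chooses per task" can start out very simple. The sketch below routes on a stakes label and nothing else. The tier names are hypothetical placeholders rather than real model identifiers, and how a task gets its stakes label is left open, since that is the hard part in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    stakes: str  # "high" or "low"; assigning this label is the operator's job


# Hypothetical tier names, not real model identifiers.
ROUTES = {"high": "frontier-model", "low": "open-weight-model"}


def route(task: Task) -> str:
    """Pick a model tier from the task's stakes; unknown labels fall back to frontier."""
    return ROUTES.get(task.stakes, ROUTES["high"])


for task in (Task("contract review", "high"), Task("log summarization", "low")):
    print(f"{task.name} -> {route(task)}")
```

Falling back to the frontier tier for unlabeled work is a deliberately conservative choice: it trades some cost for safety until the labeling is trusted.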

For operators navigating the turn, the playbook is concrete:

Instrument before you cut. You cannot ration what you cannot see. Build a per-team, per-model token dashboard — as Disney and Uber did — before imposing caps, or you will throttle the workloads that actually pay for themselves alongside the ones that don’t.
Tier your models by stakes, not by habit. Route high-consequence tasks to frontier models and the commodity long tail to cheap open-weight options. Nadella’s own advice is task-appropriate selection, not abstinence; a single default model is now a budget liability.
Audit the autonomy gap in your own business case. If your ROI math assumed fully autonomous agents and you are among the 93% that do not yet run them in production, re-underwrite the project now, before the CFO does it for you.
Treat a 28x price gap as a negotiating lever, not just a migration path. Even if you never move to DeepSeek, a credible alternative resets your renewal terms with incumbents. Benchmark the cheap model on your real workloads and bring the numbers to the table.
Separate governance from retreat. Caps and dashboards are how mature buyers scale AI, not how they abandon it. The companies winning in 2027 will be the ones that metered consumption in 2026 and reinvested the savings into the uses that demonstrably work.
Watch the IPO disclosures. Anthropic and OpenAI are heading to public markets; their S-1s will reveal how much revenue rests on usage-based growth versus durable contracts. The first earnings calls that mention “efficiency-driven optimization” will tell you whether the rationing is a blip or a regime.
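The first item on that list, instrumenting before cutting, can be prototyped in a few dozen lines. The sketch below is a toy in-memory ledger that tracks spend per team and per model and flags teams over a monthly cap. The class and method names are hypothetical, the prices are illustrative, and the cap is applied per team as a simplifying assumption (Uber's cap, as described above, is per employee per tool).

```python
from collections import defaultdict


class TokenLedger:
    """Toy in-memory ledger: spend per (team, model), with a monthly cap per team."""

    def __init__(self, monthly_cap_usd: float):
        self.monthly_cap_usd = monthly_cap_usd
        self.spend = defaultdict(float)  # (team, model) -> USD this month

    def record(self, team: str, model: str, tokens: int, usd_per_million: float) -> float:
        """Log one batch of usage and return its cost in USD."""
        cost = tokens / 1_000_000 * usd_per_million
        self.spend[(team, model)] += cost
        return cost

    def by_team(self) -> dict:
        """Total spend per team across all models."""
        totals = defaultdict(float)
        for (team, _model), usd in self.spend.items():
            totals[team] += usd
        return dict(totals)

    def over_cap(self) -> list:
        """Teams whose total spend exceeds the monthly cap, sorted by name."""
        return sorted(t for t, usd in self.by_team().items() if usd > self.monthly_cap_usd)


# Illustrative usage; the cap echoes the $1,500 figure above, the prices are placeholders.
ledger = TokenLedger(monthly_cap_usd=1500.0)
ledger.record("platform", "frontier-model", 100_000_000, usd_per_million=25.0)
ledger.record("support", "open-weight-model", 100_000_000, usd_per_million=0.87)

print(ledger.by_team())
print("over cap:", ledger.over_cap())
```

A real deployment would pull usage from provider billing exports and tie each entry to an outcome, since spend alone cannot tell a valuable completion from a wasted one.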

The tokenmaxxing era was a subsidy buyers paid themselves on the theory that more compute equals more progress. That theory was never tested against a budget until now. The correction underway is not the end of enterprise AI (90% of companies planning to raise spend all but settles that), but it is the end of the phase where waste was a virtue. The winners of the next year will be whoever turns a token from something to maximize into something to account for. The bill came due, and accounting, it turns out, is the most disruptive feature in AI.

In other news

Anthropic confidentially files for an IPO at a $965B valuation — Anthropic filed its S-1 confidentially on June 1 after a financing round that valued it at $965 billion, having grown from $1 billion in annualized revenue at the end of 2024 to roughly $30 billion by April 2026 — a 30-fold jump in 16 months. The offering targets a fall Nasdaq listing led by Goldman Sachs, JPMorgan, and Morgan Stanley (Fortune).

Alphabet moves to raise $80B for AI infrastructure — Alphabet announced equity offerings totaling $80 billion, later upsized toward $84.75 billion with a $10 billion private placement from Berkshire Hathaway, to fund AI compute. The company expects 2026 capital expenditures of $180 billion to $190 billion, with more to come in 2027 (CNBC).

Bain warns AI returns are a “circular bet” — Bain’s survey of 951 large firms found only 4% achieved AI savings above 30%, while 90% still plan to raise budgets next year. The firm cautioned that the misses “should be making executives uncomfortable,” since many are approving more spend on the basis of savings that haven’t arrived (Insurance Journal).

Microsoft and Google escalate the coding-model war — Microsoft unveiled MAI-Code-1-Flash, its first dedicated coding model, while Google pushed competing offerings, as both giants chase the developer workloads that Anthropic and OpenAI have dominated. The launches double as efficiency plays, emphasizing lower cost-per-token for code generation (CNBC).

The spending-curve debate splits AI leaders — A gathering of AI executives produced no consensus on where token spend goes next, crystallizing the “tokenmaxxing versus valuemaxxing” divide between those betting consumption keeps compounding and those forecasting a discipline-driven plateau (Sources).