Stephen Van Tran

On Wednesday morning in Hangzhou, Alibaba Cloud posted a notice that would have been unthinkable eighteen months ago: prices for its T-Head AI computing chips are going up by as much as 34 percent, its Cloud Parallel File Storage service by 30 percent, and every enterprise customer currently running inference workloads on the Bailian Model-as-a-Service platform should brace for the new rates starting April 18. Hours later, Baidu Intelligent Cloud followed with a nearly identical announcement — AI computing power services up 5 to 30 percent, parallel file storage up 30 percent, same effective date. Two of China’s three largest cloud providers raising prices in lockstep on the same day is not a coincidence. It is a market signal, broadcast in the universal language of margin expansion, that demand for AI compute in the world’s second-largest economy has decisively outrun supply.

The timing is remarkable because the conventional narrative about Chinese AI in early 2026 has centered on efficiency. DeepSeek’s V3 model, with its mixture-of-experts architecture activating only 37 billion of its 671 billion parameters per request, was supposed to usher in an era of inference so cheap it would be metered in fractions of a penny. MiniMax’s M2.5 — 230 billion total parameters, 10 billion active — promised intelligence “too cheap to meter” at $0.20 per million input tokens. The Chinese AI ecosystem had, by all appearances, cracked the code on doing more with less. And yet here is Alibaba, the company with a 35.8 percent share of China’s AI cloud services market, telling its 400-plus enterprise customers that the cheap-compute party is over. The reason is simple and instructive: efficiency gains at the model layer do not cancel out demand growth at the infrastructure layer when every enterprise in every industry simultaneously decides it needs AI inference running around the clock.

The price hikes landed on a market that was ready to cheer. Alibaba shares rose as much as 4.2 percent in Hong Kong trading on Wednesday, and the broader Chinese tech sector rallied alongside it. Investors read the price increases not as a tax on customers but as proof of pricing power — the most coveted attribute in cloud economics. When you can raise prices by a third and your stock goes up, you are not selling a commodity. You are selling access to a bottleneck.

The token tsunami that broke the price floor

The numbers behind the price hike tell a story of exponential demand colliding with finite silicon. Alibaba Cloud Bailian, the company’s Model-as-a-Service platform that hosts the Qwen family of large language models, recorded its fastest growth on record in the first quarter of 2026. Token consumption — the fundamental unit of work in the inference economy — exploded as enterprises moved from proof-of-concept AI deployments to production-scale workloads. Alibaba Cloud disclosed that it is reallocating limited AI computing resources toward token-based services, a corporate euphemism for rationing. When a cloud provider with tens of billions of dollars in annual revenue starts triaging access to its own GPUs, the supply-demand imbalance is not theoretical. It is operational.

The Zhenwu 810E, Alibaba’s flagship AI chip developed by its T-Head semiconductor subsidiary, sits at the center of this supply crunch. Unveiled in January 2026, the chip carries 96 GB of HBM2e memory and 700 GB/s of inter-chip bandwidth, performance that Alibaba claims is comparable to Nvidia’s H20 — the export-control-compliant chip that has been Beijing’s primary gateway to Western AI silicon. T-Head has deployed the Zhenwu 810E in multiple 10,000-card clusters across Alibaba Cloud data centers, serving more than 400 clients spanning energy, power grids, scientific research, and autonomous vehicles, with customers including State Grid, the Chinese Academy of Sciences, and XPeng Motors. Internal benchmarks reportedly show Qwen-14B fine-tuning completing 18 percent faster on Zhenwu clusters than on Nvidia A800 clusters, though third-party verification through MLPerf submissions remains absent.

Baidu’s price hike follows an identical logic. The company’s AI Cloud division, which holds a 40.4 percent share of China’s specialized GPU cloud market, cited “the rapid development of global artificial intelligence applications” and “significant cost increases for core hardware and related infrastructure” as the drivers. Baidu’s Ernie model family has been the engine of its cloud growth, and the company’s willingness to raise prices alongside its largest competitor suggests that neither fears losing customers to the other — because there is nowhere else for those customers to go. Huawei Cloud, the third major player, holds 13.1 percent of the AI cloud market but is constrained by its own chip supply issues, with its Ascend 910C delivering roughly 60 percent of an Nvidia H100’s inference performance and its next-generation Ascend 960 not expected until Q4 2027. Tencent Cloud, at 7 percent market share, lacks the AI-specific infrastructure to absorb overflow demand. The duopoly has pricing power because the alternatives are worse.

One back-of-envelope calculation throws the scale of the demand problem into relief. China’s AI cloud services market surged 55 percent to $2.7 billion in the first half of 2025, and industry forecasts project the full-year 2025 market at 51.8 billion yuan — roughly $7.3 billion. If token consumption on Bailian is growing at its fastest pace on record, and if Alibaba’s AI-related products have maintained triple-digit year-over-year revenue growth for nine consecutive quarters, then the run rate for China’s total AI cloud spend in 2026 likely exceeds $12 billion — a market that barely existed three years ago. Infrastructure costs are rising because compute demand is compounding faster than anyone can build data centers, fabricate chips, or procure high-bandwidth memory. The Zhenwu 810E and its domestic rivals are not keeping pace with the appetite they created.
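The run-rate claim above can be sanity-checked with the figures the article cites. A minimal sketch, assuming 2026 growth roughly in line with or above 2025's 55 percent pace (the growth scenarios are illustrative assumptions, not reported figures):

```python
# Back-of-envelope 2026 run rate for China's AI cloud market,
# using only figures cited in the article.
fy_2025_usd = 7.3e9  # full-year 2025 forecast: 51.8B yuan, ~$7.3B

# If 2026 merely repeats 2025's ~55% growth, the market clears $11B;
# any acceleration supports the ">$12B" run-rate estimate.
for growth in (0.55, 0.65, 0.75):  # illustrative scenarios
    run_rate = fy_2025_usd * (1 + growth)
    print(f"2026 at +{growth:.0%}: ${run_rate / 1e9:.1f}B")
```

Under these assumptions, only a modest acceleration beyond 2025's growth rate is needed to push the market past $12 billion.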

Follow the silicon, find the bottleneck

The deeper story beneath the price hikes is a structural shift in who controls AI’s most critical resource. For the past three years, the global AI compute market has been defined by a single bottleneck: Nvidia. Jensen Huang’s company has supplied the GPUs that train and serve virtually every frontier model on Earth, and its pricing power has been legendary — gross margins above 70 percent, quarterly revenue doubling year over year, a market capitalization that briefly touched $3.6 trillion. At GTC 2026, Huang unveiled a three-chip empire spanning Rubin GPUs, Vera CPUs, and a Groq-derived inference processor that threatens to extend Nvidia’s dominance across every socket in the data center. But China’s compute supply chain is diverging from the Western stack in ways that are creating a parallel bottleneck with its own pricing dynamics.

U.S. export controls, tightened repeatedly since October 2022, have progressively restricted the chips that American companies can sell to Chinese customers. Nvidia’s H200 — the latest chip approved for export under a new licensing regime — is only now restarting production for China after months of regulatory limbo, with Jensen Huang confirming this week that the company has received purchase orders and is firing up manufacturing lines. But the H200 comes with significant constraints: a 25 percent duty, mandatory U.S. inspections, and a proposed cap of 75,000 chips per Chinese customer with total shipments limited to one million processors. For context, a single 10,000-card Zhenwu cluster already matches the scale of many H200 allocations. Chinese hyperscalers are not choosing domestic chips out of patriotism. They are choosing them because the alternative is a supply chain controlled by a foreign government that has demonstrated its willingness to cut off access with ninety days’ notice.

This is why Alibaba’s price hike is more significant than a routine cost adjustment. It marks the moment when China’s domestic AI chip ecosystem gained enough scale and enough captive demand to exercise pricing power on its own terms. The Zhenwu 810E is not competing with Nvidia for the global market. It is serving a domestic market that has been partially walled off by geopolitics, and within that walled garden, demand is so intense that the chip’s manufacturer can raise prices by a third and see its stock rise. Alibaba committed $50 billion to AI infrastructure in September 2025, and even that staggering sum has not been enough to eliminate the capacity gap.

The global implications are striking. If the world’s AI compute supply is bifurcating into a Western stack (Nvidia GPUs, AMD alternatives, cloud hyperscalers) and a Chinese stack (T-Head Zhenwu, Huawei Ascend, Baidu Kunlun), then the pricing dynamics of each stack will increasingly reflect local supply-demand conditions rather than a unified global market. American hyperscalers have been raising prices too — AWS, Azure, and Google Cloud all implemented AI-tier price increases in the first quarter of 2026, part of the $690 billion infrastructure spending spree that is rewriting the capital structure of Big Tech. But the Chinese price hikes are different in kind, because they reflect a market where the supply constraint is not just manufacturing capacity but access to fundamental inputs like advanced lithography, high-bandwidth memory, and chip design tools that remain partially embargoed by the United States.

Morgan Stanley warned last week that a “massive AI breakthrough” is coming in the first half of 2026 and that most of the world is not ready for it, citing OpenAI’s GPT-5.4 Thinking model scoring 83 percent on the GDPVal benchmark — at or above human expert level on economically valuable tasks. If that breakthrough materializes, the demand spike for AI inference will make the current capacity crunch look like a dress rehearsal. Every percentage point of improvement in model capability translates to a multiplicative increase in the number of enterprises willing to pay for production-grade AI. The companies that control the compute — and the companies that can raise prices without losing customers — will capture a disproportionate share of the value.

The three ways the pricing-power thesis could crack

The bull case for Alibaba and Baidu’s pricing power rests on a premise that could break in at least three directions. The first and most obvious threat is that the very efficiency revolution that Chinese AI labs pioneered could accelerate fast enough to outrun demand growth. DeepSeek’s V3.2-exp model, released in February, cut API pricing in half to less than three cents per million input tokens by deploying breakthrough sparse attention technology. If inference efficiency continues doubling every six to nine months — as it has since DeepSeek-R1 first disrupted the market — then the compute required per unit of useful AI output will decline faster than total demand rises, and cloud providers will be forced to pass savings to customers or watch them migrate to cheaper alternatives. Pricing power is a lagging indicator of scarcity, and scarcity can evaporate when algorithms improve faster than appetites grow.
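The race described above — efficiency halving compute per unit of output while total demand compounds — can be sketched with toy numbers. Both growth rates below are illustrative assumptions, not figures from the article; the point is only to show which force wins under a given pairing:

```python
# Toy model: net compute demand = output demand growth x compute per output.
# Efficiency assumption: compute per unit of output halves every 7.5 months
# (midpoint of the 6-9 month range cited above).
# Demand assumption: useful-output demand grows 15% per month (illustrative).

months = 12
efficiency_doubling_months = 7.5
demand_growth_per_month = 0.15

compute_per_output = 0.5 ** (months / efficiency_doubling_months)
total_output_demand = (1 + demand_growth_per_month) ** months
net_compute_demand = total_output_demand * compute_per_output

print(f"Compute per output after {months} mo: {compute_per_output:.2f}x")
print(f"Output demand after {months} mo:      {total_output_demand:.2f}x")
print(f"Net compute demand:                   {net_compute_demand:.2f}x")
```

Under these particular assumptions demand still outruns efficiency, and the bottleneck tightens; a faster efficiency curve or slower adoption flips the result, which is exactly the fragility the pricing-power thesis carries.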

The second risk is regulatory. Beijing has historically been willing to intervene in markets where it perceives monopolistic pricing, and a coordinated 34 percent price hike by two companies that collectively control more than 40 percent of China’s AI cloud market is precisely the kind of behavior that attracts scrutiny from the State Administration for Market Regulation. China’s antitrust regulators fined Alibaba $2.8 billion in 2021 for anti-competitive practices in e-commerce, and the political appetite for reining in tech giants has not diminished. If regulators interpret the synchronized price increases as collusion rather than independent responses to market conditions, the pricing power thesis collapses overnight. The fact that Alibaba and Baidu announced their increases on the same day, with the same effective date, and citing nearly identical justifications does not help the optics.

The third and most structural risk is that Nvidia’s re-entry into the Chinese market could undercut domestic alternatives. Jensen Huang’s confirmation this week that H200 production for China is restarting introduces a new variable. The H200 is a substantially more capable chip than the Zhenwu 810E on raw performance benchmarks — it carries 141 GB of HBM3e memory versus the Zhenwu’s 96 GB of HBM2e, and its inter-chip bandwidth is roughly double. If the U.S. government follows through on the proposed one-million-processor cap, Chinese hyperscalers could supplement their domestic fleets with enough H200s to relieve the immediate supply crunch, reducing the scarcity that justified the price hikes. Alibaba itself is reportedly among the companies with H200 purchase orders, which means the company could be simultaneously raising prices on its domestic silicon while buying superior foreign silicon to increase total capacity. The intersection of export controls, chip economics, and cloud pricing creates a matrix of incentives so complex that even the participants cannot predict the equilibrium.

None of these risks has materialized yet, and the near-term trajectory favors the incumbents. Token demand on Bailian is still accelerating. The Zhenwu 810E production ramp cannot outpace the order backlog. And the H200 supply chain remains months away from meaningful volume. But the history of cloud computing is littered with the corpses of pricing strategies that assumed today’s scarcity would last forever. AWS dominated the early cloud market with premium pricing, only to be forced into sixty-plus rounds of price cuts as competition intensified. The question for Alibaba and Baidu is not whether they can charge 34 percent more today — they clearly can. The question is whether they will still be able to charge it in twelve months, when DeepSeek’s next model arrives, when Huawei’s Ascend roadmap catches up, and when a million Nvidia H200s start flowing into Chinese data centers.

What the price tag on a token tells you about 2027

The simultaneous price hikes by China’s two largest AI cloud providers crystallize a thesis that has been forming across the global AI industry for months: the token economy is becoming the most important pricing mechanism in technology, and the companies that set the price of a token will wield the kind of market power that OPEC once held over a barrel of oil. That comparison is not hyperbolic. AI inference is becoming the foundational input for every knowledge-economy workflow — legal research, drug discovery, financial analysis, software development, customer service, content production — and the cost of that inference is now subject to the same supply-demand dynamics that govern commodity markets. When Alibaba raises the price of a token by 34 percent, it is raising the marginal cost of intelligence for every enterprise in China that depends on cloud AI.

The strategic implications for operators are immediate and actionable. First, any enterprise running production AI workloads on a single Chinese cloud provider should be negotiating multi-year committed-use contracts before the April 18 effective date. The price increases explicitly exempt existing contracts through their current billing cycle, which means locking in current rates for twelve to twenty-four months could save seven or eight figures for high-volume inference customers. Second, engineering teams should be stress-testing their model efficiency pipelines. The price hikes reward organizations that have invested in techniques like quantization, distillation, speculative decoding, and intelligent caching — anything that reduces token consumption per unit of business output. A 34 percent price increase on the same workload is painful. A 34 percent price increase on a workload that your engineering team has already optimized to consume 40 percent fewer tokens is a net cost reduction.
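The cost arithmetic in that last sentence is worth making explicit. A minimal sketch of the combined effect, using the 34 percent hike and the hypothetical 40 percent token reduction from the paragraph above:

```python
# Net effect of a 34% per-token price increase combined with a workload
# optimized to consume 40% fewer tokens (via quantization, distillation,
# caching, etc.).
price_multiplier = 1.34        # new unit price vs. old
token_multiplier = 1 - 0.40    # tokens consumed after optimization

net_cost_multiplier = price_multiplier * token_multiplier
print(f"Net cost vs. baseline: {net_cost_multiplier:.1%}")
```

The optimized workload pays roughly 80 percent of the original bill even at the higher rates — the price hike is more than absorbed by the efficiency work.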

Third, and most consequentially for the long term, the bifurcation of the global AI compute market into Western and Chinese stacks means that multinational companies must now maintain compute strategies for both ecosystems. A pharmaceutical company running drug-discovery workloads in Shanghai cannot assume that the pricing, availability, or performance of AI compute will track the conditions in its Virginia or Dublin cloud regions. The Zhenwu 810E is not an H100. The Bailian API is not Amazon Bedrock. The regulatory constraints are different, the supply chains are different, and as of today, the price trajectories are diverging. Companies that treat Chinese AI infrastructure as a branch office of their global cloud strategy are making an assumption that the market has just invalidated.

The wider lesson from today’s announcement is that AI’s economic gravity is shifting from model capability to infrastructure control. The frontier labs — OpenAI, Anthropic, Google DeepMind, DeepSeek, Alibaba’s Qwen team — will continue to compete on benchmark scores and reasoning quality. But the companies that capture the most durable economic value will be the ones that own the physical layer: the chips, the data centers, the networking, the storage, and the pricing power that comes from being the only game in town when every enterprise on the planet needs more tokens than yesterday. Alibaba and Baidu just proved that in China, at least, they are that game. The rest of the world is watching to see how long the advantage lasts.

In other news

Pentagon doubles down against Anthropic in court filing — The Department of Defense told a federal court that Anthropic’s corporate “red lines” on autonomous weapons and mass surveillance make it an “unacceptable risk” to national security, arguing the AI company could “disable its technology or preemptively alter the behavior of its model” during warfighting operations. Anthropic sued the Trump administration in early March after being designated a supply chain risk, and several tech companies including OpenAI, Google, and Microsoft have filed amicus briefs in its support.

Nvidia restarts H200 production for China — Jensen Huang confirmed that Nvidia is firing up manufacturing lines for H200 chips bound for Chinese customers after securing multiple export licenses, but shipments face a 25 percent duty, mandatory U.S. inspections, and a proposed cap of 75,000 chips per customer. The move follows months of regulatory uncertainty since the Trump administration first blessed H200 sales to China in December.

Mistral drops Small 4 under Apache 2.0 — Mistral AI released Mistral Small 4, a 119-billion-parameter mixture-of-experts model with 128 experts and only 6 billion active parameters per token, under the Apache 2.0 license. The model unifies instruction following, reasoning, multimodal understanding, and agentic coding into a single deployment, cutting latency 40 percent versus Small 3 while exposing a configurable reasoning_effort parameter for latency-accuracy tradeoffs.

AWS partners with Cerebras for 5x faster inference — Amazon Web Services and Cerebras Systems announced a collaboration deploying CS-3 wafer-scale systems on Amazon Bedrock, using a disaggregated architecture that pairs AWS Trainium for prefill with Cerebras WSE for decode. The partnership claims a 5x boost in high-speed token throughput and is expected to launch within the next couple of months.

Bloomberg asks whether the AI bubble is set to burst — Three years into the boom, a Bloomberg analysis notes that Big Tech’s data center investments are on track to hit $500 billion in 2026 while Moody’s has modeled a scenario in which AI-related company valuations fall 40 percent. The piece frames the central tension: AI is already coding apps and drafting contracts, but the money being spent on the technology has ballooned into “a vast liability hanging over financial markets.”