Stephen Van Tran

The chip embargo just failed its biggest test

The entire strategic logic of American export controls rests on a single premise: without access to NVIDIA’s most advanced GPUs, Chinese AI labs cannot build frontier models. DeepSeek just obliterated that assumption. Reuters confirmed on April 4 that DeepSeek V4 — a trillion-parameter mixture-of-experts model with a one-million-token context window and native multimodal generation across text, image, and video — will run entirely on Huawei’s Ascend 950PR chips. Not as a fallback. Not as a proof of concept. As the primary inference and deployment platform for what may be the most consequential open-weight model release in the history of artificial intelligence.

The numbers alone demand attention. DeepSeek V4 activates roughly 37 billion of its one trillion total parameters per token through sparse mixture-of-experts routing, which means it runs more like a 37B model at inference time while drawing on a knowledge base twenty-seven times larger. Internal benchmarks claim 81 percent on SWE-bench Verified and 90 percent on HumanEval — numbers that would place it within striking distance of Claude Opus 4.6 and GPT-5.4 on coding tasks. Its API pricing lands at approximately $0.28 per million input tokens, which is roughly nine times cheaper than GPT-5.4 and fifty times cheaper than Claude Opus on input. The model will ship under an Apache 2.0 license, meaning any company on the planet can deploy it commercially without paying a cent in licensing fees. A frontier-class model, running on domestically produced Chinese silicon, priced at a fraction of Western alternatives, and given away for free. Every clause in that sentence should alarm policymakers in Washington.
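The sparse-activation arithmetic behind those claims is worth making explicit. A quick sanity check, using only the figures reported above (none of which have been independently verified):

```python
# Back-of-envelope check on the reported DeepSeek V4 figures; all
# numbers come from the article, not from independent verification.
total_params = 1_000_000_000_000    # 1T total parameters
active_params = 37_000_000_000      # ~37B activated per token via MoE routing

# Per-token compute scales with the active parameters, not the total,
# so the model draws on a knowledge base ~27x larger than what its
# inference cost would suggest.
ratio = total_params / active_params
print(f"total/active parameter ratio: ~{ratio:.0f}x")
```

This ratio is exactly where the "runs like a 37B model" framing comes from: the forward-pass cost tracks the 37B active slice, not the full trillion.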

The geopolitical implications are not subtle. DeepSeek withheld early access to V4 from American chipmakers including NVIDIA and AMD, while granting Chinese suppliers like Huawei additional time to tune their software stacks for the model. That is not the behavior of a company hedging its bets. That is the behavior of a company building a parallel ecosystem — one where American semiconductor dominance is not a prerequisite for frontier AI capability but an optional convenience. If V4 delivers anything close to its promised performance, the question facing the United States is no longer whether export controls can slow China’s AI progress. The question is whether they ever could.

Consider the timeline. The original export restrictions on advanced AI chips were introduced in October 2022, tightened in October 2023, and expanded again in early 2025 to cover additional chip architectures and cloud computing access. In the three and a half years since the first restrictions took effect, China has not slowed down. It has accelerated. DeepSeek released R1 in early 2025, matching GPT-4 performance at a fraction of the cost, and the company has now followed that with a model that claims frontier-level capability on entirely domestic hardware. Huawei, for its part, has expanded Ascend chip production to meet the demand generated by exactly the scenario export controls were supposed to prevent: Chinese companies designing around American technology rather than being constrained by its absence. The export control regime assumed that China’s alternatives would be inferior for years, perhaps decades. DeepSeek V4 suggests the actual timeline was closer to eighteen months.

Follow the silicon, find the strategy

Understanding what DeepSeek accomplished requires unpacking both the architectural choices and the hardware constraints that shaped them. The Ascend 950PR is Huawei’s flagship AI accelerator, designed specifically to compete with NVIDIA’s A100 and H100 in training and inference workloads. It does not match NVIDIA’s latest Blackwell chips on raw floating-point throughput or memory bandwidth — no credible analyst claims otherwise. But DeepSeek’s engineering team appears to have compensated through aggressive algorithmic optimization, training the model with a technique the company calls Engram Conditional Memory to enable the million-token context window, and publishing a separate paper on Manifold-Constrained Hyper-Connections that addresses training stability at trillion-parameter scale. The inference cost savings come from the MoE architecture itself: activating only 37 billion parameters per forward pass means V4 requires a fraction of the compute that a dense trillion-parameter model would demand.
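The mechanism behind those inference savings can be sketched in a few lines. This is a toy illustration of generic top-k mixture-of-experts routing, not DeepSeek's actual implementation; the layer sizes, gating scheme, and weight scaling here are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of top-k mixture-of-experts routing -- the generic
# technique, not DeepSeek's implementation. A gating network scores the
# experts for each token; only the top-k experts run, so per-token
# compute scales with k rather than with the total expert count.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 2

gate_w = rng.normal(size=(d_model, n_experts))                    # gating weights
expert_ws = rng.normal(size=(n_experts, d_model, d_model)) / 8.0  # 16 expert layers

def moe_layer(x):
    scores = x @ gate_w                        # one gating logit per expert
    top = np.argsort(scores)[-top_k:]          # indices of the k best experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                               # softmax over selected experts only
    # Only the selected experts do any work; the other 14 are skipped.
    return sum(wi * (x @ expert_ws[i]) for wi, i in zip(w, top))

out = moe_layer(rng.normal(size=d_model))
print(out.shape)   # a full d_model-sized output, computed by 2 of 16 experts
```

Scale the same idea up and a trillion-parameter model pays the forward-pass cost of only the experts it routes each token through, which is the entire economic premise of the V4 architecture.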

The pricing differential is where this story transforms from an engineering curiosity into an economic earthquake. At $0.28 per million input tokens, DeepSeek V4 is not merely cheaper than its American rivals. It occupies a different economic category entirely. GPT-5.4 charges approximately $2.50 per million input tokens. Claude Opus 4.6 sits even higher. A startup building an AI-native product on DeepSeek V4 would spend roughly $10,000 on API calls that would cost nearly $90,000 on GPT-5.4 — a difference large enough to determine whether an entire company is viable. For inference-heavy workloads like coding assistants, document processing, and customer support automation, that cost gap does not just change unit economics. It changes which products can exist.
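The budget comparison follows directly from the quoted per-token prices. A sketch using the article's figures for input tokens only (real bills also depend on output-token pricing, which is not quoted here):

```python
# Unit-economics comparison from the quoted input-token prices.
# Both figures are the article's reported numbers.
deepseek_per_m = 0.28   # $ per million input tokens, DeepSeek V4
gpt_per_m = 2.50        # $ per million input tokens, GPT-5.4

tokens_millions = 10_000 / deepseek_per_m   # token volume a $10k budget buys
gpt_cost = tokens_millions * gpt_per_m      # the same volume priced on GPT-5.4
print(f"same workload on GPT-5.4: ~${gpt_cost:,.0f}")
```

At these prices the input-side gap is roughly ninefold, and it compounds linearly with token volume, which is why inference-heavy products feel it first.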

The open-weight strategy amplifies the disruption. Google DeepMind released Gemma 4 under Apache 2.0 just last week, proving that open models can compete with closed frontier systems at smaller parameter counts. DeepSeek is attempting the same proof at more than an order of magnitude greater scale. If V4’s weights ship under Apache 2.0 as promised, any organization with sufficient hardware — and Huawei is aggressively supplying that hardware across Asia, the Middle East, and Africa — can run a frontier model without any commercial relationship with an American company. The demand signal is already visible: Alibaba, ByteDance, and Tencent have collectively ordered hundreds of thousands of Huawei Ascend chips ahead of the V4 release, creating the infrastructure for a self-sustaining Chinese AI ecosystem that operates entirely outside the reach of US trade policy.

The competitive dynamics are also worth mapping precisely. DeepSeek V4, Claude Opus 4.6, and GPT-5.4 are converging on similar capability thresholds — million-token context windows, strong coding performance, multimodal understanding — but they are arriving there through radically different economic models. OpenAI and Anthropic are racing toward IPOs that could collectively raise $150 billion while burning cash at extraordinary rates, funding their compute through venture capital and enterprise subscriptions priced at premium margins. DeepSeek is funded by the Chinese quantitative trading firm High-Flyer, operates with a reported training budget of approximately $5.2 million for the base model, and plans to give the weights away. These are not just different business models. They are different theories about how the AI industry will consolidate, and they cannot both be right.

Here is the clearest way to frame the divergence. OpenAI spent an estimated $22 billion in 2025 to generate $13.1 billion in revenue — roughly $1.68 spent for every dollar earned — and projects $14 billion in losses for 2026. Anthropic burned roughly $3 billion in cash last year and has an estimated $80 billion in cloud infrastructure costs through 2029. Both companies are betting that the revenue from enterprise subscriptions and API access will eventually justify the astronomical spending — a bet that requires maintaining premium pricing on their models. DeepSeek’s existence makes that bet harder to win every single day. If a model that costs nine times less on input offers eighty percent of the capability, the price umbrella that sustains American AI labs’ burn rates begins to collapse. Enterprise procurement teams are not loyal to any ecosystem. They are loyal to their budgets.
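The spend-versus-revenue arithmetic above is simple enough to verify directly (the inputs are the article's estimates, not audited figures):

```python
# Burn-rate arithmetic from the article's estimated OpenAI figures.
spend = 22.0      # $B, estimated 2025 spend
revenue = 13.1    # $B, 2025 revenue

print(f"spent per dollar of revenue: ${spend / revenue:.2f}")  # ~$1.68
print(f"net cash gap: ${spend - revenue:.1f}B")                # ~$8.9B
```

A $1.68-per-dollar cost structure only works under premium pricing; every point of price pressure from a cheap open-weight rival widens that $8.9B gap.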

The cracks in the trillion-parameter claim

Before declaring export controls dead and NVIDIA irrelevant, it is worth cataloging everything that could undermine the DeepSeek V4 narrative — and the list is not short. The most glaring issue is that none of the headline benchmark numbers have been independently verified. The 81 percent SWE-bench Verified score and 90 percent HumanEval come from DeepSeek’s own internal evaluations, and the company has not yet submitted the model to third-party evaluation platforms like Chatbot Arena or the independent coding benchmark suites maintained by academic research groups. Self-reported benchmarks from Chinese AI labs have a mixed track record — not because of intentional deception, but because training and evaluation methodologies can differ in ways that inflate scores on standard benchmarks without translating to real-world performance. Until independent evaluators confirm these numbers, they should be treated as aspirational, not empirical.

The Huawei Ascend 950PR itself presents unresolved questions. While the chip is capable of training and running large models, it lacks the mature software ecosystem that NVIDIA has spent nearly two decades building around CUDA. DeepSeek’s engineers reportedly had to write significant portions of their training infrastructure from scratch to work around limitations in Huawei’s CANN software stack. That is a solvable problem over time, but it means the gap between what V4 can do on paper and what developers can actually build with it in production may be wider than the specifications suggest. Running a model is not the same as deploying a product, and deployment requires libraries, frameworks, debugging tools, and community support that the Ascend ecosystem is still constructing.

There is also the lingering question of how V4 was actually trained. A senior Trump administration official told Reuters that earlier DeepSeek models were trained on NVIDIA’s most advanced Blackwell chips using a cluster located in mainland China — a claim that, if true, would constitute a violation of US export controls. US officials opened an investigation in early 2026 into whether DeepSeek accessed restricted high-performance processors through intermediaries or shell companies. If V4 was initially trained on NVIDIA hardware and later fine-tuned or optimized for Ascend deployment, the narrative of pure Chinese semiconductor independence becomes considerably murkier. DeepSeek has not publicly addressed these allegations in detail, and the opacity of China’s chip supply chains makes independent verification difficult.

There is a fourth concern that rarely makes the headlines but matters enormously to enterprise buyers: data sovereignty and trust. DeepSeek is headquartered in Hangzhou and subject to Chinese national security laws that can compel companies to share data with government authorities. For organizations in regulated industries — banking, healthcare, defense, critical infrastructure — deploying a model from a Chinese lab, even one running on local hardware with open weights, introduces compliance risks that no amount of cost savings can offset. The European Union’s AI Act, the United States’ emerging AI executive orders, and sector-specific regulations in financial services all create legal exposure for companies that route sensitive data through models with opaque governance structures. Open weights mitigate but do not eliminate this concern, because the training data itself may contain biases or backdoors that are difficult to audit at trillion-parameter scale.

The pricing advantage also deserves scrutiny. At $0.28 per million input tokens, DeepSeek V4 may be pricing below cost to build market share — a strategy that Chinese technology companies have employed effectively in other sectors but that is not sustainable indefinitely without either subsidies or a viable path to profitability. High-Flyer, DeepSeek’s parent company, generates revenue from quantitative trading, not from AI model deployment, which means V4’s pricing may reflect a willingness to absorb losses rather than a genuine cost-of-inference breakthrough. If the Ascend 950PR’s actual efficiency lags behind NVIDIA’s latest chips by the margins that independent analysts estimate — somewhere between 30 and 50 percent on comparable workloads — then the true cost of running V4 at scale may be significantly higher than its API pricing implies.
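The implication of that efficiency range can be quantified with a rough sketch. This assumes per-token serving cost scales inversely with hardware efficiency, which is a deliberate simplification (it ignores power, utilization, and software overhead):

```python
# If Ascend efficiency trails NVIDIA by the 30-50% margin analysts
# estimate, matching NVIDIA-class throughput costs proportionally more
# hardware, so the sustainable price floor sits above the headline API
# price. Simplified model: cost scales as 1 / efficiency.
api_price = 0.28  # $ per million input tokens, the quoted V4 price
for gap in (0.30, 0.50):
    implied = api_price / (1 - gap)
    print(f"{gap:.0%} efficiency gap -> cost-equivalent ~${implied:.2f}/M tokens")
```

Even at the pessimistic end of that range, the implied price stays far below Western frontier-model rates, but the exercise shows why the $0.28 figure may say more about subsidy tolerance than about inference cost.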

The two-stack future and how to survive it

The release of DeepSeek V4 on Huawei silicon does not mark the end of American AI dominance. But it does mark the end of a simpler era in which dominance was guaranteed by control of a single chokepoint — NVIDIA’s GPU supply chain. Two parallel AI ecosystems are now forming with increasing clarity. One is centered on American technology: NVIDIA hardware, CUDA software, and closed or semi-open models from OpenAI, Anthropic, and Google, deployed primarily through AWS, Azure, and GCP. The other is forming around Chinese companies with Huawei chips and domestic software stacks, deploying open-weight models through Alibaba Cloud, Tencent Cloud, and Huawei Cloud, with growing adoption across markets that are either geopolitically aligned with Beijing or simply indifferent to Washington’s preferences.

NVIDIA is not standing still, of course. The company’s Vera Rubin platform entered full production ahead of schedule in March, with its NVL72 rack system delivering 3.6 exaflops of inference performance — a generational leap that will widen the raw performance gap between American and Chinese hardware when Rubin-based cloud instances become available in the second half of this year. But performance gaps matter less when the model running on inferior hardware is free and the one running on superior hardware costs nine times more. The history of technology disruption is littered with companies that had the better product but lost to the competitor with the better price.

For enterprise technology leaders, this bifurcation creates both risk and opportunity. The risk is dependency: organizations that build their AI infrastructure exclusively on one stack may find themselves locked out of markets, talent pools, or cost efficiencies available on the other. The opportunity is optionality. V4’s Apache 2.0 license means it can be deployed on any hardware, including NVIDIA’s, which means Western companies can use it as a cost-reduction tool or a negotiating lever against incumbent AI providers without any geopolitical entanglement. The model’s open weights also mean that fine-tuning for domain-specific tasks — legal analysis, medical coding, financial modeling — can be done internally without sharing proprietary data with a third-party API provider, a compliance advantage that no closed model can match.

The operator checklist for the next ninety days is straightforward but urgent:

1. Benchmark DeepSeek V4 against your current AI stack on your actual workloads the moment independent evaluations confirm its capability claims. Self-reported numbers are not enough to justify a migration, but if third-party results come within ten percent of the claims, the cost savings demand serious evaluation.

2. Audit your AI supply chain for single-vendor dependency. If every model you deploy runs on one provider’s API and one chipmaker’s hardware, you are exposed to pricing power, geopolitical disruption, and supply shortages that a diversified approach would mitigate.

3. Watch the Huawei Ascend adoption curve among Asian and Middle Eastern cloud providers. If major regional clouds begin offering V4-optimized instances at prices that undercut Western hyperscalers, the competitive pressure on NVIDIA’s pricing and OpenAI’s margins will become impossible to ignore.

4. Revisit your open-weight model strategy. Gemma 4 proved last week that open models can match closed systems at the 31-billion-parameter scale. DeepSeek V4 is attempting the same proof at a trillion parameters. If both succeed, the economic case for paying premium prices for closed API access weakens dramatically — and the organizations that figured this out earliest will have the largest structural cost advantage in the AI era.
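The ten-percent rule in the first step is easy to encode. `claim_holds` is a hypothetical helper written for this article, not an existing tool; the scores below are the article's self-reported numbers paired with made-up independent results:

```python
# Hypothetical gate for step one of the checklist: treat a vendor's
# self-reported benchmark as credible only if an independent evaluation
# lands within a chosen tolerance of the claim.
def claim_holds(claimed: float, independent: float, tolerance: float = 0.10) -> bool:
    """True if the independent score is within `tolerance` of the claim."""
    return independent >= claimed * (1 - tolerance)

# Claimed SWE-bench Verified 81% vs. an illustrative independent 75%:
print(claim_holds(81.0, 75.0))   # within 10% of the claim
# Claimed HumanEval 90% vs. an illustrative independent 79%:
print(claim_holds(90.0, 79.0))   # misses the 10% band
```

The point of a hard threshold is procedural: it converts "the benchmarks look close enough" from a judgment call into a migration trigger your team agreed on in advance.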

The policy implications extend beyond the technology sector. If the two-stack world materializes, governments will face pressure to choose sides — or to maintain access to both ecosystems at the cost of diplomatic complexity. Countries in Southeast Asia, the Middle East, and Latin America that have historically purchased both American and Chinese technology will become the contested ground, and the AI models they adopt will shape everything from surveillance infrastructure to healthcare delivery to financial regulation. DeepSeek V4 is not just a model release. It is a proof point in a larger argument about whether technological superiority can be maintained through supply chain control, or whether sufficiently motivated engineering teams will always find a way around the bottleneck.

The United States spent three years building an export control regime designed to maintain its lead in artificial intelligence by restricting China’s access to the most advanced semiconductor technology. DeepSeek just shipped a model that suggests the lead may no longer depend on the chips.

In other news

NVIDIA Vera Rubin enters full production ahead of schedule — NVIDIA announced that its next-generation Vera Rubin AI platform, featuring seven new chips including the Rubin GPU with HBM4 memory delivering up to 288GB per GPU, is now in full production. The flagship NVL72 rack system packs 72 GPUs and 36 CPUs with 3.6 exaflops of inference performance, and AWS, Google Cloud, Microsoft, and CoreWeave will deploy Rubin-based instances in the second half of 2026 (NVIDIA Newsroom).

Neuro-symbolic AI slashes energy consumption by 100x — Researchers at Tufts University published results showing a hybrid neuro-symbolic approach that uses just one percent of training energy and five percent of operational energy compared to standard visual-language-action models, while achieving 95 percent accuracy on structured manipulation tasks versus 34 percent for conventional systems. The work will be presented at the International Conference on Robotics and Automation in Vienna in May (ScienceDaily).

Salesforce transforms Slackbot into an autonomous AI coworker — Salesforce unveiled more than 30 new AI-powered features for Slackbot, including reusable AI skills, Model Context Protocol integration for cross-app orchestration with Agentforce and Zoom, and desktop monitoring that follows users outside of Slack. Free and Pro plan users receive limited Slackbot conversations starting this month, with the full feature set rolling out through mid-2026 (TechCrunch).

Meta deploys MTIA chips across data centers to diversify from NVIDIA — Meta began rolling out its MTIA 300 custom AI inference chips across its data centers, with the MTIA 400 completing testing and three additional generations planned through 2027. Analysts estimate Meta aims to have over 35 percent of its inference fleet running on in-house silicon by year-end, though the company continues to operate large NVIDIA GPU clusters alongside its custom hardware (CNBC).

AI Scientist-v2 produces first fully AI-generated peer-reviewed paper — Sakana AI released The AI Scientist-v2, an agentic system that autonomously formulates hypotheses, designs experiments, analyzes data, and writes scientific manuscripts. One of its fully autonomous papers scored above the average human acceptance threshold at an ICLR 2026 workshop, marking the first instance of an entirely AI-generated paper successfully passing peer review (arXiv).