Washington just became AI's pre-deployment regulator
The morning every frontier lab quietly clocked in
Tuesday morning, the federal government completed a clean sweep that almost nobody outside Washington has fully metabolized yet. The Center for AI Standards and Innovation, the small NIST outfit better known to insiders as CAISI, announced signed agreements with Google DeepMind, Microsoft, and xAI to perform pre-deployment evaluations of their frontier models. The same press release confirmed that OpenAI and Anthropic had renegotiated the partnerships they first signed in 2024 to align with the Trump administration’s AI Action Plan. With those five signatures, every major American frontier-model developer is now feeding its unreleased systems to the same federal evaluator. The voluntary era of US AI safety oversight is over. What replaced it is voluntary in name only.
The numbers tell you why this matters. CAISI says it has completed more than forty model assessments since standing up — many of them on systems still in the training oven, some shipped with reduced or fully removed safeguards so the evaluator can see the underlying capability rather than the deployment-ready surface. The agreements explicitly cover testing in classified environments and were drafted, in the agency’s words, with the flexibility to respond to rapid AI advancement. That is not the language of a one-off pilot. That is the language of a regime building its operating manual in public.
The trigger for the policy pivot was not a paper or a panel. It was an unreleased Anthropic model. In early April the company restricted Claude Mythos Preview to a forty-organization launch cohort under the banner of Project Glasswing — a deliberate refusal to ship its most capable cyber-offensive model to the open market because, as the company’s own technical preview admits, the model has already turned up thousands of high-severity zero-days, including a seventeen-year-old remote code execution flaw in FreeBSD that it then chained into a fully autonomous exploit. Bruce Schneier, in his public read of the Mythos disclosure, called the moment a discontinuity in the security stack. The White House clearly read it the same way. I covered the lab side of that pivot in the week AI became too dangerous to ship freely; this week is the regulator’s response.
It would be a mistake to read this as the labs giving something up. They are buying something. By signing on first and shaping the playbook, the five incumbents lock in a pre-deployment process they can comply with — one whose costs they can absorb and whose timelines they can negotiate. Smaller open-weight competitors and Chinese labs, the ostensible targets of any future executive order, have no such seat at the table. The agreements are the soft launch of a moat dressed in safety language. Tuesday’s announcement did not just give Washington a window into frontier AI. It gave the five biggest American AI labs a formal lane between themselves and everyone else who might want to ship.
The structural break from prior arrangements is sharper than the press release admits. Biden-era AI safety governance ran through the original USAISI charter at NIST, which framed evaluation as a research collaboration with developer-supplied access on a per-model basis and no operational link to the intelligence community. The new CAISI version flips both of those defaults. Access is a standing agreement, not a per-model handshake. And the TRAINS Taskforce loops findings to the NSA and other intelligence-community consumers as a matter of process. That is a different administrative animal — not just relabeled paperwork — and the speed with which the labs accepted the shift suggests they see the writing on the wall. A future executive order with statutory teeth is easier to negotiate when you are already an inside party than when you are watching from the sidelines.
Forty evaluations, a TRAINS Taskforce, and one FDA analogy
Strip the press release back to its mechanics and the model becomes legible. CAISI runs pre-deployment evaluations on still-unshipped systems, post-deployment assessments on already-public ones, and feeds findings into the interagency TRAINS Taskforce — a loop that pulls in the National Security Agency and other parts of the intelligence community. The agency’s stated focus is national-security-grade harm: cybersecurity capability, biosecurity, chemical-weapons risk, and the detection of foreign-built systems with covert behaviors or backdoors. Director Chris Fall framed the new agreements as a way to “scale our work in the public interest at a critical moment.” Under the hood, the work is closer to a grey-box capabilities evaluation than a compliance audit, which is why developers ship CAISI variants of their models with safeguards stripped out. The agency wants to see the model, not the marketing.
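To make the grey-box distinction concrete, here is a minimal sketch of the comparison that kind of evaluation implies: the same task suite scored against a safeguard-stripped evaluation variant and against the deployment-ready variant. The function names and scoring scheme are placeholders of mine, not anything CAISI has published.

```python
from typing import Callable

def capability_gap(tasks: list[str],
                   eval_variant: Callable[[str], str],
                   deploy_variant: Callable[[str], str],
                   scorer: Callable[[str, str], float]) -> dict:
    """Score both model variants on the same task suite and report the gap.

    Hypothetical structure only: 'eval_variant' stands in for a
    safeguard-stripped build, 'deploy_variant' for the shipped product.
    """
    raw = sum(scorer(t, eval_variant(t)) for t in tasks) / len(tasks)
    shipped = sum(scorer(t, deploy_variant(t)) for t in tasks) / len(tasks)
    return {
        "underlying_capability": raw,      # what the model can do
        "deployed_capability": shipped,    # what the shipped product will do
        "safeguard_delta": raw - shipped,  # how much the safeguards suppress
    }
```

The number that matters is the third one: a large safeguard delta is precisely what an audit of the shipped product alone would never see, which is why the evaluator asks for the stripped variant in the first place.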
The political superstructure around CAISI has hardened in parallel. The agreements were renegotiated under direction from Commerce Secretary Howard Lutnick, who in March was given authority to designate CAISI as the government’s primary point of contact with industry on commercial AI testing. National Economic Council director Kevin Hassett added the operative analogy on May 4, saying the administration is studying an executive order so that frontier models are “released in the wild after they’ve been proven safe, just like an FDA drug.” That quote is doing a lot of work — and signals, more clearly than any redacted brief, where this is going. National Cyber Director Sean Cairncross is reportedly the inside coordinator. Together they are building, brick by brick, a pre-release approval regime that does not yet exist in statute but is already operating in practice.
What does the FDA analogy actually buy you? The Trump administration’s AI Action Plan, released last summer and codified in March’s national AI legislative framework, rests on three pillars: accelerate domestic innovation, build out infrastructure, and lead in international diplomacy and security. Stanford’s HAI summarized the package in its initial review as innovation-first, with safety treated as a national-security overlay rather than a consumer-protection floor. Skadden’s legal-strategic read flagged the same tension: industry got procurement preference and federal-permitting acceleration; safety got language about “objective” evaluation. Tuesday’s CAISI announcement is the first concrete safety mechanism that has shipped under the new regime. The FDA analogy is its rhetorical engine. The labs’ renegotiated terms are the chassis. Whether the engine fits the chassis — whether you can run a drug-style approval framework on probabilistic systems that behave differently in deployment than they do under test — is the open question of the next twelve months.
Here is the proprietary takeaway worth carrying with you. Stack the regulator against the regulated and the asymmetry is staggering. CAISI has roughly thirty staff and has received about thirty million dollars in total funding since its 2024 standup, per Federal News Network’s reporting on the underfunding of the agency. Anthropic alone is currently negotiating a $50 billion private round at a near-$900 billion valuation, with annual revenue run rate above $30 billion. CAISI’s entire two-year budget is roughly 0.06 percent of Anthropic’s pending raise — and roughly 0.01 percent of the $300 billion that flowed into AI startups in Q1 2026 per the latest Stanford AI Index reporting I covered last month. Thirty federal employees are now the formal pre-deployment gate for a sector deploying capital four orders of magnitude larger than the gate itself. That is not how regulatory regimes succeed. That is how they get captured.
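The ratios are easy to verify. A back-of-the-envelope calculation using only the figures cited above (the script and its variable names are illustrative, not an official accounting):

```python
# Back-of-the-envelope comparison of CAISI's budget against the capital
# it is supposed to gate. All figures are the ones cited in this piece.
caisi_total_budget = 30e6        # ~$30M total since the 2024 standup
anthropic_pending_raise = 50e9   # ~$50B private round under negotiation
q1_ai_startup_funding = 300e9    # ~$300B into AI startups in Q1 2026

print(f"Share of Anthropic raise: {caisi_total_budget / anthropic_pending_raise:.2%}")
# -> 0.06%
print(f"Share of Q1 startup funding: {caisi_total_budget / q1_ai_startup_funding:.2%}")
# -> 0.01%, i.e. the sector deployed roughly four orders of magnitude more
# capital in a single quarter than the gate has received in two years.
```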
The cracks in the new gatekeeper
Now the part that the announcement does not advertise. The most cited critique of Tuesday’s news came not from the safety left but from the policy right. Dean Ball, who authored the original Trump-team AI policy memos, wrote in his newsletter that “the current trajectory of federal frontier AI governance is worse than the direction of AI policy under the Biden administration.” Worse, by his read, because the Biden-era Voluntary Commitments at NIST kept evaluation civilian and consumer-protection-flavored, while the Trump version has pulled the NSA and broader intelligence community into pre-release model review. The Techdirt analysis, which is more polemical but factually careful, points out that Marc Andreessen called the Biden framework tyranny “far beyond anything even imagined by the Communists and Fascists of the 20th Century” — and got, in exchange for his vote, a stricter version of the same thing. The libertarian wing of the AI lobby is finding out that “voluntary” oversight has a way of hardening once a sufficiently scary capability lands on the regulator’s desk.
The funding gap is more concrete and more damning. Canada’s parallel institute, the Canadian AI Safety Institute, launched with $50 million Canadian over five years — roughly $10 million per year of dedicated capacity, plus a separate research budget routed through CIFAR. Singapore’s AI safety institute at NTU operates on roughly S$10 million per year of recurring support. Annualized, both are in the same range as everything CAISI has received across two years, and they fund countries with a fraction of America’s frontier-AI footprint. The math gets uglier when you set CAISI’s headcount of about thirty against the model count it is supposed to evaluate: five labs, multiple model families per lab, frequent checkpoint releases, and at minimum quarterly capability shifts on each. CAISI’s staff would need to evaluate roughly one model every two weeks, in classified environments, with reduced safeguards, across cyber, bio, and chemical threat surfaces. Real evaluations of frontier AI take weeks per model in adequately resourced red teams. The math does not pencil. Either Congress funds the agency to ten or twenty times its current size, or the evaluations are theater.
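A rough capacity model makes the gap explicit. The lab count and quarterly cadence come from the paragraph above; the number of model families per lab, the red-team size, and the weeks per evaluation are placeholder assumptions of mine:

```python
# Rough capacity model for CAISI's pre-deployment workload.
# Lab count and quarterly cadence come from the text; per-lab family count,
# red-team size, and evaluation duration are illustrative assumptions.
labs = 5                    # frontier labs now under standing agreements
families_per_lab = 2        # assumed distinct model families per lab
releases_per_family = 4     # "at minimum quarterly capability shifts"
red_team_size = 6           # assumed evaluators per model assessment
weeks_per_eval = 4          # assumed duration of one adequately resourced eval

evals_per_year = labs * families_per_lab * releases_per_family        # 40
person_weeks_needed = evals_per_year * red_team_size * weeks_per_eval  # 960

staff = 30
working_weeks = 46
person_weeks_available = staff * working_weeks                          # 1380

print(f"{evals_per_year} pre-deployment evaluations per year")
print(f"{person_weeks_needed} person-weeks needed vs "
      f"{person_weeks_available} available "
      f"({person_weeks_needed / person_weeks_available:.0%} of the agency)")
# ~70% of total staff capacity consumed before any post-deployment
# assessment, foreign-model detection, standards work, or administration.
```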
There is a deeper structural objection: the FDA analogy is doing more rhetorical work than it can carry. Daniel Carpenter and his Harvard collaborators laid this out in the most-cited academic critique of FDA-for-AI. Approval regulation links mandatory pre-market testing to a regulatory veto over R&D — a structure that fits drugs because the regulated product is well-defined, the harms are biologically constrained, and the testing protocols are mature. None of those conditions hold for frontier AI. The product moves between training, fine-tuning, deployment, and post-deployment patching. The harms are emergent, transmissible (a model fine-tuned downstream becomes, in effect, a different product), and distributed across actors. Even in pharma, post-marketing study commitments are honored only about 31% of the time five years out — and that is in a sector with more than a century of regulatory infrastructure. Importing the FDA chassis into a sector that has neither the product definition nor the institutional muscle is, at best, a placeholder. At worst, it is a marketing label slapped on a process that cannot bear the weight.
The selection bias is the part nobody is talking about. CAISI evaluations cover American frontier labs that voluntarily submit. They do not cover DeepSeek V4 or the wave of capable Chinese open-weight models that any threat actor can download and fine-tune. They do not cover open-weight Llama derivatives or whatever Meta ships next. They do not cover the Mistral and AMI Labs systems Europe is betting its sovereign-AI strategy on. If the goal is preventing a Mythos-class capability from reaching adversaries, gating five American labs while open-weight competitors keep shipping is a bit like locking the front door of a house with no walls. The agreements are useful. They are not, by themselves, a defense.
The information-sharing architecture is the other quiet weakness. CAISI evaluations are confidential by default — appropriate for classified threat surfaces, problematic for the public-trust function a regulator is supposed to serve. There is no public registry of evaluated models, no published red-team results, no statutory disclosure to Congress of capability findings unless the executive branch chooses to share them. The Computerworld coverage of the agency’s expanding remit noted that civil-society and academic researchers have effectively been cut out of the loop that existed under the Biden-era voluntary commitments. That trade-off may be the right one for genuine national-security risk; it is the wrong one if the regulator becomes an industry-capture vector. The line between “secure enough to keep classified” and “convenient to keep classified” is exactly where regulatory regimes have historically failed the publics they ostensibly serve.
From voluntary handshake to mandatory floor
Where does this lead by Q4? The honest read is that Tuesday’s announcement is the soft prelude to an executive order. Hassett’s FDA analogy is too specific to be hypothetical, and the Pentagon language about safety-testing models for federal, state, and local government use suggests that the procurement gate goes mandatory before the consumer-deployment gate does. Expect the order in the second half of 2026, layered on top of CAISI’s existing voluntary infrastructure rather than replacing it. Expect Congress to either fund CAISI to multiples of its current size or to discover, around the time of the first major deployment failure, that the gate it built was too narrow to inspect what it was approving. Expect the labs to leverage their early-mover compliance posture into a “trusted developer” status that becomes the de facto qualification for federal contracting and, eventually, for any large-enterprise deployment. The story of the next twelve months is not whether AI gets regulated. It is who is inside the tent when the rules harden, and Tuesday told you the answer.
For builders, deployers, and policy professionals, here is the operator checklist worth pinning to the wall:
- If you ship a frontier-class model: Document your pre-deployment safety evaluations now in the format CAISI uses (cyber, bio, chemical, with safeguard variants). The five labs have already started. The cost of catching up under an executive-order timeline is materially higher than the cost of preparing voluntarily.
- If you build on top of frontier models: Add a “CAISI-evaluated” check to your vendor-selection criteria for any deployment that touches regulated industries or federal procurement. The status is going to be load-bearing within a year. Your security and compliance teams should already be asking the question.
- If you operate in the open-weight or fine-tuned-derivative segment: Plan for the day when your fine-tunes are treated as new products under whatever executive order lands. The current regime evaluates the base model. The next one will evaluate the deployment, and “I just downloaded the weights” will not be a defense.
- If you advise on AI policy or run a public-affairs function: CAISI is going to need ten to twenty times its current staffing to meet the workload it has been handed. The funding fight is the leading indicator of whether the regime is real or theatrical, and the appropriations cycle starts now.
- If you are a security or red-team lead: Project Glasswing is the template for what credible offensive-AI testing looks like, and the next executive order will likely require something resembling it from any developer above a capability threshold. The internal artifacts you build today — capability evals, threat surfaces, kill chains — become tomorrow’s compliance evidence.
- If you are a CISO at a large deployer: The combination of a pre-deployment federal gate plus the forty-billion-dollar Google–Anthropic infrastructure deal means that hyperscaler concentration and federal oversight are now the same conversation. Your AI vendor risk model needs both axes, not just one. A minimal sketch of what that combined check might look like follows this list.
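To be clear about what is and is not real here: there is no public registry of evaluated models and no machine-readable "CAISI-evaluated" attestation today, so the field names, thresholds, and the attestation itself in the sketch below are hypothetical placeholders for whatever evidence your vendors can actually produce.

```python
# Illustrative vendor-selection gate combining the two axes discussed above:
# federal pre-deployment evaluation status and hyperscaler concentration.
# Field names, thresholds, and the "caisi_evaluated" attestation are all
# hypothetical -- no such registry or attestation format exists today.
from dataclasses import dataclass

@dataclass
class ModelVendor:
    name: str
    caisi_evaluated: bool          # hypothetical attestation from the vendor
    hyperscaler_dependencies: int  # distinct hyperscalers the vendor runs on
    serves_regulated_workloads: bool

def vendor_risk_flags(v: ModelVendor) -> list[str]:
    flags = []
    if v.serves_regulated_workloads and not v.caisi_evaluated:
        flags.append("no pre-deployment federal evaluation on record")
    if v.hyperscaler_dependencies <= 1:
        flags.append("single-hyperscaler concentration risk")
    return flags

# Usage: surface both axes in the same review rather than in two separate ones.
vendor = ModelVendor("example-frontier-vendor", caisi_evaluated=False,
                     hyperscaler_dependencies=1, serves_regulated_workloads=True)
for flag in vendor_risk_flags(vendor):
    print(flag)
```

The design point is that both flags come out of the same review: a vendor can look fine on oversight and terrible on concentration, or vice versa, and either gap stays invisible if the two checks live in separate teams.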
The deeper point is that May 5, 2026 is the date the United States stopped pretending that frontier AI was a self-regulating sector. The administration that came in promising to dismantle Biden-era AI guardrails has, eight months in, built a tighter pre-deployment gate than the one it inherited — and gotten the entire frontier-lab tier to walk through it voluntarily. The rhetoric calls it innovation. The mechanics call it the FDA. The math calls it underfunded. All three are true. What is not yet true, and what the next twelve months will decide, is whether the gate is wide enough to inspect what it has been asked to approve, and whether the public that benefits from the approval regime has any meaningful seat in deciding what “approved” means.
If you build, deploy, or regulate AI — and increasingly that is everyone — Tuesday was a turning point. The labs already understood it. Now you do too.
In other news
EPAM goes all-in on Claude with a 10,000-architect plan — On May 6, EPAM Systems announced a multi-year strategic partnership with Anthropic under which the consultancy will certify more than 10,000 Claude architects, including 250 forward-deployed “Black Belt” engineers. EPAM has already certified 1,300 architects and trained 20,000 employees on Claude tooling — a service-firm bet that enterprise AI delivery is now the chokepoint, not model access.
Anthropic eyes a $900B price tag on the same week as the CAISI deal — TechCrunch reports Anthropic could close a $50 billion private round at a valuation of roughly $900 billion within two weeks, surpassing OpenAI’s $852 billion valuation from earlier this year. Annual revenue run rate has reportedly crossed $30 billion en route to $40 billion. The juxtaposition with CAISI’s $30 million two-year budget is the cleanest illustration of regulator-versus-regulated asymmetry on offer.
Parag Agrawal’s web infrastructure for AI agents hits $2B — Parallel Web Systems, the search-and-research API platform built specifically for AI agents, raised $100 million Series B at a $2 billion valuation led by Sequoia. The round closed five months after a $100 million Series A at a $740 million mark, with customers including Notion, Harvey, Clay, and Opendoor. Agent infrastructure, not agent UX, is where the next round of pricing power is settling.
Google’s “Remy” agent surfaces inside Gemini — 9to5Google flagged that Google is dogfooding a personal agent codenamed Remy inside an employee build of the Gemini app, positioned as a 24/7 assistant that integrates with Gmail, Calendar, Docs, Drive, Keep, and Tasks. The internal framing — “personal agent for work, school, and daily life” — reads like the consumer counter to whatever Anthropic and OpenAI ship next on the agent front.
Meta’s frontier reset hits the public market — In an underreported beat from a few weeks back, CNBC noted Meta debuted its first major AI model since the $14 billion Alexandr Wang deal and is racing Google to ship a personal-agent layer before Anthropic and OpenAI extend their lead further. Meta’s renewed urgency on the frontier is, paradoxically, the strongest validation of the “every frontier lab in CAISI” framing — there is no fifth-place spot to occupy.