Stephen Van Tran

Andrej Karpathy — founding member of OpenAI, former head of Tesla’s Autopilot AI, Stanford PhD, and the person who literally coined the term “vibe coding” — went on the No Priors podcast last week and said something that should stop every software engineer mid-keystroke: he hasn’t typed a line of code since December. Not because he burned out. Not because he pivoted to management. Because AI agents write all of it now. “Code’s not even the right verb anymore,” he told host Sarah Guo. “I have to express my will to my agents for 16 hours a day.” The ratio flipped fast — from 80% human-written code and 20% agent-delegated, to the inverse, in a matter of weeks. The man who taught a generation of engineers how neural networks work through his Stanford CS 231n lectures is now, by his own admission, in “a state of psychosis of trying to figure out what’s possible, trying to push it to the limit.”

That word — psychosis — is doing heavy lifting. Karpathy is not describing a breakdown. He is describing the vertigo of watching the floor disappear beneath a skill he spent decades mastering, and discovering that the fall feels a lot like flight. “This is why it gets to the psychosis,” he explained, “is that this is like infinite and everything is skill issue.” Every limitation he encounters with his AI agents feels solvable — not by writing better code, but by prompting better instructions. The bottleneck has shifted from the machine’s capability to the human’s imagination. And if that is true for Karpathy, who arguably understands large language models better than almost anyone alive, it carries implications for the 30 million professional developers worldwide who are still typing functions by hand.

What makes Karpathy’s confession credible rather than performative is the receipts. He is not philosophizing about a hypothetical future. He built the tools, ran the experiments, and published the results. The data is specific, reproducible, and alarming in the best possible way.

Sixteen hours of will, zero lines of code

The centerpiece of Karpathy’s recent work is autoresearch, a 630-line Python script that lets AI agents run autonomous machine learning experiments on a single GPU. The concept is deceptively simple: give an agent a small but real LLM training setup, let it modify the training code and neural network configuration, run each experiment for five minutes, check whether the result improved, keep or discard the change, and repeat. Karpathy let it run for two days straight. The agent conducted 700 different experiments and discovered 20 optimizations that, when stacked together and applied to a larger model, produced an 11% speedup in training time. The repository hit 34,000 GitHub stars in under a week. “All LLM frontier labs will do this,” Karpathy declared. “It’s the final boss battle.”
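The loop described above can be sketched in a few lines. This is a toy reconstruction, not Karpathy's actual 630-line script: `run_experiment` stands in for a five-minute training run (here a cheap closed-form "loss" so the sketch runs without a GPU), and `propose_change` stands in for an LLM agent editing the training code.

```python
import random

def run_experiment(config):
    """Stand-in for a 5-minute training run: returns a validation loss.
    A toy quadratic with its minimum at lr=0.01, width=256."""
    lr, width = config["lr"], config["width"]
    return (lr - 0.01) ** 2 + (width - 256) ** 2 / 1e6

def propose_change(config, rng):
    """Agent stand-in: perturb one knob. A real agent would rewrite code
    and architecture, not just scale hyperparameters."""
    candidate = dict(config)
    key = rng.choice(list(candidate))
    candidate[key] *= rng.uniform(0.8, 1.25)
    return candidate

def autoresearch_loop(config, n_experiments=700, seed=0):
    """Run experiments, keep each change only if the metric improved."""
    rng = random.Random(seed)
    best_loss = run_experiment(config)
    kept = 0
    for _ in range(n_experiments):
        candidate = propose_change(config, rng)
        loss = run_experiment(candidate)
        if loss < best_loss:  # keep only measurable improvements
            config, best_loss, kept = candidate, loss, kept + 1
    return config, best_loss, kept
```

The structure is plain greedy hill-climbing; what Karpathy argues makes the LLM version different is the proposal step, where the agent can draw on papers and prior experiment logs rather than sampling blindly.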

The results were not confined to Karpathy’s own hardware. Shopify CEO Tobi Lütke ran autoresearch overnight on internal company data and reported 37 experiments conducted autonomously, a 19% improvement in validation score, and a 0.8 billion parameter model that now outperformed the 1.6 billion parameter model it was meant to replace. A CEO — not an ML researcher — achieved publishable results by running a script overnight. The implications for staffing, for research velocity, and for the competitive moat of technical expertise are hard to overstate.

But autoresearch is only one artifact of Karpathy’s broader shift. He also built “Dobby the House Elf claw” — an AI agent that controls his home’s sound system, lighting, security cameras, window shades, HVAC, pool, and spa, all through natural language commands over WhatsApp. The agent can detect a FedEx delivery truck via security camera and send an alert. “I can’t believe I just typed in like, can you find my Sonos? And that suddenly is playing music,” he said. “I only typed three prompts!” Jensen Huang hand-delivered a DGX Station to Karpathy’s home to power it — a machine with up to 20 petaflops of performance and 784 GB of coherent memory, dedicated to running a house elf.

The pattern across all of Karpathy’s projects is the same: remove the human as the bottleneck. “To get the most out of the tools that have become available now, you have to remove yourself as the bottleneck,” he said on the podcast. His vision for autoresearch is not a single agent tuning hyperparameters but multiple agents exploring different optimizations in parallel — “to emulate a research community” rather than a single researcher. The programmer does not write code. The programmer does not even supervise code. The programmer designs the evaluation function and lets the agents compete.

The comparison to earlier approaches matters. Critics pointed out that autoresearch resembles AutoML and neural architecture search, techniques that have existed for years. Karpathy was blunt in his rebuttal: his system, powered by LLMs that can read research papers, learn from prior experiment logs, and form novel hypotheses in natural language, is “totally useless by comparison” to the older grid-search methods. The difference is legibility. A traditional hyperparameter sweep explores a predefined space. An LLM-driven agent can read a paper published yesterday, hypothesize that its findings apply to the current architecture, modify the code to test the hypothesis, and evaluate the result — all without human intervention. The 8.6 million views that Karpathy’s autoresearch post garnered in two days suggest the developer community understands the distinction intuitively.

On the one-year anniversary of coining “vibe coding,” Karpathy declared it passé. The successor term is “agentic engineering” — “agentic because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight.” The distinction matters. Vibe coding was playful, experimental, disposable. Agentic engineering is the claim that this way of working scales to production systems without compromising quality. It is the difference between a weekend hack and an architecture decision.

The data says he’s early, not crazy

Karpathy’s personal testimony would be easy to dismiss as the enthusiasms of an outlier if the industry data did not corroborate every claim. The JetBrains State of Developer Ecosystem 2025 survey of 24,534 developers across 194 countries found that 85% now regularly use AI tools for coding, with 62% relying on at least one AI coding assistant or agent as part of their daily workflow. The Stack Overflow 2025 Developer Survey confirmed the trend from the other direction: 84% of respondents are using or planning to use AI tools in their development process, and 51% of professional developers use them daily. These are not early-adopter numbers. This is the mainstream.

The raw output metrics are equally striking. Microsoft CEO Satya Nadella has reported that 46% of code written by GitHub Copilot users is now AI-generated, with Java developers reaching a 61% generation rate and developers keeping 88% of what Copilot produces. GitHub Copilot itself has crossed 20 million cumulative users with 4.7 million paid subscribers — a 75% year-over-year increase. Google CEO Sundar Pichai disclosed that over 25% of new code at Google is generated by AI and then reviewed by engineers. Uber’s internal AI agent now generates approximately 1,800 code changes weekly, with 95% of its engineers using AI tools monthly and AI-driven code changes climbing from under 1% to 8% of total output. The trendlines are not subtle.

The ecosystem beyond Copilot is exploding in parallel. The OpenClaw movement — open-source autonomous AI agents that run locally on consumer hardware — has accumulated over 180,000 GitHub stars since its January launch, with Mac minis selling briskly as dedicated agent hosts. Nvidia responded with its own NemoClaw platform targeting enterprise deployment. The AI coding tools market was valued at $7.37 billion in 2025 and is projected to reach $30.1 billion by 2032 — a 27.1% compound annual growth rate that reflects not speculative hype but measured enterprise procurement.

The enterprise adoption curve for AI agents beyond coding is steepening in parallel. Gartner projects that 40% of enterprise applications will feature task-specific AI agents by the end of 2026, up from less than 5% in 2025. A G2 survey from August 2025 found that 57% of companies already had AI agents in production, with 66% of adopters reporting measurable value. The broader AI agent market is projected to reach $103.6 billion by 2032 at a 45.3% compound annual growth rate. These numbers describe an infrastructure shift, not a fad.

Stitching these data points together yields a proprietary estimate: if 85% of the world’s approximately 30 million professional developers are regularly using AI coding tools, and those tools are generating roughly 40-46% of their code output, then AI agents are now responsible for producing the equivalent output of 10 to 12 million human developers annually. That is not a productivity enhancement. It is a shadow workforce larger than the entire developer population of the European Union. The economic repricing has already begun — Glen Rhodes, analyzing Karpathy’s podcast appearance, put it bluntly: “When the cost of producing working code drops toward zero, everything that was priced based on code-production cost gets repriced.”
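The arithmetic behind that estimate is worth making explicit. This is a back-of-envelope check using only the figures quoted above, not fresh data:

```python
# Shadow-workforce estimate from the figures cited in the text.
developers = 30_000_000          # professional developers worldwide
adoption = 0.85                  # share regularly using AI coding tools (JetBrains survey)
gen_low, gen_high = 0.40, 0.46   # share of their output that is AI-generated

equiv_low = developers * adoption * gen_low    # lower-bound developer-equivalents
equiv_high = developers * adoption * gen_high  # upper-bound developer-equivalents
print(f"{equiv_low / 1e6:.1f}M to {equiv_high / 1e6:.1f}M developer-equivalents")
```

The estimate assumes AI-generated code is interchangeable with human output line for line, which the quality data in the next section complicates; treat it as an order-of-magnitude claim, not a headcount.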

The 10-year-old PhD problem

Karpathy is not blind to the failure modes. In the same podcast where he described his psychosis, he offered the most precise articulation of AI’s current limitation that any practitioner has produced: “I simultaneously feel like I’m talking to an extremely brilliant PhD student who’s been like a systems programmer for their entire life and a 10-year-old.” In humans, he noted, these capabilities would be “a lot more coupled” — you would never encounter a systems architect who also cannot remember what you told them five minutes ago. But that is exactly what agentic engineering demands you manage. The brilliance and the stupidity arrive in the same package, and the operator must develop a finely calibrated intuition for when to trust and when to verify.

The code quality data supports his caution. A December 2025 analysis found that AI co-authored code contained approximately 1.7 times more “major” issues than human-written code, with a 2.74 times higher rate of security vulnerabilities. Google’s Addy Osmani, engineering lead on Chrome, has written extensively about the distinction between vibe coding and agentic engineering, and his central warning is stark: AI-assisted development “creates a generation of developers who can prompt but can’t debug.” The irony is that agentic engineering disproportionately benefits senior engineers with deep fundamentals — the people who least need the help — because they possess the judgment to catch the 10-year-old’s mistakes while leveraging the PhD’s speed.

The security surface area compounds the quality problem. When Kaspersky audited the OpenClaw ecosystem in late January, it found over 500 vulnerabilities, including eight rated critical. The primary attack vector is prompt injection — malicious inputs that hijack an agent’s behavior. The OpenClaw social experiment Moltbook, where AI agents autonomously post and comment while humans observe, suffered a critical database breach within days of launch. Remote code execution vulnerabilities, supply chain poisoning through malicious skills, and compromised instances were reported across the ecosystem almost immediately. Karpathy himself expressed reservations about the very tools his enthusiasm helped popularize: “I’m definitely a bit sus’d to run OpenClaw specifically — giving my private data/keys to 400K lines of vibe coded monster that is being actively attacked at scale is not very appealing at all.” OpenAI’s acquisition of Promptfoo — a prompt injection testing framework — in early March was a tacit admission that the agent security problem is far from solved. The pattern is familiar from every previous platform shift: capability outpaces security, adoption outpaces governance, and the bill comes due after the infrastructure is already deployed.

There is also a hard boundary that Karpathy articulates with rare intellectual honesty. Autoresearch works because training loss is a clean, measurable signal — the agent can objectively determine whether each experiment made things better or worse. “If you can’t evaluate it, then you can’t auto research it,” he said. This constraint rules out the vast majority of software engineering work, where success is defined by user experience, business logic, architectural elegance, and edge cases that no metric captures. The 63% of vibe coding users who identify as non-developers are building user interfaces and personal tools where evaluation is subjective and the stakes are low. Extending that approach to banking infrastructure, medical devices, or autonomous vehicle systems would be reckless. The domain of “verifiable” tasks where agents excel is expanding, but it is not infinite, and confusing its current size with its ultimate boundary is the kind of mistake that gets people hurt.

The skill stack that survives the phase shift

Karpathy’s most provocative claim is also his most defensible: “All unverifiable domains still belong to humans; all verifiable domains either already belong to machines or will soon belong to them.” This is not a prediction about the distant future. It is a description of March 2026. The question for every developer, engineering manager, and CTO is not whether this shift is happening — the data is unambiguous — but what the optimal response looks like before the repricing reaches their specific domain.

The labor market data Karpathy himself produced offers a map. In a weekend project, he scored 342 U.S. occupations on AI exposure using Bureau of Labor Statistics data. Jobs paying over $100,000 per year averaged a 6.7 exposure score out of 10. Jobs paying under $35,000 averaged 3.4. Software developers, data scientists, financial analysts, and paralegals — the knowledge workers who assumed automation would hit factory floors first — scored 9 out of 10. Construction laborers and roofers scored 1. Nursing assistants, massage therapists, and bartenders scored 2. The analysis was crude enough that Karpathy retracted it, calling it a “saturday morning 2 hour vibe coded project.” But the directional finding aligns with every serious study of AI labor displacement: the higher the salary, the more the job involves manipulating information rather than atoms, the greater the exposure.

Osmani’s framework for navigating this shift is the most actionable one available. The single biggest differentiator between agentic engineering and vibe coding, he argues, is testing. “Vibe coding equals YOLO. Agentic engineering equals AI does the implementation, human owns the architecture, quality, and correctness.” The skill that compounds in value is not code production — agents commoditized that — but system design, failure mode analysis, and the judgment to know when the brilliant PhD student just handed you the 10-year-old’s homework. Engineering fundamentals do not become less important in an agentic world. They become the scarcest resource in the stack.

The organizational implications extend beyond individual skill development. Gartner’s estimate that 40% of enterprise apps will feature AI agents by the end of this year means engineering teams need to rethink how they structure work, how they evaluate performance, and how they allocate headcount. A team of five engineers directing 20 AI agents is not the same as a team of five engineers writing code. The management layer shifts from code review to output evaluation, from sprint planning to agent orchestration, from hiring for language expertise to hiring for domain judgment. Companies that reorganize around this reality will move faster than those that bolt agents onto existing workflows and hope for a productivity bump.

Karpathy ended his interview with a confession that lands differently when you know his resume: “I still think I might explain things slightly better than intelligent agents, but this feels like a losing battle.” The man who built Tesla’s self-driving AI, who co-founded the company that created GPT-4, who runs an education startup predicated on human teaching — even he is not sure his edge will last. “If we succeed,” he said, recalling a line from his OpenAI days, “we will all be unemployed.”

The operator checklist for the next 90 days:

  • Measure your verification surface. Audit every engineering workflow and classify tasks as verifiable (clear success metrics, automated testing, measurable output) or unverifiable (design judgment, user experience, architectural trade-offs). Agents should own the first category. Humans should own the second. The boundary between them is your strategic frontier.
  • Run autoresearch on your own stack. Karpathy’s tool is open source and runs on a single GPU. If your ML training pipeline cannot benefit from 700 autonomous experiments, you are either already optimal or not measuring correctly. Fork the repo, point it at your training script, and let it run overnight.
  • Invest in evaluation infrastructure. The limiting factor for agentic engineering is not agent capability but evaluation quality. If you cannot programmatically determine whether a code change improved your system, you cannot delegate that change to an agent. Build better tests, better benchmarks, better monitoring — these are the assets that compound.
  • Upskill for review, not production. The developer who spends 2026 learning a new programming language is investing in a depreciating asset. The developer who spends 2026 learning to read, evaluate, and correct AI-generated code at speed is investing in the skill that every engineering organization will bid for in 2027.
  • Take the security tax seriously. Every agent you deploy is an attack surface. Every tool you connect is a privilege escalation vector. Budget for prompt injection testing, sandboxed execution environments, and the assumption that your agent will be compromised. The Kaspersky audit of OpenClaw found 500 vulnerabilities in a project with 180,000 stars. Your internal tools are not more secure.
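The first and third checklist items hinge on the same primitive: a programmatic gate that decides whether an agent’s change improved a measurable metric, and only then lets it through. A minimal sketch — every name here is illustrative, with `evaluate` standing in for your real test suite or benchmark:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalGate:
    """Accept an agent-proposed change only when a programmatic score improves.

    evaluate: maps a candidate change to a score (higher is better),
    e.g. fraction of tests passing or a benchmark throughput number.
    baseline: the score the current system already achieves.
    """
    evaluate: Callable[[str], float]
    baseline: float

    def review(self, candidate: str) -> bool:
        score = self.evaluate(candidate)
        if score > self.baseline:
            self.baseline = score  # ratchet: new floor for future changes
            return True
        return False
```

A production gate would run candidates in a sandbox and weigh multiple metrics, but the principle is the one Karpathy states: if `evaluate` cannot be written for a task, that task sits outside the verifiable domain and stays with a human.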

Karpathy has not lost his mind. He has seen the near future with unusual clarity and is reporting back with the precision of someone who spent a career training machines to see. The psychosis he describes is not madness — it is the cognitive dissonance of a world-class programmer realizing that programming, as he practiced it for 20 years, is over. The code still runs. The functions still execute. The tests still pass. But the hands on the keyboard belong to something else now, and the most important skill left is knowing whether to trust what it wrote.