Stephen Van Tran

Fourteen months ago, Anthropic unveiled the Model Context Protocol as “USB-C for AI” — a universal standard for plugging language models into the digital world. The pitch was irresistible: one protocol to rule every tool, every API, every data source. Companies raced to build MCP servers. OpenAI, Google, and Microsoft adopted the standard within months. The ecosystem swelled to over ten thousand servers, npm downloads surged eighty-fold, and for a brief window it seemed like MCP would become the foundational plumbing of the agentic era.

Then the invoices arrived.

In February 2026, a software engineer named Eric Holmes published a blog post titled “MCP is dead. Long live the CLI,” and the Hacker News discussion that followed — 400 points, 300 comments — crystallized a frustration that had been simmering for months. Cloudflare published data showing MCP wastes up to 81% of an agent’s context window. Perplexity’s CTO announced on stage that the company was abandoning the protocol. Benchmark after benchmark confirmed what terminal veterans had always suspected: the standard designed to empower AI agents was actually starving them, burning tens of thousands of tokens on tool definitions before the model could think a single useful thought.

The terminal — patient, composable, five decades deep — had been the answer all along.

The protocol that ate its own context window

MCP’s origin story was genuinely compelling. When Anthropic introduced the protocol in November 2024, developer David Soria Parra built it from a specific frustration: the tedium of copy-pasting context between Claude Desktop and his IDE. The protocol proposed an elegant abstraction — JSON-RPC 2.0 messages shuttled between clients and servers, with standardized schemas for tools, resources, and prompts. It solved the N×M integration nightmare where every AI model needed bespoke connectors for every external service. By late 2025, the ecosystem had exploded to over 10,850 servers and 300 clients in community directories, with npm downloads surging from 100,000 per month to more than 8 million. We explored this ecosystem firsthand when Claude Desktop Connectors launched with 500-plus tool integrations, and the promise felt tangible. Every major AI provider had signed on. The Agentic AI Foundation, housed under the Linux Foundation, assumed governance. MCP looked inevitable.

The trouble is what happens when an agent actually loads those tools into memory. Cloudflare’s engineering team published a devastating technical analysis showing that connecting five MCP servers with fifty tools dumps 30,000 to 60,000 tokens of schema definitions into the context window before the model processes a single user message. For complex agents, traditional MCP tool-calling wastes up to 81% of the available context. Each tool definition runs 550 to 1,400 tokens of parameter types, descriptions, and annotations — information the model carries as dead weight whether it invokes that tool or not. The model starts every conversation already gasping for air, its reasoning capacity consumed by a JSON catalogue of parameters it may never touch.
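The arithmetic behind those figures is easy to check. A minimal sketch, assuming Cloudflare's five-servers-by-ten-tools setup and a 200K-token context window (a common frontier-model size; the window size is my assumption, not from Cloudflare's analysis):

```python
# Sanity-check the schema-overhead arithmetic from the figures above:
# 5 servers x 10 tools each = 50 tool definitions, each 550-1,400 tokens.
servers, tools_per_server = 5, 10
per_tool_min, per_tool_max = 550, 1_400

tools = servers * tools_per_server
overhead_min = tools * per_tool_min   # 27,500 tokens
overhead_max = tools * per_tool_max   # 70,000 tokens

print(f"{tools} tools -> {overhead_min:,} to {overhead_max:,} tokens of schema")
# The reported 30,000-60,000 token range sits inside this band.

# As a share of an assumed 200K-token context window:
context_window = 200_000
print(f"worst case: {overhead_max / context_window:.0%} of context gone before turn one")
```

The reported 30,000-to-60,000-token range falls squarely inside the band this produces, and in the worst case more than a third of the window is spent before the first user message.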

This was the data point that transformed simmering discontent into open revolt. Holmes’s viral essay was surgical in its critique: MCP spawns background server processes that can silently hang or time out at the TCP level, offers all-or-nothing permission scopes instead of granular allowlisting, and is “unnecessarily opinionated about authentication” in ways that conflict with existing enterprise SSO infrastructure. His core claim was blunt — large language models are already skilled enough to use git, docker, and kubectl directly, reading their documentation and composing commands without an intermediary protocol interpreting their intent. Every layer of abstraction between the model and the tool is friction that costs tokens, introduces failure modes, and slows execution.

The corporate defections followed swiftly. At the Ask 2026 conference on March 11, Perplexity CTO Denis Yarats announced the company was pivoting away from MCP toward traditional APIs and command-line interfaces. He cited two concrete failures: context window consumption that compounds destructively over long multi-turn conversations, and authentication flows so convoluted they frustrated even senior engineering teams. Perplexity still maintains a legacy MCP server for backward compatibility, but their flagship product — the new Agent API — is a single REST endpoint with one API key. The kind of interface that predates MCP by a decade and works on the first try.

The irony cuts deep. MCP was built to solve the integration problem. Instead it became the integration problem — a protocol that consumes the very resource it was designed to help agents use efficiently. Developers spent 2025 building MCP servers for every tool under the sun, only to discover in 2026 that those servers were eating their agents alive.

The receipts don’t lie

The strongest indictment of MCP isn’t philosophical — it’s arithmetic. ScaleKit ran 75 head-to-head benchmarks comparing MCP and CLI approaches for identical agent tasks, and the results were not close. Their simplest test — checking a repository’s primary language — consumed 44,026 tokens via MCP versus 1,365 via CLI. That is a 32× penalty for the same information. Across all 75 runs, the CLI approach was consistently 4 to 32 times cheaper, with MCP costing $55.20 per 10,000 monthly operations versus CLI’s $3.20. But the efficiency gap was only half the story: MCP completed its tasks with a 72% success rate compared to CLI’s perfect 100%. Nearly a third of MCP operations failed outright, typically from TCP-level connection timeouts during server initialization. The protocol designed to make tool use reliable was itself the primary source of unreliability.

The context window mathematics are even more damning. Apideck, an API integration platform with direct production experience, measured the token cost of connecting three typical enterprise services — GitHub, Slack, and Sentry — through MCP. The approximately 40 tool definitions consumed over 55,000 tokens before the agent processed its first user message. The equivalent CLI approach required an 80-token system prompt listing available commands, with the agent discovering details progressively via --help flags as needed. That is a 688× reduction in upfront context cost — not a rounding error, but an architectural chasm between two fundamentally different philosophies of how agents should discover their capabilities.

Here is the number that should haunt every engineering leader running MCP in production. Cross-referencing the Apideck measurement with ScaleKit’s cost data and Cloudflare’s 81% waste finding, a mid-size engineering organization running five MCP-connected services across 5,000 daily agent sessions burns approximately $300,000 per year in overhead tokens alone — context window capacity that purchases nothing but a menu the model could discover in real time for pennies. The CLI equivalent runs that same workload for under $500 annually. That $300,000 delta does not improve accuracy, does not accelerate responses, does not unlock new capabilities. It is a pure tax, paid in the currency that matters most to language models: attention.
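The estimate reconstructs cleanly from the numbers already on the table. A back-of-envelope sketch, assuming the Apideck figure of 55,000 overhead tokens per session and a hypothetical price of $3 per million input tokens (a mid-tier frontier rate I am assuming for illustration; it is not from the cited sources):

```python
# Back-of-envelope reconstruction of the ~$300K/year overhead estimate.
# ASSUMPTION: $3.00 per million input tokens (illustrative mid-tier rate).
PRICE_PER_M_TOKENS = 3.00

sessions_per_day = 5_000
days_per_year = 365

mcp_overhead_tokens = 55_000   # Apideck: ~40 tool definitions loaded upfront
cli_overhead_tokens = 80       # one-line command inventory in the system prompt

def annual_cost(overhead_tokens: int) -> float:
    tokens_per_year = overhead_tokens * sessions_per_day * days_per_year
    return tokens_per_year / 1_000_000 * PRICE_PER_M_TOKENS

print(f"MCP overhead: ${annual_cost(mcp_overhead_tokens):,.0f}/year")
# -> MCP overhead: $301,125/year
print(f"CLI overhead: ${annual_cost(cli_overhead_tokens):,.0f}/year")
# -> CLI overhead: $438/year
```

Under those assumptions the MCP side lands at roughly $301,000 per year and the CLI side at under $500, matching the figures above.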

The market has already internalized this verdict. The Pragmatic Engineer survey of roughly 1,000 senior engineers in February 2026 found that Claude Code — a CLI-native tool built on Unix composability rather than MCP plumbing — has overtaken both GitHub Copilot and Cursor to become the most-used and most-loved AI coding tool, with 46% of respondents naming it their favorite. As we detailed in our head-to-head comparison, the tool’s design philosophy treats shell commands as its primary interface, piping standard Unix utilities rather than routing through protocol servers. GitHub made the same bet when Copilot CLI reached general availability on February 25, 2026, with Plan mode, Autopilot mode, and dynamic agent delegation — all terminal-native, all bypassing MCP entirely. The two largest code platforms on earth are converging on the same architecture, and it is not a protocol.

The philosophical foundation explains why the data breaks this way. Vivek Haldar argued in a widely shared essay that Unix was a “love letter to agents” — small sharp tools connected by pipes, with plain text as the universal interface. An LLM is precisely the user Unix was designed for: it reads documentation fluently, reasons about it at machine speed, and composes small operations into sophisticated workflows without needing a schema to tell it what parameters exist. The composability grammar is baked into every frontier model’s weights through billions of training examples of shell pipelines, man pages, and --help output. MCP asks models to learn a new protocol. The terminal speaks their native language. This is why a CLI agent can improvise novel command chains on the fly while an MCP agent is limited to the exact tool definitions loaded at session start.

Perhaps the most telling evidence is what happens when engineers try to fix MCP’s overhead problem. The open-source mcp2cli project achieves a 96 to 99% token reduction by wrapping MCP servers in command-line interfaces — saving 362,000 tokens on schema definitions alone for a 120-tool setup over 25 conversation turns. Speakeasy’s Dynamic Toolsets take a similar approach, reporting 96% input token savings by lazy-loading tool schemas instead of dumping them upfront. Both solutions are clever engineering. Both are also confessions: when the fix for a protocol is to wrap it in a CLI, the CLI was the answer all along.

The bull case for a sinking protocol

Intellectual honesty demands acknowledging what MCP gets right — and the counterarguments are neither trivial nor entirely wrong.

The strongest defense comes from enterprise architects who see MCP not as a developer tool but as a governance layer. Charles Chen argued on ITNEXT that critics engage only with MCP’s tool-calling interface while ignoring its prompts and resources mechanisms — organizational primitives that let companies standardize how agents interact with internal systems, enforce consistent patterns across teams, and audit tool usage at fleet scale. In this framing, MCP is not competing with CLI for individual developer workflows. It is competing with anarchy for enterprise-wide agent governance. The ability to monitor which tools agents invoke, enforce compliance constraints, and manage identity across hundreds of AI assistants is a problem that bash genuinely cannot solve. A service token and a CLI binary give a single developer god-mode access; MCP gives an organization a control plane.

The ecosystem numbers lend this argument weight. MCP now logs 97 million monthly SDK downloads and has been adopted by every major AI platform — OpenAI, Google DeepMind, Microsoft, AWS, Block, and Bloomberg. Stainless, the company that builds SDK infrastructure for OpenAI and Anthropic, makes the network-effects argument plainly: MCP is not revolutionary technology, but it is “simple, well-timed, and well-executed” with an ecosystem moat that CLI-based approaches cannot replicate. Every API company now ships an MCP server alongside their REST documentation. That is infrastructure lock-in measured in thousands of integrations, not marketing hype.

Matthew Hall offers the maturity defense, arguing that critics are evaluating a protocol in its infancy and drawing conclusions about its ceiling. The local-process, manual-install experience that frustrates developers today is the early-adopter experience, not the destination. Streamable HTTP transport and OAuth 2.0 client credentials are replacing the original stdio-based architecture with hosted infrastructure maintained by service providers. In Hall’s vision of the near future, connecting an MCP tool is as simple as pointing a client at a URL and completing an OAuth flow — no local servers to install, no processes to manage, no startup latency to endure. The protocol’s growing pains are real, but growing pains are not death sentences.

But the maturity argument cuts both ways. Even MCP’s own 2026 roadmap concedes the production challenges: retry semantics remain undefined, expiry policies are absent, stateful connections force “sticky” routing that prevents effective horizontal auto-scaling, and running Streamable HTTP at scale has exposed middleware gaps that the specification never anticipated. The security picture is equally unsettled. Simon Willison, co-creator of Django and a prominent AI security researcher, documented fundamental vulnerabilities including “rug pull” attacks where tool definitions mutate their behavior after receiving user approval, and cross-server attacks where a malicious MCP instance manipulates other connected servers through prompt injection embedded in tool descriptions. The protocol’s authorization specification, as Solo.io VP Christian Posta detailed, makes the MCP server act as both resource server and authorization server — a design that violates OAuth best practices and creates deep friction for enterprises whose SSO infrastructure cannot accommodate MCP’s assumptions. The promise of mature, enterprise-ready MCP remains precisely that: a promise, not a product.

The honest tension is this: MCP solves a real problem for a specific audience that CLI cannot reach. Non-developer knowledge workers who interact with AI through web interfaces and desktop applications — the people using ChatGPT and Claude through a browser — are never going to open a terminal window. For that audience, MCP’s graphical integration model remains the only viable path to connecting AI with their daily tools. The agent platform wars underway between Nvidia, OpenAI, and Anthropic still feature MCP as a consumer-facing integration layer, and rightly so. Consumer accessibility and enterprise governance are legitimate use cases where the protocol earns its keep. But MCP’s critical error was positioning itself as the universal answer for all AI tool integration — including the developer workflows where the terminal has reigned for half a century. Claiming that territory was overreach, and the correction has been swift, public, and backed by hard numbers that no amount of roadmap optimism can explain away.

Three moves before the window closes

The SOAP-to-REST parallel is instructive and almost eerily precise. In the early 2000s, enterprise software bet heavily on SOAP — a complex XML-based protocol with WSDLs, UDDI registries, and elaborate type systems that promised universal interoperability. It worked. Major companies built production systems on it. Standards bodies endorsed it. But when REST offered a simpler architectural style that leveraged what HTTP already provided, the industry did not transition overnight. SOAP persisted in enterprise niches for over a decade while REST captured every new greenfield project. MCP is tracing the same arc: too entrenched to vanish, too heavy to win the next wave of adoption. The protocol will survive in governance-sensitive enterprise contexts and consumer-facing integrations. But the developer terminal — like REST before it — will quietly absorb every new use case because it already has the infrastructure, the documentation, and now the audience.

The strategic play is not to rip out MCP today but to build CLI-first and treat MCP as an optional facade for the contexts that genuinely require it. Here is what that means in practice.

Audit your token overhead now. Connect your production MCP servers and measure the actual token cost before the first user message reaches the model. If tool definitions consume more than 15% of your context window, you are paying a compounding tax on every conversation that follows — a tax that grows with each service you add. The ScaleKit benchmark methodology provides a reproducible framework for this measurement. Most teams who perform the exercise discover the overhead is three to five times worse than they assumed, because each additional service stacks its full schema payload onto the context while window capacity stays fixed.
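The audit itself can start as a few lines of code. A minimal sketch, assuming a hypothetical MCP-style tool schema and the rough four-characters-per-token heuristic (for exact counts, substitute your provider's tokenizer):

```python
import json

# Rough audit of MCP schema overhead: serialize the tool definitions the way
# the model would see them and estimate tokens with the common ~4-chars-per-
# token heuristic. Swap in your provider's tokenizer for exact counts.
def estimate_tokens(text: str) -> int:
    return len(text) // 4

def audit(tool_definitions: list[dict], context_window: int = 200_000) -> float:
    payload = json.dumps(tool_definitions)
    overhead = estimate_tokens(payload)
    share = overhead / context_window
    print(f"{len(tool_definitions)} tools, ~{overhead:,} tokens "
          f"({share:.1%} of a {context_window:,}-token window)")
    return share

# Hypothetical definitions shaped like MCP tool schemas (real ones carry far
# more description and annotation text, so real overhead runs higher):
tools = [{
    "name": f"service_{i}_search",
    "description": "Search records in the service by free-text query.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text query."},
            "limit": {"type": "integer", "description": "Max results."},
        },
        "required": ["query"],
    },
} for i in range(40)]

if audit(tools) > 0.15:
    print("Above the 15% threshold: consider CLI or lazy-loaded schemas.")
```

Run this against your real server's `tools/list` response rather than the toy schema above; the toy definitions are deliberately minimal, and production schemas with full annotations land much closer to the 55,000-token figure Apideck measured.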

Inventory your CLI equivalents. For every MCP server in your stack, check whether a mature, well-documented CLI already exists. GitHub has gh. AWS has aws. Kubernetes has kubectl. Docker has docker. Terraform has terraform. Slack has its own CLI. These tools handle authentication, pagination, error handling, and rate limiting out of the box — capabilities that MCP servers often reimplement imperfectly and incompletely. A service token plus a CLI binary gives an agent equivalent capabilities with a fraction of the token overhead and decades more battle-testing. The standardization of documentation through man pages and --help flags means the model already knows how to discover and use these tools without upfront schema loading. As we explored in our comparison of Claude Code and Codex CLI, the most effective agentic coding tools already treat shell commands as their primary interface, not protocol calls.

Adopt progressive disclosure over upfront schemas. The fundamental architectural insight driving this shift is that agents do not need to know every parameter of every tool before beginning work. They need to know which tools exist — a single line each — and can discover details on demand via --help, man pages, or documentation retrieval. This is how human developers work, and it is how language models work best. Cloudflare’s Code Mode demonstrated one implementation: converting tools into a TypeScript API that the model programs against, cutting token usage by 81% for complex operations. The mcp2cli pattern demonstrates another: wrapping existing MCP servers in command-line interfaces that achieve 96 to 99% token savings without sacrificing functionality. Both share the same principle — lazy loading beats eager loading when context is your scarcest resource.
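The pattern reduces to a small loop: a one-line inventory in the system prompt, and help text fetched only at the moment of use. A minimal sketch, using `sys.executable --help` as a stand-in for a real CLI like `gh` or `kubectl` (the command registry and function names here are illustrative, not any particular framework's API):

```python
import subprocess
import sys

# Progressive disclosure sketch: the system prompt carries only a one-line
# inventory of commands; full usage text is fetched on demand, never upfront.
COMMANDS = {
    "python": [sys.executable],   # stand-in; in practice: ["gh"], ["kubectl"], ...
}

def inventory() -> str:
    """The one-line-per-tool listing that goes in the system prompt."""
    return "\n".join(f"- {name}: run `{name} --help` for usage" for name in COMMANDS)

def describe(name: str) -> str:
    """Lazily fetch full usage the moment the agent decides to use a tool."""
    result = subprocess.run(COMMANDS[name] + ["--help"],
                            capture_output=True, text=True)
    return result.stdout or result.stderr

print(inventory())                 # tiny, fixed cost per session
print(describe("python")[:200])    # paid only when the tool is actually chosen
```

The session's fixed cost is the inventory, a handful of tokens per tool; the full help text enters the context only for tools the agent actually chooses, which is exactly the eager-versus-lazy trade the paragraph above describes.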

The broader trajectory is unmistakable. Claude Code now accounts for 4% of all public GitHub commits — roughly 135,000 per day — a figure that was zero thirteen months ago. The Pragmatic Engineer data shows 75% of the smallest companies already favor CLI-native AI tools over GUI or protocol-based alternatives. GitHub’s decision to ship Copilot CLI as a full agentic environment — not just a suggestion engine — confirms that the industry’s two largest code platforms are converging on the terminal as the primary interface for AI-assisted development. The Gemini CLI’s evolution toward full interactive shell integration tells the same story from Google’s side: every major AI lab is building for the terminal, not for the protocol layer.

Anthropic’s own trajectory tells this story with unusual clarity. They created MCP, marketed it as the universal standard, watched it achieve genuine ecosystem adoption — and then built Claude Code, a tool that succeeds precisely because it treats the operating system as its integration layer instead of routing through protocol servers. The company that invented the protocol is winning with the product that does not need it. That is not an accident. It is an admission about where the real leverage lives.

The teams that act on these numbers now — auditing overhead, building CLI-first, treating MCP as a backward-compatible facade for consumer and governance use cases rather than the foundation of their agent architecture — will spend the rest of 2026 with agents that are faster, cheaper, and more reliable than their protocol-laden competitors. The teams that wait for MCP to mature will find themselves optimizing a tax they never needed to pay.

The terminal has been composable, documented, and authenticated since before most AI engineers were born. The only thing that changed is that the models finally got smart enough to use it. The CLI did not need a comeback. It needed an audience that could read man pages at the speed of light.