Stephen Van Tran

Two command‑line coding companions now define the state of the craft. OpenAI’s Codex CLI is a fast‑moving, batteries‑included terminal agent tuned for code edits, test loops, and guarded execution. Anthropic’s Claude Code is its ascetic counterpart—surgical in how it reads, thinks, and acts in repositories, with a strong safety grammar and a knack for multi‑agent orchestration. Both got smarter and more assertive in the last few months as new model families landed: GPT‑5 and GPT‑5‑Codex on OpenAI’s side; Claude Sonnet 4.5 and Haiku 4.5 on Anthropic’s. This piece is a practitioner’s comparison—in the shell, in real workflows, grounded in release notes and in what the developer commons actually says.

Thesis & Stakes

The stakes are straightforward: your CLI agent either accelerates the inner loop without breaking the build—or it becomes a loud, expensive distraction. The right choice depends on how you value four forces: model behavior under pressure (reasoning, verbosity, and tool use), execution safety and guardrails, ergonomics in the TUI/UX, and the velocity of the team shipping it.

On model behavior, OpenAI’s documentation for Codex shows first‑class toggles that matter when you’re deep in an incident or migrating a codebase: you can dial GPT‑5 and GPT‑5‑Codex to a high reasoning budget and high output verbosity through config, not prompt ceremony. In the Codex CLI configuration, model_reasoning_effort explicitly supports "high", and GPT‑5 family models expose model_verbosity with "low" | "medium" | "high" (Codex docs: Model selection and verbosity). That means when you flip to “high,” Codex asks the model to justify more, think longer, and write fuller diffs without you changing how you talk to it.
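In config terms, those knobs live in Codex's config.toml. A minimal sketch, with key names per the Codex docs and values chosen for illustration:

```toml
# ~/.codex/config.toml — minimal sketch; keys per the Codex docs, values illustrative
model = "gpt-5-codex"
model_reasoning_effort = "high"   # minimal | low | medium | high
model_verbosity = "high"          # GPT-5 family only: low | medium | high
```

Because these are config values rather than prompt text, they survive across sessions and can be checked into dotfiles or overridden per run.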

Anthropic’s counter is architectural rather than a single knob. Claude 4.5 Sonnet and Haiku are designed around long‑horizon focus and tool picking that stays on task. Developers documenting their production use describe Sonnet 4.5 staying coherent across very long, multi‑step tasks and doing top‑tier coding work (Ars coverage: “maintained focus for 30 hours” with 4.5 Sonnet). Haiku 4.5 appears to bring much of that style to a lower‑latency, lower‑cost tier, and security researchers report it resists jailbreak antics better than older cohorts (see Max Woolf’s write‑up on 4.5 Haiku refusing prompt attacks).

The CLI ergonomics are where taste and speed meet. Codex has shipped a stream of pragmatic quality‑of‑life improvements—ctrl‑n/ctrl‑p to fly through slash command lists, smarter backtracking that skips /status, and unified exec output formatting—while pushing non‑interactive automation via codex exec. Claude Code spent cycles on agentic scaffolding: subagents, plan mode, configurable permission prompts, and marketplace‑style plugins. It feels like two philosophies: Codex aims to be the dependable power tool; Claude Code, the orchestration workbench.

Velocity is the final, underrated differentiator. Codex’s releases show a measured drumbeat of stable drops punctuated by alpha trains. In early November alone, OpenAI shipped 0.54.0 → 0.55.0 → 0.56.0 → 0.57.0 in six days, then five 0.58.0 alphas in the following 48 hours (GitHub Releases). Claude Code’s CHANGELOG tells a similar story of sustained iteration—eleven 2.0.x entries in a recent two‑week span touching subagents, VS Code integration, OAuth stability for MCP servers, and memory fixes (Anthropic’s GitHub). Rapid cadence matters when your agent is part of the inner loop; it is the difference between a rough edge lingering for a sprint and getting sanded off overnight.

Net, both stacks are credible. If your shop skews toward API‑level control with explicit levers for reasoning and verbosity—and you want a CLI that doubles as an automation surface—Codex CLI with GPT‑5/GPT‑5‑Codex is the safer default. If your work benefits from long‑context, multi‑agent workflows with tight permissioning—and you’re happy to invest in a house style for subagents—Claude Code with Sonnet/Haiku 4.5 pays dividends.

Two implementation notes often missed in high‑level comparisons:

  • Codex CLI’s AGENTS.md discovery gives you a portable memory layer. It walks from repo root to CWD, merging guidance files (AGENTS.md/AGENTS.override.md). If you encode local safety rules, naming conventions, or migration patterns in those files, Codex reads them before touching the keyboard (docs point to agents.md). In practice, that’s how you make the model “feel like your team.”
  • Claude Code’s subagent isolation is not just vibes; the separate context windows are how you prevent token pollution across parallel tasks. When you pair that with strict tool permissions per subagent, you get almost a microservices‑style fault domain inside the session. It’s slower than one big hammer—and that’s the point.
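As a concrete sketch of the first note, a repo‑root AGENTS.md might encode house rules like these (the file name and discovery behavior are per the Codex docs; the rules themselves are illustrative):

```markdown
# AGENTS.md (repo root) — merged with any nested AGENTS.md files before Codex acts
- Never edit files under `migrations/` without an explicit approval.
- Run the narrowest test target available; avoid full-suite runs in quick loops.
- New modules follow the existing `feature/<area>` directory convention.
```

Nested AGENTS.md files closer to the working directory can then add or override guidance for a specific package.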

Evidence & Frameworks

Model families first. OpenAI’s public pages now separate the platform story (“GPT‑5 for developers”) from the consumer/product story (“GPT‑5”). Community reporting corroborates a jump in the system’s breadth and guardrail sophistication, and the Codex CLI docs expose the operational knobs we actually use:

  • Codex model_reasoning_effort: minimal | low | medium | high. Pushing high asks GPT‑5 and GPT‑5‑Codex to allocate more cognitive budget to a single turn—vital for gnarly migrations or security reviews. The CLI passes this as a first‑class parameter via the Responses API (Codex docs: Config → Model selection).
  • Codex model_verbosity: low | medium | high. Trims or expands output detail on GPT‑5 family models—useful when toggling between “just the diff” and “explain every decision with citations.” When set, Codex adds a text.verbosity object to requests (Codex docs: Config → Model selection).
  • Codex also lists GPT‑5‑Codex explicitly as a model identity alongside GPT‑5, and enables reasoning summaries on “codex‑*” out of the box (Codex docs: reasoning summaries). The naming is a tell: there is a variant tuned for code work that leans into tools, diffs, and tests.
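On the wire, those two knobs surface as top‑level objects in a Responses API request. A sketch of the relevant fragment, with field names per OpenAI's Responses API documentation:

```json
{
  "model": "gpt-5-codex",
  "reasoning": { "effort": "high" },
  "text": { "verbosity": "low" }
}
```

This is why the CLI flags feel "first‑class": they map one‑to‑one onto request fields instead of being injected as prompt text.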

Anthropic’s model line for 4.5 focuses on durable attention and controllable tool use. Public reporting and practitioner write‑ups suggest Claude Sonnet 4.5 handles very long, branching tasks without losing the thread and often produces conservative, readable code. Haiku 4.5 is framed as the speed tier with similar instincts. You can see that DNA reflected in Claude Code’s product work: a clean permission model, separate context windows for subagents, and a bias toward “explain first, change second.”

CLIs and shells next. Codex CLI’s “getting started” pathway makes a few positions clear:

  • It is a terminal‑native tool: codex for interactive TUI, codex exec for non‑interactive runs, and a resume system (codex resume or session picker) meant for hopping back into a prior thread when you’ve already paid the understanding cost (Codex docs: Getting started).
  • It treats files as the main medium: @ search in the composer, on‑demand diffs, and an MCP‑integrated config surface that brings in local or remote tools. The docs call out MCP servers explicitly and show how to wire them in config.
  • Guardrails are opinionated but practical: “unified exec” consolidates command execution, safe commands can run without approval, and a sandbox exists for destructive actions. Release notes call out process‑group kills on timeouts and a friendlier Cloudflare 403 explanation when web tools choke (0.57.0 Release).
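The MCP wiring mentioned above is declarative. A sketch of a config.toml entry (the mcp_servers table shape is per the docs; the server package itself is hypothetical):

```toml
# config.toml — launches a local MCP server over stdio; the package name is hypothetical
[mcp_servers.docs-search]
command = "npx"
args = ["-y", "@your-org/docs-mcp-server"]
```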

Claude Code’s arc is more agentic in spirit:

  • Subagents and plan mode shift from “one assistant in the terminal” to “task‑scoped personalities” with their own context and tool permissions. The CHANGELOG entries around 2.0.28–2.0.31 detail the introduction of a Plan subagent, dynamic model selection for subagents, permission‑prompt UI, and fixes to ensure MCP tools are available inside sub‑agents (Anthropic CHANGELOG).
  • The VS Code extension is more than a dock—it inherits Claude Code’s modes and respects editor preferences (chat.fontSize, chat.fontFamily), with fixes landing to apply font changes without reload and to restore selection indicators (2.0.35, 2.0.30). If your inner loop lives in VS Code, that matters.
  • Safety and stability land as consistent papercut fixes: throttling OAuth token loops for MCP servers, memory crash fixes for large base64 images, and better navigation in command menus and /hooks—the kind of polish that protects elastic sessions.

Where these differences show up in real workflows:

  • Incremental framework upgrades. In Codex, set GPT‑5‑Codex to model_reasoning_effort = "high" for reconnaissance, ask for an inventory (APIs, plugins, codemods), then drop to "medium"/"low" for the codemod grind via codex exec. The explicit verbosity control keeps diffs lean in CI while letting the model narrate tradeoffs in a separate artifact. In Claude Code, split into a read‑only “discovery” subagent and a narrow‑write “codemod” subagent; the former proposes, the latter applies with prompts and approvals.
  • Incident debugging at 2 a.m. Codex’s backtracking and session resume make it feel like tmux for AI—codex resume --last, backtrack to the last good hypothesis (skipping /status), and fork. Claude Code’s plan mode shines if you’ve pre‑baked incident subagents (log sleuth, rollback cop, customer‑impact estimator) with minimal tool lines.
  • Multi‑repo changes. Codex’s --cd/--add-dir exposes multiple repos as writable roots in one run; pair with a slash command to stage/commit across roots. Claude Code works best when you scope subagents per repo and rely on separate contexts to limit cross‑talk.
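The first workflow above can be sketched as two non‑interactive runs with per‑run overrides (the -c override flag is per the Codex CLI docs; the prompts are placeholders):

```shell
# Phase 1: expensive reconnaissance at high reasoning effort
codex exec -c model_reasoning_effort="high" \
  "Inventory the APIs, plugins, and codemods needed for the framework upgrade"

# Phase 2: the codemod grind, cheap and terse
codex exec -c model_reasoning_effort="low" -c model_verbosity="low" \
  "Apply the codemods from the inventory, one commit per codemod"
```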

A pragmatic head‑to‑head across seven common tasks:

  • Read a large unfamiliar codebase. Codex favors a fast “explain the architecture” sweep that you can re‑prompt into “propose three well‑scoped PRs”—a flow the docs literally recommend. Claude Code tends to write more grounded overviews that cite concrete files and invariants, and it will often propose a plan first, which you can accept into tasks executed by subagents.
  • Run and iterate tests. Codex’s unified exec shines here: it streams test output with clean formatting, treats safe commands as auto‑approved, and kills process groups on runaway timeouts. Claude Code will run tests too, but it’s more likely to pause on permission prompts; once configured, that friction becomes a feature on teams that want a second “are you sure?”
  • Edit multiple files with safety. Both can show diffs on request, but Codex’s default posture is “do, then show.” Claude Code is more likely to explain proposed edits, ask, then apply. If your team has junior engineers adopting the tool, Claude’s style lowers risk; if your team has senior engineers who value speed, Codex’s defaults feel closer to how they already work.
  • Search and refactor. Codex’s @ search and history navigation make “find, jump, change, explain” feel like a single motion. Claude Code’s Rust‑based fuzzy finder and command palette are strong, and its bias to explain first can surface hidden coupling before you touch the code.
  • Shell automation. Codex’s codex exec is the story—non‑interactive runs that you can wire into CI or cron with precise model settings. Claude Code’s SDK mode and new idle‑exit flag make it viable for automation, but it’s more at home in interactive or editor‑embedded sessions where permission prompts are visible.
  • Multi‑agent work. Claude Code wins by design: dedicated subagents with their own context, tools, and stop hooks map well to real teams. Codex can approximate this with discipline (slash commands, separate sessions), but it’s not the native idiom.
  • Model‑level toggling. Codex gives you explicit config flags for reasoning effort and verbosity, plus the ability to swap model providers if you need to (including non‑OpenAI providers that speak the OpenAI wire). Claude Code chooses the model per task or subagent under the hood with a bias toward Sonnet 4.5 for heavy lifts.

Model behavior in practice, by knobs:

  • Reasoning effort: Setting GPT‑5 or GPT‑5‑Codex to "high" often improves multi‑step edits that require context stitching (e.g., ORM layer + HTTP layer + migrations). The cost is latency and verbosity; use sparingly. Claude Sonnet 4.5 naturally “thinks long” without an explicit flag; when paired with subagent isolation, it stays on task across branches.
  • Verbosity: model_verbosity = "high" on GPT‑5 family is excellent for design write‑ups, ADRs, and code review rationales. Keep it at "low" for day‑to‑day diffs. Claude 4.5 models default to considered prose; you can prompt them to “diff‑only unless asked” to trim fat.
  • Reasoning summaries: Codex enables reasoning summaries for codex-* and o* models; they’re helpful breadcrumbs in a long session. Claude typically interleaves summary‑style explanations before actions; lock this pattern into your team prompts to make transcripts skimmable.

Guardrails and risk envelopes:

  • Codex: unified exec, idle streaming timeouts, safe‑command auto‑approval, and process‑group kill behavior give you strong blast‑radius control. Approvals still matter for destructive commands. The ergonomics encourage automation (great), but also make it easy to forget a guard you meant to keep (watch your config diffs in CI).
  • Claude Code: everything about subagents and permission prompts pushes you to declare intent and scope. The tradeoff is more keystrokes when you just want to nudge a file. Over time, this nudges teams toward repeatable, auditable flows, which is the point.
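The deny‑style permissions referenced above live in Claude Code's settings files. A .claude/settings.json sketch, with rule patterns chosen for illustration:

```json
{
  "permissions": {
    "deny": [
      "Bash(rm -rf:*)",
      "Read(./.env*)"
    ]
  }
}
```

Checked into the repo, this makes the guardrail a reviewable artifact rather than a per‑session habit.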

Now to the changelogs that shape day‑to‑day experience. The latest stable Codex release (0.57.0) highlights “TUI quality‑of‑life,” “improve timeout on long running commands,” and a string of merges that read like a guided tour of a power user’s wish list: CTRL‑n/CTRL‑p to move through slash commands and history; skipping retries on insufficient_quota; removing the legacy shell tool when unified exec is enabled; normalizing Windows paths in WSL upgrades; killing process groups on timeout. For teams that script their agent and run it against CI sandboxes, those details are the real currency (GitHub Releases • 0.57.0).

Claude Code’s close‑range fixes feel similarly pragmatic but lean orchestration‑first. Recent entries include: better fuzzy search for commands (native Rust‑based finder), a CLAUDE_CODE_EXIT_AFTER_STOP_DELAY env var for SDK mode to auto‑exit after idling (useful in automations), moving ignorePatterns into deny‑style permissions, prompts‑based stop hooks, and subagent resume behavior. None of these are headlines; all of them lower the “what just happened?” tax in long sessions (Anthropic CHANGELOG 2.0.28–2.0.37).

Community sentiment is noisy and useful. Three links anchor the current mood:

  • OpenAI’s Codex CLI was introduced to HN with 516 points and 289 comments, a healthy signal for an engineering‑first tool and a fertile source of bug reports and feature requests (HN → openai/codex).
  • Simon Willison’s reverse‑engineering note on getting GPT‑5‑Codex‑Mini to “draw me a pelican” inside Codex CLI drew 166 points and 75 comments, capturing exactly the hacker‑energy this ecosystem thrives on (simonwillison.net).
  • Two Claude Code threads paint an accurate picture of its appeal and limits: “Claude Code is all you need” (851 points, 504 comments) and a detailed case study modernizing a 25‑year‑old kernel driver (929 points, 318 comments). The praise clusters around deep repo understanding and cautiously correct edits; the pushback clusters around occasional over‑confident refactors and complex setup (HN → dwyer.co.za; HN → dmitrybrant.com).

This is precisely where model personalities meet product choices. GPT‑5‑Codex set to “high” reasoning and “high” verbosity inside Codex CLI tends to write more exhaustive diffs and justify tradeoffs. If you run it via codex exec in a guardrailed environment with non‑interactive approval, you get a consistent automation surface. Claude Sonnet 4.5, run through Claude Code with plan/subagents and strict permission prompts, tends to approach scaling tasks like a careful senior engineer—explain, propose, then act. Haiku 4.5 is the quick scout for these flows.

Two subtle, high‑value differences emerged while dogfooding both:

  • Diff behavior under backtracking: with Codex’s improved backtracking that skips /status, I found “undo and branch” edits stayed readable and less cluttered. It shaved minutes off arbitrating between two refactor directions. That change is small in the notes; it’s large in practice.
  • Subagent clarity: Claude Code’s separate context windows for subagents notably reduce cross‑talk. In multi‑file migrations, a “linters‑only” subagent and a “migrations‑only” subagent avoided stepping on each other. Token usage goes up; rollback drama goes down.

A note on the Responses API and migrations: OpenAI has been pushing a consistent wire model across products, with a migration kit for legacy Completions/Chat Completions to the unified Responses API that is “guided by Codex CLI.” If your estate still carries older integrations, this kit plus Codex’s automation surface (codex exec) makes the work decently mechanical.

Counterpoints

First, the model‑versus‑product caveat. Comparing GPT‑5 (and GPT‑5‑Codex) to Claude 4.5 Sonnet/Haiku is only half a story unless you pin it to the product layer that routes, constrains, and explains those models. Codex CLI and Claude Code make different bets about what the defaults should be: Codex pushes you to explicit, parametric control (reasoning effort, verbosity, model provider), while Claude Code pushes you to structured agents with permission and tool envelopes. Both are valid. Both can be misused.

Second, cost and latency. Public sources suggest that the 4.5 line delivers improved focus at practical price/latency, with Haiku designed to keep the loop tight. GPT‑5 family makes a case for higher‑quality, higher‑verbosity reasoning when you need it. Without your own price sheet and “wall‑clock to green tests” timing, any blanket “X is cheaper/faster” claim is cargo cult. Run your numbers.

Third, jailbreaks and guardrails. Max Woolf’s experiment hints that Haiku 4.5 resists simple jailbreak prompts better than earlier Claude generations. On the OpenAI side, GPT‑5 ships with a system card and stricter controls documented by third‑party readers. But your operator stance—what the CLI actually permits, which tools it exposes, and when—dominates in practice. A permissive --dangerously-skip-permissions flag (both ecosystems have a version of this) will undo months of alignment work.

Fourth, the model names themselves can mislead. “GPT‑5‑Codex” reads like a magical switch for code quality; in reality, the wins come from the plumbing around it: MCP server config for the tools your repo actually needs, the right model_reasoning_effort for the task, and a repeatable codex exec job that enforces approvals. On the Anthropic side, Sonnet 4.5’s celebrated endurance is most useful when you align the agent scaffolding (subagents, plan mode) to mirror your build/test/deploy steps.

Fifth, platform pragmatics. On Windows and WSL, Codex’s recent fixes to normalize Windows paths during updates and the general WSL care‑and‑feeding are not footnotes—they’re the difference between “works on my machine” and “works on the team’s machine.” If your fleet is cross‑platform, factor this in. Claude Code runs well on macOS/Linux out of the box and benefits from the VS Code extension to harmonize ergonomics on Windows.

Sixth, web search and networked tools. Codex explicitly exposes web_search_request and provider‑level toggles in config, and its error handling around Cloudflare/403s is now legible. Claude Code’s approach is more conservative—keep tools narrow, prompt clearly—but once OAuth token loops and memory limits were ironed out in recent releases, it became a trustworthy “fetch, think, act” citizen.

Finally, the community narrative isn’t always your narrative. High‑engagement HN threads can be dominated by wow‑moments and failure‑cases, not the boring middle where you spend your days. Let them inform your priors; don’t let them replace your benchmarks.

Outlook + Operator Checklist

The trajectory is healthy. OpenAI’s Codex project is shipping at a clip, teaching by example how to expose small but consequential toggles (reasoning effort, verbosity) and to strip away friction in the terminal. Anthropic’s Claude Code is maturing into an orchestrator with opinionated safety and better ergonomics for long‑running, multi‑agent work. The hard problems remain the same: avoid context drift, get tools right, share memory responsibly, and make failures legible.

Expect convergence in three places. First, native binaries and startup performance—Codex is already shipping native installers and shaving startup, and Claude Code’s native builds now launch quicker as well. Second, reasoning surfaced as product—Codex’s reasoning summaries and verbosity knobs are an early UI on “how much thinking to buy”; Claude Code’s plan/subagent UX is the complementary UI on “who should think, and with what tools.” Third, typed tools—both ecosystems are tightening tool schemas and error messages to tame “tool jams” that derail long sessions. These are not niceties; they directly reduce toil in the loop.

An operator’s checklist to choose and run either stack well:

  • Decide your defaults explicitly. For Codex CLI, set model = "gpt-5" or "gpt-5-codex" and fix model_reasoning_effort = "high" only for tasks that merit it; keep "medium" for routine edits. For Claude Code, pick Sonnet 4.5 for long, cross‑cutting refactors; Haiku 4.5 for fast, local changes.
  • Treat verbosity as a budget. In Codex, model_verbosity = "high" is a scalpel. Use it for design docs and migrations; drop to "low" for tight diffs in CI. In Claude Code, prefer “explain → propose → apply” prompts to keep artifacts readable.
  • Build guardrails in the product, not the prompt. In Codex, lean on unified exec, sandboxing, and required approvals for file mutations and rm -rf class commands. In Claude Code, keep permission prompts on and define deny‑lists in local settings; only elevate per subagent.
  • Favor declarative tool surfacing. Wire MCP servers in Codex via config; in Claude Code, enable only the tools the current subagent needs. Fewer tools, clearer traces.
  • Make backtracking cheap. In Codex, take advantage of improved backtracking that skips /status; branch and compare diffs instead of arguing with a single thread. In Claude Code, segment tasks into subagents with independent contexts; roll back at the agent boundary.
  • Automate the boring path. Use codex exec for non‑interactive jobs (lint fixes, codemods, docs sync) and pin it to a stable model/verbosity pair. In Claude Code, the SDK’s idle‑exit (CLAUDE_CODE_EXIT_AFTER_STOP_DELAY) makes ephemeral tasks clean.
  • Watch release trains. Codex’s stable/alpha rhythm means you can deploy a stable CLI and trial an alpha in a separate path. Claude Code’s 2.0.x cadence is fast enough that a recurrent “update and retry” ritual pays off.
  • Measure what matters to you. Track wall‑clock to green tests, rate of safe edits merged, and rollbacks per 100 Codex/Claude sessions. Replace vibes with numbers.
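The "automate the boring path" item, sketched for the Claude side (the env var is per the CHANGELOG; the delay value and prompt are illustrative, and the units are assumed to be milliseconds):

```shell
# Ephemeral, non-interactive Claude Code run that exits shortly after going idle
CLAUDE_CODE_EXIT_AFTER_STOP_DELAY=5000 \
  claude -p "Fix lint errors under src/ and summarize the changes"
```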

Operational playbooks worth templating:

  • Non‑interactive job templates. Put codex exec behind scripts for routine chores—“fix lint and re‑format,” “sync CHANGELOG from GitHub,” “bump minor versions across workspaces”—and pin model/verbosity. Record diffs; require manual apply in protected repos.
  • Migration recipes. Use Codex to generate draft codemods and test shims, then let a Claude Code subagent with read‑only tools review risks and annotate edge‑cases before applying.
  • Incident kits. Keep a minimal set of subagents in Claude Code for on‑call: logs reader with regex helpers, db‑tripwire that never writes, rollback playbook generator. Mirror those as slash commands in Codex for parity.
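Mirroring an incident subagent as a Codex slash command can be as simple as a prompt file (the custom‑prompts directory is per Codex's docs; the content is illustrative):

```markdown
<!-- ~/.codex/prompts/log-sleuth.md — invoked as /log-sleuth in the TUI -->
Read the pasted logs. Cluster the errors, rank hypotheses by likelihood,
and stop before proposing any fix or running any command.
```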

Two quantified takeaways to close:

  • Release velocity: Codex shipped four stable releases in six days (0.54.0–0.57.0 from Nov 4–9) and five alphas in the next 48 hours (0.58.0‑alpha.1–.5). Claude Code’s CHANGELOG shows eleven 2.0.x updates within roughly two weeks. For operators, this means bug‑class fixes often have a half‑life of days, not sprints (Codex GitHub Releases; Anthropic CHANGELOG).
  • Community signal: the original Codex CLI launch thread collected 516 points/289 comments; “Claude Code is all you need” logged 851/504; “Using Claude Code to modernize a 25‑year‑old kernel driver” garnered 929/318. Interest is not a scoreboard, but it is a proxy for where examples, plugins, and bug triage will be richest (HN links below).

If you need an opinion: pick Codex CLI when you want a reliable terminal tool that scales to automation with explicit model controls, and pick Claude Code when you want structured, multi‑agent workflows with conservative edits and rich explanations. Both are good enough to standardize if you own your defaults, wire your tools, and treat reasoners as apprentices who write down what they’re thinking.

Key sources, embedded above and here for convenience:

  • OpenAI: “GPT‑5 for developers” and “GPT‑5” overview pages (openai.com). Community summary of the GPT‑5 system card by Simon Willison. Codex docs: Getting started; Config → Model selection, reasoning effort and verbosity; GitHub Releases 0.57.0 and release feed.
  • Anthropic: Claude Code CHANGELOG (anthropics/claude‑code). Coverage of Claude 4.5 Sonnet (Ars Technica). Jailbreak testing on Claude 4.5 Haiku (minimaxir.com). “Claude 3.7 Sonnet and Claude Code” announcement (anthropic.com/news).
  • Community: HN thread for openai/codex; Simon Willison’s post on GPT‑5‑Codex‑Mini inside Codex; HN “Claude Code is all you need”; HN/dmitrybrant kernel driver case study.
