OpenAI Codex CLI vs Claude Code: which wins in 2026?

Claude Code wins on IDE coverage, MCP ecosystem, and git integration. Codex CLI wins on sandboxing and OpenAI-ecosystem fit. Pick based on your constraints.

Claude Code wins if you want the widest IDE coverage, the deepest MCP integration, and native git workflows that carry commits through to pull requests. Pick Codex CLI if you’re already in the OpenAI ecosystem, want a sandbox that gates network access behind approval by default, or prefer a Rust-native binary with no Node dependency.

Who this is for

Terminal-native developers choosing a primary autonomous coding agent in June 2026. Codex CLI v0.138.0 shipped June 8 — one day before this article — making this the right moment to take a fresh look. Both tools compete for the same developer: someone who wants AI help that lives in the shell and acts autonomously. If you want inline autocomplete inside an IDE panel without leaving your editor, look at Cursor instead.

What we’re comparing

| | Codex CLI | Claude Code |
|---|---|---|
| Version | v0.138.0 (2026-06-08) | Latest (as of 2026-06-09) |
| Platform | macOS Apple Silicon | macOS Apple Silicon |
| Implementation language | Rust (96.1%) | Node.js |
| Default backend model | GPT-5.5 | Claude Sonnet 4.6 (Pro) / Opus 4.x (Max) |
| Distribution | Native binary | npm package |

Quick verdict

| Category | Winner | Notes |
|---|---|---|
| Setup | Tie | Single binary (Codex) vs npm install (Claude) |
| Sandboxing | Codex CLI | 3 explicit modes; network requires approval in default mode |
| IDE integration | Claude Code | VS Code, JetBrains, Desktop, Web |
| Git & CI/CD | Claude Code | Native PR creation, GitHub Actions, GitLab CI/CD |
| MCP support | Claude Code | Richer ecosystem (Jira, Slack, GitHub, Figma); both support STDIO + HTTP |
| Model benchmarks | No clear winner | OpenAI dropped SWE-bench; Claude’s historical scores are stale |
| Pricing clarity | Claude Code | Dollar-per-token vs opaque credits conversion |
| Community size | Codex CLI | 89,000+ GitHub stars; Claude Code has a noisier complaint thread |

Setup and UX

Codex CLI ships as a self-contained native binary for macOS Apple Silicon (codex-aarch64-apple-darwin.tar.gz). Extract the archive, move the binary onto your PATH, and set your API key.

curl -L https://github.com/openai/codex/releases/latest/download/codex-aarch64-apple-darwin.tar.gz | tar xz
mv codex /usr/local/bin/
export OPENAI_API_KEY=sk-...
codex --version
# 0.138.0

Claude Code installs via npm:

npm install -g @anthropic-ai/claude-code
export ANTHROPIC_API_KEY=sk-ant-...
claude --version

Neither is hard. The Rust binary is the cleaner install story — no package manager, no runtime version to manage. The downside is that Codex CLI updates are manual unless you script a pull from the release page. Claude Code inherits npm’s update story (npm install -g @anthropic-ai/claude-code), which is familiar but still a separate step from the OS package manager.

Both authenticate via API key. Both support environment-variable injection for CI and team environments.
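For CI jobs, a preflight check that fails fast when a key is missing can save a wasted run. The following is a minimal sketch, not part of either tool: the variable names come from the install snippets above, while `REQUIRED_KEYS`, `missing_keys`, and the tool labels `"codex"` and `"claude"` are hypothetical names chosen for illustration.

```python
import os

# Env var names taken from the install snippets in this article.
# The dictionary keys are illustrative labels, not CLI identifiers.
REQUIRED_KEYS = {
    "codex": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
}


def missing_keys(tools, env=None):
    """Return the env var names that are unset or empty for the given tools."""
    env = os.environ if env is None else env
    return [REQUIRED_KEYS[tool] for tool in tools if not env.get(REQUIRED_KEYS[tool])]
```

A pipeline step could call `missing_keys(["codex", "claude"])` and exit non-zero if the returned list is not empty.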

Safety and sandboxing

This is Codex CLI’s clearest differentiator. Three explicit approval modes:

| Mode | Allowed |
|---|---|
| read-only | File reads; writes and shell commands require approval |
| workspace-write (default) | Reads + writes to project directory; shell commands allowed (git, package managers, test runners); network requires approval (Codex asks before using) |
| danger-full-access | Unrestricted shell + internet access |

In workspace-write mode, network access is approval-gated — Codex asks before using the internet rather than blocking it outright. danger-full-access removes all restrictions, including the approval prompts, for sessions where you need unrestricted external access.
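One way to reason about the three modes is as a small policy lookup. The sketch below encodes only what this article's mode table states; it is not Codex CLI's implementation. `Decision`, `POLICY`, and `decide` are hypothetical names, and treating anything the table does not mention as "ask" is an assumption.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"  # the agent pauses and requests approval


# Hypothetical encoding of the article's mode table, not Codex CLI internals.
POLICY = {
    "read-only": {
        "read": Decision.ALLOW,
        "write": Decision.ASK,
        "shell": Decision.ASK,
    },
    "workspace-write": {
        "read": Decision.ALLOW,
        "write": Decision.ALLOW,  # project directory only, per the table
        "shell": Decision.ALLOW,
        "network": Decision.ASK,
    },
    "danger-full-access": {
        "read": Decision.ALLOW,
        "write": Decision.ALLOW,
        "shell": Decision.ALLOW,
        "network": Decision.ALLOW,
    },
}


def decide(mode, action):
    """Look up an action under a mode; pairs the table omits fall back to ASK (assumption)."""
    if mode not in POLICY:
        raise ValueError(f"unknown mode: {mode}")
    return POLICY[mode].get(action, Decision.ASK)
```

Under this reading, `decide("workspace-write", "network")` returns `Decision.ASK`, while the same action under `danger-full-access` returns `Decision.ALLOW`.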

Claude Code uses an approval-prompt model: before running any destructive command, it asks. The confirmation cadence is the safety mechanism — there’s no equivalent three-tier explicit mode declaration. Claude’s actual OS-level isolation mechanisms weren’t independently verifiable from primary sources at time of publication.

If you’re running agents in CI or overnight without supervision, Codex CLI’s explicit mode is auditable in a way that interactive approval isn’t. You can lock a pipeline to workspace-write in your job definition and know that every network request in that job requires an explicit approval step, regardless of what the model decides to do.

Model quality and benchmarks

Codex CLI v0.138.0 recommends GPT-5.5, released April 23, 2026 — OpenAI describes it as “a new class of intelligence for coding and professional work,” with a 1,050,000-token context window (developers.openai.com/api/docs/models/gpt-5.5). GPT-5.4-mini is available for faster, cheaper tasks where GPT-5.5 would be overkill.

Claude Code on Pro uses Sonnet 4.6. The Max plans default to Opus 4.8; Opus 4.7 is a legacy model.

The benchmark problem:

Here’s the honest answer: you can’t compare these tools on SWE-bench in June 2026. OpenAI has stopped publishing SWE-bench evaluations and has released no SWE-bench numbers for GPT-5.5. Community claims of “85% after 8 attempts” didn’t survive primary-source verification — treat those numbers as noise.

The most recent Claude SWE-bench number checked for this comparison is Claude 3.5 Sonnet at 49% (January 2025), which is about a year and a half old. More recent Claude 4.x scores exist on the Anthropic blog, but they weren’t independently verified at the model + scaffolding level for this comparison.

What you can use: real workload testing. Codex CLI’s 89,000+ GitHub stars and active issue tracker give you a large sample of real-world failure modes. Claude Code’s r/ClaudeCode community does the same. The consistent community read: both tools are top-tier, and the differences show up on complex tasks rather than in average task quality.

Multimodal: Codex CLI has a -i/--image flag for image input. The v0.138.0 release notes document image improvements — local image attachments and standalone image generations now expose their saved file paths to the model. Production reliability of the flag in complex workflows is unverified from primary sources; don’t depend on multimodal Codex input for production flows until you’ve validated it against your own workload. Claude Code handles images reliably through the standard prompt interface.

Context window

The effective context each tool gives you wasn’t verifiable from primary sources at the time of writing; the 1,050,000-token figure cited earlier is GPT-5.5’s model-level window, not a confirmed Codex CLI limit. The practical constraint you’ll hit sooner is rate limits, not raw window size. Codex CLI enforces 5-hour rolling rate windows. Claude Code’s subscription plans cap usage within session windows.
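To make the rolling-window idea concrete, here is a rough client-side tracker for a 5-hour window. It illustrates the general technique only, not how either vendor meters usage: the article does not say which unit is counted, so `units` and `limit` are placeholders, and `RollingUsage` is a hypothetical name. Timestamps are plain seconds and are assumed to arrive in non-decreasing order.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60 * 60  # the 5-hour window mentioned above


class RollingUsage:
    """Track usage inside a rolling time window (illustrative, not vendor logic)."""

    def __init__(self, limit, window_seconds=WINDOW_SECONDS):
        self.limit = limit
        self.window = window_seconds
        self.events = deque()  # (timestamp, units), oldest first
        self.total = 0

    def _expire(self, now):
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, units = self.events.popleft()
            self.total -= units

    def record(self, now, units):
        self._expire(now)
        self.events.append((now, units))
        self.total += units

    def headroom(self, now):
        """Units still available at time `now`, never below zero."""
        self._expire(now)
        return max(self.limit - self.total, 0)
```

The point of the sketch is that headroom comes back gradually as old usage ages out, rather than resetting at a fixed clock time.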

For large-codebase tasks on both tools: the reliable pattern is explicit file inclusion rather than trusting the agent to discover what it needs. Both tools can be instructed to read specific files; both can exhaust context on sprawling monorepos without scoping.
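Explicit file inclusion can be as simple as choosing files under a size budget before the agent starts. The sketch below is tool-agnostic and makes several assumptions: file sizes in bytes stand in for context cost, candidates are already ordered by priority, and `select_files` and `build_prompt` are hypothetical helpers, not features of either CLI.

```python
def select_files(candidates, max_bytes):
    """Pick files in priority order without exceeding the byte budget.

    candidates: list of (path, size_bytes) tuples, most important first.
    Returns (chosen_paths, bytes_used). Files that do not fit are skipped.
    """
    chosen, used = [], 0
    for path, size in candidates:
        if used + size <= max_bytes:
            chosen.append(path)
            used += size
    return chosen, used


def build_prompt(task, paths):
    """Compose a task prompt that names the files the agent should read."""
    listing = "\n".join(f"- {p}" for p in paths)
    return f"{task}\n\nRead only these files:\n{listing}"
```

With a 10,000-byte budget, a 90,000-byte generated file is skipped while smaller source and test files around it are kept, which is the scoping behavior the paragraph above recommends.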

Pricing

This is the most opaque part of the comparison.

Claude Code charges in dollars per token (API usage beyond plan limits):

| Model | Input | Output |
|---|---|---|
| Opus 4.8 | $5 / MTok | $25 / MTok |
| Sonnet 4.5 / 4.6 | $3 / MTok | $15 / MTok |

Subscription plans: Pro at $20/month (Sonnet 4.6 within limits), Max at $100–200/month for higher throughput and Opus access. Anthropic reports average enterprise usage at ~$13/dev/active day; 90% of users stay under $30/active day on heavy use days.

Codex CLI charges in credits per token:

| Model | Input | Output |
|---|---|---|
| GPT-5.5 | 125 credits / MTok | 750 credits / MTok |
| GPT-5.4-mini | 18.75 credits / MTok | 113 credits / MTok |

The credit-to-dollar conversion rate was not confirmed from a primary source at time of writing. Before budgeting any significant Codex CLI usage, verify the current rate at developers.openai.com/codex/pricing. Rate limits apply in 5-hour rolling windows — plan for burst headroom if you’re running autonomous sessions during the workday.

Practical implication: Claude Code’s dollar-per-token pricing is straightforward to model. Codex credits add a layer of indirection. If budget predictability matters to you or your finance team, that’s a real operational difference.
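The difference is easy to see in a small cost model. This sketch uses only the per-MTok rates quoted in the tables above; the dictionary keys are display labels, not API model IDs, and `usage_cost` and `credits_to_usd` are hypothetical helpers. The credit-to-dollar rate is deliberately left as a required argument because it is unconfirmed.

```python
# Dollar prices per million tokens (input, output), from the Claude table above.
USD_PER_MTOK = {
    "Opus 4.8": (5.00, 25.00),
    "Sonnet 4.6": (3.00, 15.00),
}

# Credit prices per million tokens (input, output), from the Codex table above.
CREDITS_PER_MTOK = {
    "GPT-5.5": (125.0, 750.0),
    "GPT-5.4-mini": (18.75, 113.0),
}


def usage_cost(rates, model, input_tokens, output_tokens):
    """Cost in the rate table's own unit (dollars or credits)."""
    rate_in, rate_out = rates[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


def credits_to_usd(credits, usd_per_credit):
    """Convert credits to dollars; the rate is unconfirmed, so the caller must supply it."""
    if usd_per_credit is None:
        raise ValueError("credit-to-dollar rate not set; check the official pricing page")
    return credits * usd_per_credit
```

For a session of 2,000,000 input tokens and 400,000 output tokens, `usage_cost` gives 12.0 dollars on Sonnet 4.6 and 550.0 credits on GPT-5.5. The second figure only becomes a budget line once a confirmed rate is passed to `credits_to_usd`.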

IDE and ecosystem integration

Claude Code’s platform coverage is substantially broader:

  • VS Code extension (inline diffs, @-mention navigation, agent panel)
  • JetBrains plugin
  • Claude Desktop app
  • Web (claude.ai/code)

Codex CLI is terminal-only. There’s no IDE plugin. If your workflow involves writing code in the editor and reviewing AI diffs in an inline panel, Claude Code wins outright. Codex CLI’s terminal-first design is intentional — it assumes the shell is where you live — but it’s a hard constraint if you cross the terminal/editor boundary regularly.

Git and CI/CD

Claude Code is native-git: it stages changes, writes commit messages, creates branches, and opens pull requests. GitHub Actions and GitLab CI/CD integrations let you run Claude Code as an autonomous code reviewer or fixer in your pipeline. GitHub Code Review integration is built in. See how to wire Claude Code into your CI pipeline for setup patterns.

Codex CLI operates within its workspace-write sandbox. It can read and modify project files, but the git automation layer — branching, committing, PR creation, CI/CD hooks — requires separate tooling. Codex can generate diffs; getting those diffs into a pull request is your problem.

If your delivery loop is: task → commit → PR → CI pass → merge, Claude Code covers it without additional scaffolding. Codex CLI covers the “task → diff” part.
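If you do pair a diff-producing agent with your own tooling, the glue can be thin. The sketch below plans the standard `git` and GitHub CLI (`gh`) invocations that turn a saved patch file into a pull request. It assumes the patch is already on disk, the remote is named `origin`, and `gh` is installed and authenticated. `pr_commands` and `open_pr` are hypothetical helper names, and nothing here is part of Codex CLI.

```python
import subprocess


def pr_commands(patch_path, branch, title, body="Agent-generated change; review before merging."):
    """Return the git/gh invocations, in order, that turn a patch file into a pull request."""
    return [
        ["git", "switch", "-c", branch],
        ["git", "apply", patch_path],
        ["git", "add", "-A"],
        ["git", "commit", "-m", title],
        ["git", "push", "-u", "origin", branch],
        ["gh", "pr", "create", "--title", title, "--body", body],
    ]


def open_pr(patch_path, branch, title, dry_run=True):
    """Print the plan (dry run) or execute it, stopping at the first failing step."""
    commands = pr_commands(patch_path, branch, title)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))  # display only; not shell-quoted
        else:
            subprocess.run(argv, check=True)
    return commands
```

Defaulting to `dry_run=True` keeps the sketch safe to try in a real repository: it prints the six steps without touching the working tree.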

MCP ecosystem

Claude Code supports MCP via both STDIO and HTTP endpoints. Production-ready integrations include Jira, Slack, GitHub, Figma, and custom tools — a mature ecosystem built up over the past year. See how to set up MCP with Claude Code for the configuration steps.

Codex CLI also supports both STDIO and streaming HTTP servers, configured at ~/.codex/config.toml. The transport parity is real; the practical gap is ecosystem maturity. Claude Code’s MCP library is substantially larger and better-documented at time of writing.

Community signals

Codex CLI: 89,000+ GitHub stars as of June 2026, with an active release cadence (v0.138.0 was one of many releases in recent months). The consistent praise in community threads: the autonomous sandbox model doesn’t require constant permission confirmations the way Claude Code does. Developers running long-duration agentic tasks specifically call this out as the reason they use Codex.

Claude Code: 16.8 million installs and a 4/5-star rating (703 reviews) on the VS Code Marketplace as of June 2026; 4,200+ subscribers on r/ClaudeCode. Community complaints peaked in March 2026 around performance degradation and excessive permission prompts. Anthropic has shipped incremental improvements since, but the sentiment — that Claude Code interrupts more often than it should — persists in the subreddit.

The emerging pattern from experienced developers using both: Codex CLI for fire-and-forget autonomous tasks, Claude Code for interactive feature work and anything involving the full git loop.

Who should pick which

Pick Codex CLI if:

  • You’re already building on OpenAI’s API and want one vendor
  • You run unsupervised autonomous tasks and need explicit, auditable network isolation
  • You prefer a native binary without a Node runtime dependency
  • You do primarily terminal and shell work with no IDE integration requirement

Pick Claude Code if:

  • You want IDE integration (VS Code, JetBrains, web)
  • Your workflow depends on automated commits, PRs, and CI/CD triggers
  • You want a larger, more mature MCP ecosystem (Jira, Slack, GitHub, Figma, and custom tools)
  • You want dollar-per-token pricing that’s straightforward to model for a team budget

Run both if:

  • You’re a developer who wants the sandboxed, fire-and-forget model for long agentic tasks (Codex) and the full git-native review and PR loop (Claude Code) for shipped work
  • You’re on a mixed-infrastructure team that uses both OpenAI and Anthropic APIs

Verdict

Codex CLI v0.138.0 has the best sandboxing story in the autonomous coding agent space right now: network access gated behind approval by default, three explicit mode tiers, and a Rust-native binary. If you’re wiring up an unattended pipeline and need to know exactly what the agent can and can’t touch, Codex is the right default. The gaps are real: no IDE integration, a thinner MCP ecosystem than Claude Code’s, and credit-based pricing that requires a trip to the docs before you can reason about cost.

Claude Code covers more surface area. IDE everywhere, native git automation, a mature MCP ecosystem. The daily development experience is more integrated for developers whose work crosses the terminal–editor boundary. The sandboxing is less auditable by design — it asks before acting rather than declaring modes upfront — which is fine for interactive sessions and a genuine concern for unsupervised ones.

Neither tool is retiring the other. Codex CLI for background autonomous work. Claude Code for interactive feature development, code review, and CI integration. That combination — not a binary pick — is what experienced practitioners are running in mid-2026.

For individual deep dives, see the standalone OpenAI Codex CLI review and the Claude Code review.

Caveats

  • No independently verified SWE-bench scores exist for GPT-5.5 or current Claude Code models at time of publication. OpenAI has stopped publishing SWE-bench evaluations.
  • Context window sizes weren’t confirmed from primary sources for either tool.
  • Codex CLI credit-to-dollar conversion rate wasn’t independently verified — check the official pricing page before budgeting.
  • Claude Code sandboxing implementation details (OS-level isolation mechanisms) weren’t confirmed from primary sources.
  • Codex CLI v0.138.0 was released June 8, 2026. Both products update frequently; check release notes before acting on version-specific claims.
  • The Cursor link above is an affiliate link — toolchew receives a commission on signups via /go/cursor.
