
Claude Code in 2026: Honest Review After Six Months

Claude Code leads on model accuracy at $20/mo, but usage limits bite and the April 2026 regression is a trust story to read before committing to Max.

By Ethan

2,205 words · 12 min read

Claude Code has the best model accuracy of any agentic coding tool at the $20 price point. If you do multi-file, terminal-driven work and your workflow doesn’t depend on inline autocomplete, it’s the right default. The Pro plan usage limits are a genuine constraint, and the March–April 2026 regression — three separate engineering bugs that Anthropic initially denied before publishing a full postmortem — is a trust story you should read before committing to Max.

Who this is for

Mid-to-senior engineers who are comfortable in a terminal and considering a switch from Cursor, Copilot, or Codex CLI. If you want inline autocomplete in VS Code, stop here — Claude Code does not have that.

What Claude Code actually does

Claude Code is a terminal-native agentic coding tool. You launch it as a REPL (claude) or invoke it non-interactively (claude "do this task"). It reads files, writes files, runs shell commands, reads stdout/stderr, and iterates — no semantic indexer layer, no separate code database. It reads what it needs when it needs it.

The core value is the agentic loop: plan → execute → test → iterate. Give it a feature spec and it will draft code, run tests, read the failure output, fix the code, and continue until tests pass or it runs out of context and asks you. Compared to chat-style tools where you paste file contents and ask questions, Claude Code operates on your actual files and runs your actual test suite.

On top of the loop, two systems extend what you can connect. MCP (Model Context Protocol) lets you attach tool servers — databases, Slack, GitHub, browser automation. Thousands of community MCP servers exist on GitHub as of early 2026; Gartner projects 75% of API gateway vendors will include MCP support by end of 2026. SKILL.md files (called plugins in the v2.1.136+ UI) let teams ship custom slash commands. A /review command that runs your project’s specific checklist, a /debug command that knows your logging format — those live in markdown, not code, and workspace admins can push them to every developer in a single settings update.
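For reference, attaching an MCP server is a few lines of JSON. A minimal sketch of a project-level .mcp.json — the server name, package, and environment variable here are illustrative placeholders, not a recommended setup:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "${GITHUB_TOKEN}" }
    }
  }
}
```

Skills sit at the prompt layer in the same spirit: a /review command is a SKILL.md file checked into the repo, so it versions, diffs, and reviews like any other project file.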

Claude Code runs locally or in the cloud. Remote Control (Feb 2026) means you can hand off a long-running task to a cloud session and get a push notification when it’s done. The model defaults to Sonnet 4.6 on Pro. Max subscribers get Opus 4.7 with a 1M token context window (beta) and an Auto mode that switches between Sonnet and Opus depending on task complexity.

What’s improved since launch

Claude Code reached general availability in May 2025 on Claude 3.7 Sonnet. In the twelve months since, the model climbed through Sonnet 4, Sonnet 4.5, and Sonnet 4.6. Opus 4.7 with xhigh effort is now the Max ceiling. The capability additions worth knowing:

Background agents (Dec 2025). Hand off a long task to a cloud session with & and keep working. Claude sends a push notification when it finishes. For tasks that run 15–30 minutes — large refactors, full test suites, database migrations — this changes the economics of unattended work.

Plan mode (Jan 2026, v2.1.119). Before Claude touches code, it presents a plan. You review and adjust, then it executes. Reduces the “Claude went off and did something wrong for 20 minutes” problem on complex tasks.

/ultrareview (Feb 2026, v2.1.111). Dispatches multiple agents to review a diff in parallel — each agent independently analyzes a different aspect of the code. Useful on large PRs where you want separate reads on security, performance, and correctness without sequential bias.

Tool Search Tool (March 2026, v2.1.76). Claude discovers MCP tools on demand instead of loading all definitions upfront. The changelog reports that when MCP tool descriptions exceed 10% of the context window, they are deferred and discovered on demand — significantly reducing context usage for teams with large MCP setups.

Extended thinking (v2.1.116+). Multi-step reasoning with visible progress spinners — “still thinking / thinking more / almost done thinking.” Toggle with Alt+T. Combined with Opus 4.7’s xhigh effort level, this is the configuration for problems that genuinely require deep reasoning.

/from-pr (Jan 2026). Load full PR context from GitHub, GitLab, Bitbucket, or GitHub Enterprise directly into a session. Useful for picking up someone else’s half-finished work without manually pasting diffs.

Plugin marketplace (May 2026, v2.1.136). Team-installable plugins with managed settings enforcement. Workspace admins push skills and MCP configs to every developer without individual setup. A significant change for teams that have built internal tooling on top of Claude Code.
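Concretely, a managed-settings push is just JSON. A minimal sketch of the kind of file a workspace admin distributes — the key names and permission-rule syntax follow Claude Code's settings format, but the model id and the specific rules here are illustrative assumptions:

```json
{
  "model": "claude-sonnet-4-6",
  "permissions": {
    "allow": ["Bash(npm test:*)"],
    "deny": ["Read(./.env)"]
  }
}
```

Every developer in the workspace inherits the file on their next settings sync, which is what makes marketplace plugins viable for teams without per-machine setup.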

Windows support (v2.1.111+). PowerShell tool support rolled out progressively from April 2026, after a long Unix-first stretch of development. Still maturing, but functional.

What’s still rough

Usage limits on Pro

At $20/mo, the Pro plan gives approximately 44,000 tokens per 5-hour window — roughly 10 to 40 prompts depending on task complexity. A multi-file refactor session with long context can burn through that in an afternoon.
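To make the limit concrete, a back-of-envelope sketch — only the 44,000-token window comes from the plan; the per-prompt token costs below are my assumptions, not measured figures:

```python
# Back-of-envelope: how many prompts fit in one Pro 5-hour window.
# WINDOW_TOKENS is the plan's stated budget; the per-prompt costs
# passed in below are illustrative assumptions.
WINDOW_TOKENS = 44_000

def prompts_per_window(tokens_per_prompt: int) -> int:
    """Whole prompts that fit in one 5-hour window."""
    return WINDOW_TOKENS // tokens_per_prompt

light = prompts_per_window(1_100)   # short Q&A turns
heavy = prompts_per_window(4_400)   # multi-file refactor turns
print(light, heavy)  # 40 10
```

At the heavy end of that range, ten prompts is one serious refactor session, which is why burning through the window in an afternoon is a realistic scenario.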

In August 2025, Anthropic added weekly caps on top of the per-session limits. Developers on the $200/mo Max 20× plan reported hitting the weekly ceiling mid-week. The backlash was immediate: subscription cancellations, public threads, and sustained pressure for months. Anthropic reset all subscriber limits in April 2026 alongside the regression postmortem, but the memory of hitting a weekly cap while paying $200/mo does not fade fast.

The honest math for heavy users: Pro is too tight if you run multiple real sessions per day. Max 5× at $100/mo is the practical entry point for daily agentic work.

The March–April 2026 regression

This is the most important section of this review. Between March 4 and April 20, 2026, Claude Code’s perceived quality dropped noticeably. The cause was three separate engineering bugs, not a model change:

Bug 1 — Reasoning effort downgrade (March 4 – April 7). Default reasoning switched from high to medium to reduce a UI freeze caused by extended thinking. Users reported Claude felt “less intelligent.” Anthropic’s initial response was to suggest user behavior had changed. The bug ran for five weeks before being reverted.

Bug 2 — Caching bug (March 26 – April 10). A session cache clear was designed to run once after an hour of idle. A bug caused it to run on every turn for the rest of the session. Claude appeared forgetful and repetitive — and each extra cache miss counted against usage limits. Developers who thought they were burning through limits faster than normal were right.

Bug 3 — Verbosity restriction (April 16 – April 20). A system prompt capped Claude to 25 words between tool calls and 100 words in final responses. Combined with other prompt changes, internal benchmarks showed a 3% coding quality decline. Reverted after four days.

Anthropic published a full postmortem on April 23, named each bug, admitted the initial handling was wrong, and reset all subscriber usage limits. The postmortem is worth reading before you subscribe.

The trust framing here matters: these were not model regressions. The underlying model did not get worse. Engineering decisions degraded the product, and the initial denial extended the damage by weeks. Whether that changes your evaluation depends on how much weight you place on vendor transparency versus product trajectory. Some developers migrated to Codex CLI during this window and have not come back.

No IDE, no autocomplete

Claude Code has no inline autocomplete, no editor panel, no VS Code extension beyond opening files. If Tab-complete is part of how you code, Claude Code does not fill that gap. Many teams run Cursor for in-IDE work and Claude Code for agentic tasks — the combination is common enough to be its own recognized usage pattern.

Online-only

No offline mode, no air-gap. Enterprise customers on restricted networks need Bedrock or Vertex AI integration, which adds setup complexity. Remote Control cloud sessions additionally do not support AWS Bedrock credentials or AWS SSO (IAM Identity Center) — a gap that affects legitimate enterprise teams accessing Claude through existing AWS infrastructure.

Long session degradation

Community reports describe quality degradation in very long sessions: the context window fills and earlier instructions drift. Not unique to Claude Code, but more noticeable because the agentic loop runs longer and accumulates more context than a typical chat session. Mitigation: use /resume to load a prior session, or start a fresh session per task unit rather than one all-day session.

How it stacks up

vs. Cursor

Cursor is a VS Code fork, not a terminal tool. It has inline autocomplete, sidebar chat, and full editor integration. The default model scored 58.0% on SWE-bench in a Feb 2026 sample; Claude Sonnet 4.6 scores 79.6%. Cursor lets you swap in Sonnet 4.6 as the underlying model, which narrows the accuracy gap significantly for users who know to switch.

The practical split: Claude Code wins on agentic accuracy and multi-file depth. Cursor wins if your workflow depends on the editor layer. Many teams use both. See our Cursor vs Claude Code comparison for the full head-to-head test breakdown.

vs. GitHub Copilot

Copilot costs $10/mo and integrates across VS Code, JetBrains, and other IDEs with solid inline autocomplete and chat. It falls behind Claude Code on complex multi-file agentic tasks by a meaningful margin. For developers who live in an IDE and don’t need the agentic loop, Copilot is a reasonable choice. For agentic work, the accuracy gap is real and Copilot’s price advantage doesn’t close it.

vs. Codex CLI

Codex CLI (OpenAI) picked up community esteem during the April 2026 regression. The architectural difference: Codex runs inside Docker containers (sandboxed by default), Claude Code runs on the host or in a Remote Control cloud environment. Codex is the better fit for teams that want strong sandboxing without configuring it. Claude Code is stronger on model quality and the agentic feature set for teams that trust their environment controls. If you migrated to Codex during the regression period, the April postmortem and limit reset are worth a reassessment.

Pricing

| Plan | Price/mo | Model access | Capacity |
|------|----------|--------------|----------|
| Pro | $20 | Sonnet 4.6, Opus 4.6 | ~44K tokens / 5-hr window |
| Max 5× | $100 | Sonnet 4.6, Opus 4.7 | 5× Pro limits |
| Max 20× | $200 | All models, xhigh effort | 20× Pro limits |
| API | Pay-per-token | All | No cap (Sonnet 4.6: $3/MTok in, $15/MTok out) |
| Teams Premium | $100/seat (min 5) | Full Claude Code | Max 5× per seat |
| Enterprise | Custom | All + 500K context | Custom; HIPAA, SCIM, SSO |

The Pro-to-Max decision: if you’re hitting Pro limits more than once a week, Max 5× ($100/mo) pays for itself in recovered session time. If you’re building on top of Claude Code — CI pipelines, custom tooling, batch review workflows — the API is the right choice; just add budget controls before you start. Interactive daily use on the API without a spend ceiling is how invoices surprise you.
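As a sketch of what a budget control looks like in practice — the rates come from the pricing table above; the token counts and the $50 ceiling are illustrative assumptions:

```python
# Pre-flight budget check for API use, at Sonnet 4.6 rates
# ($3/MTok in, $15/MTok out). Token counts and ceiling are
# illustrative assumptions, not measurements.
IN_RATE = 3 / 1_000_000    # dollars per input token
OUT_RATE = 15 / 1_000_000  # dollars per output token

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session at the assumed rates."""
    return input_tokens * IN_RATE + output_tokens * OUT_RATE

# A heavy agentic session: lots of file reads in, moderate code out.
cost = session_cost(2_000_000, 200_000)
print(f"${cost:.2f}")  # $9.00

BUDGET_CEILING = 50.0  # assumed per-day cap
assert cost < BUDGET_CEILING, "stop before the invoice surprises you"
```

The same arithmetic, run as a hard limit in a CI pipeline or batch script, is the difference between a predictable line item and a surprise invoice.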

There is no consumer affiliate link program. Anthropic runs an Enterprise Referral Partner Program (B2B, approved partners only) and a Guest Pass mechanism for Max subscribers: share up to 3 free-week trials; if a recipient converts to a paid plan, you receive $10 in usage credit. The Claude for Open Source program offers 6 months of Max 20× ($200/mo value) to qualifying open-source maintainers through June 30, 2026.

Who it’s for

Use Claude Code if you:

  • Are comfortable working in a terminal
  • Do multi-file, agentic work — refactors, codemods, feature-to-test pipelines
  • Are already subscribed to Claude Pro or Max for other workflows
  • Want the best model accuracy at the $20 entry price
  • Are building custom workflows with MCP and SKILL.md

Look elsewhere if you:

  • Depend on inline autocomplete as part of your daily coding flow
  • Work primarily on Windows, where PowerShell support is still maturing
  • Need offline or air-gap operation
  • Are evaluating team seats where Cursor Teams ($40/seat) may cover the use case for less
  • Lost trust in Anthropic after the April regression — the postmortem and limit reset are genuine, but trust restores at your pace

Final verdict

Claude Code is the right default for terminal-comfortable engineers doing agentic work. Sonnet 4.6 at 79.6% SWE-bench is materially ahead of Cursor’s default model (58.0%) and well ahead of Copilot on complex multi-file tasks. The agentic feature set — background agents, plan mode, MCP, plugin marketplace — has real depth. The April postmortem demonstrated that Anthropic can identify and fix product-level bugs and own the communication failure, which is more than most vendors do.

The honest constraints: Pro usage limits are tight for anyone doing several real sessions per day, the March–April regression was a multi-week trust wound, and the lack of IDE integration excludes a large segment of developers entirely.

Start on Pro, run a real project through it for a week, and measure the usage consumption honestly. If you’re consistently hitting limits and the output quality justifies the cost, Max 5× is the upgrade that makes it a daily tool. If you’re also thinking about the frontend stack for whatever you’re building, the Next.js vs Astro breakdown covers where that decision sits in 2026.