AI is writing more code than ever. Anthropic’s own engineers saw their code output jump roughly 200% year over year. But here’s the catch: someone still has to review all that code. Human reviewers can’t scale at the same rate, and the result is a growing gap between code produced and code properly checked.
That’s the problem Anthropic is trying to solve with Code Review, a new multi-agent system built directly into Claude Code. It shipped on March 9, 2026, and it’s already generating serious discussion across the developer community.
The Problem No One Wants to Talk About
AI coding tools have gotten remarkably good at generating code. GitHub reports that Copilot users accept roughly 30% of suggestions. Claude Code, Cursor, and other tools are helping developers ship faster than ever.
But faster output creates a new bottleneck: review. When your team’s code volume doubles, your review capacity doesn’t magically double with it. PRs pile up. Reviews get superficial. Bugs slip through.
Anthropic experienced this firsthand. Before launching Code Review internally, only about 16% of their pull requests received what they’d call “substantive” feedback — comments that actually caught meaningful issues. The rest got rubber-stamped or lightly skimmed.
How Anthropic’s Multi-Agent System Actually Works
Code Review doesn’t work like a typical linter or static analysis tool. When a developer opens a pull request, the system dispatches multiple AI agents that run in parallel. Each agent independently searches for different types of errors across the codebase.
Here’s what makes it different from simpler approaches:
- Parallel analysis: Multiple agents examine the PR simultaneously, each looking for different categories of issues
- Cross-verification: After individual analysis, agents compare findings and cross-check each other’s conclusions to filter out false positives
- Codebase-aware: The agents don’t just look at the diff — they consider the entire codebase to catch cases where a change in one file breaks something in another
- Severity ranking: Remaining issues get sorted by severity, so developers see the most critical problems first
- Adaptive depth: Simple PRs get a lightweight pass; complex ones engage more agents for a deeper review
The output shows up as a single overview comment on the PR plus inline annotations for specific bugs. If it finds issues, it also suggests fixes that Claude Code can implement on request.
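In rough code terms, that flow looks something like the sketch below. Everything in it (the agent categories, the `Finding` fields, the severity scale, the consensus rule) is an illustrative assumption for the sake of the example, not Anthropic's actual implementation:

```python
# Illustrative sketch of a parallel, cross-verifying review pipeline.
# Agent categories, the Finding shape, and the consensus rule are all
# assumptions -- not Anthropic's real internals.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    issue: str
    severity: int  # 1 = nitpick ... 5 = critical (assumed scale)

def run_agent(category: str, pr_diff: str, codebase: dict) -> list[Finding]:
    """Each agent independently scans the PR (and the wider codebase)
    for one category of issue. Stubbed with canned findings here."""
    checks = {
        "logic": [Finding("api.py", 42, "off-by-one in pagination", 4)],
        "cross-file": [Finding("models.py", 7, "renamed field still read in jobs.py", 5)],
        "security": [],
    }
    return checks.get(category, [])

def review(pr_diff: str, codebase: dict, complex_pr: bool) -> list[Finding]:
    # Adaptive depth: simple PRs engage fewer agents.
    categories = ["logic", "cross-file", "security"] if complex_pr else ["logic"]
    with ThreadPoolExecutor() as pool:  # parallel analysis
        results = pool.map(lambda c: run_agent(c, pr_diff, codebase), categories)
    findings = [f for agent_findings in results for f in agent_findings]
    # Stand-in for cross-verification: discard findings too weak for a
    # second agent to corroborate, then rank what survives by severity.
    verified = [f for f in findings if f.severity >= 3]
    return sorted(verified, key=lambda f: f.severity, reverse=True)
```

Running `review` on a complex PR returns the cross-file break ranked above the off-by-one, which mirrors the severity-first ordering described above.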
According to Anthropic’s Wu, the team deliberately focused on logical errors rather than style nitpicks. The reasoning: what developers actually want from an AI review are the logic errors. Nobody needs an AI to flag missing semicolons; that’s what linters are for.
The Numbers Behind the Launch
The internal results at Anthropic tell a clear story:
- Before Code Review: 16% of PRs received substantive feedback
- After Code Review: 54% of PRs received substantive feedback
- Average review time: ~20 minutes per PR
- Cost per review: $15–$25, depending on complexity
That 16% to 54% jump is significant. It means more than three times as many pull requests are getting meaningful review comments. Anthropic says their developers have come to expect Code Review comments on their PRs — and “get a little nervous” when they don’t see them.
The feature is launching as a research preview for Claude for Teams and Claude for Enterprise customers. Pricing is token-based, so simpler PRs cost less and complex ones cost more.
How It Stacks Up Against GitHub Copilot Code Review
GitHub Copilot’s code review feature hit general availability in April 2025 and reached 1 million users within a month. It’s fast, widely adopted, and integrated directly into GitHub’s UI. But it has a fundamental limitation: it’s diff-based.
That means Copilot reviews the changes in isolation. It catches typos, null checks, and simple logic errors effectively. But it misses architectural problems and cross-file dependencies because it doesn’t have the broader codebase context.
Anthropic’s approach trades speed for depth. A 20-minute review is significantly slower than Copilot’s near-instant feedback. But the multi-agent, codebase-aware approach catches categories of bugs that diff-based tools simply can’t see.
Other players in this space include:
- Graphite Agent: Shopify reported a 33% increase in PRs merged per developer after adoption, and engineers at Asana report saving about 7 hours a week
- CodeRabbit: The only option that works across GitHub, GitLab, Bitbucket, and Azure DevOps
- Greptile: Indexes your entire codebase upfront for deeper analysis
The $15–$25 per review price point is notably higher than alternatives. For a team pushing 50 PRs a day, that’s $750–$1,250 daily. Whether the depth justifies the cost depends entirely on what kind of code you’re shipping and what bugs cost you in production.
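Since the pricing scales linearly with PR volume, the back-of-envelope math above generalizes to a quick estimator (the $15–$25 range is the one quoted earlier; team sizes and workdays are your own inputs):

```python
# Rough cost estimator for token-based, per-review pricing.
def monthly_review_cost(prs_per_day: int, workdays: int = 22,
                        low: float = 15.0, high: float = 25.0) -> tuple[float, float]:
    """Return the (low, high) monthly cost band for a given PR volume,
    using the quoted $15-$25 per-review range by default."""
    reviews = prs_per_day * workdays
    return (reviews * low, reviews * high)

# 50 PRs/day over a 22-workday month lands between $16,500 and $27,500.
```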
Who Should Care About This
Code Review makes the most sense for teams where:
- AI-generated code is a large percentage of output — if your developers are using Claude Code, Copilot, or Cursor heavily, the volume of code that needs review is growing faster than your team
- Bugs in production are expensive — for fintech, healthcare, infrastructure, and enterprise SaaS, a bug that slips through review can cost orders of magnitude more than $25
- You’re already in the Anthropic ecosystem — if your team uses Claude for Teams or Enterprise, Code Review slots in without additional vendor management
It’s less compelling for small teams with low PR volume, open-source projects with tight budgets, or teams that primarily write code manually and have solid existing review processes.
If you’re interested in how Anthropic has been building out its developer tooling ecosystem, check out related coverage on Claude Code’s security features, Claude Code Remote Control, and the Anthropic vs Pentagon standoff that’s been happening in parallel with this launch.
FAQ
How much does Anthropic Code Review cost?
Pricing is token-based and varies by PR complexity. Anthropic estimates $15–$25 per review on average. It’s available to Claude for Teams and Claude for Enterprise customers during the research preview.
Can I use Code Review with GitLab or Bitbucket?
Currently, Code Review integrates with GitHub. There’s no announced support for GitLab, Bitbucket, or Azure DevOps yet. If you need multi-platform support, CodeRabbit is the main alternative that covers all four.
How does Anthropic Code Review compare to GitHub Copilot code review?
Copilot is faster (near-instant) and cheaper, but it’s diff-based and misses cross-file issues. Anthropic’s system takes ~20 minutes but analyzes the full codebase context using multiple agents. The trade-off is speed and cost vs. depth and accuracy.
Does Code Review work on code not written by AI?
Yes. While it’s positioned as a solution for the surge in AI-generated code, it reviews any code in a pull request regardless of how it was written. The multi-agent system looks for logical errors, not AI-specific patterns.
Will Code Review replace human code reviewers?
Anthropic isn’t positioning it as a replacement. The 54% substantive feedback rate means nearly half of PRs still don’t trigger meaningful comments. It’s designed to augment human review by catching issues that might be missed under time pressure, not to eliminate the need for human judgment on architecture, design patterns, and business logic.