Top AI Product



Cursor Composer 2 takes on Anthropic and OpenAI with a $0.50/M token coding model — and the benchmarks back it up

For the past two years, AI coding tools have lived and died by the models underneath them. Cursor rode Claude. GitHub Copilot ran on OpenAI. Windsurf mixed and matched. Everyone was a reseller with a nice UI on top.

That dynamic shifted on March 19, 2026, when Cursor unveiled Composer 2 — a proprietary, code-only model trained entirely in-house by Anysphere, the company behind Cursor. Bloomberg broke the story the same day: Cursor is no longer just an IDE company. It’s a model company now, and it’s aiming directly at the providers it used to depend on.

The numbers tell a compelling story. On Terminal-Bench 2.0, Composer 2 scores 61.7%, beating Claude Opus 4.6's 58.0%. On CursorBench, it hits 61.3 versus Opus 4.6's 58.2. And it does all this at $0.50 per million input tokens — one-tenth the price of Anthropic's flagship model.

The benchmark picture: strong, not dominant

Composer 2’s performance is impressive, but it’s not a clean sweep. Here’s where things stand across the three benchmarks Cursor highlighted:

| Benchmark | Composer 2 | Composer 1.5 | Claude Opus 4.6 | GPT-5.4 Thinking |
| --- | --- | --- | --- | --- |
| CursorBench | 61.3 | 44.2 | 58.2 | 63.9 |
| Terminal-Bench 2.0 | 61.7 | 47.9 | 58.0 | 75.1 |
| SWE-bench Multilingual | 73.7 | 65.9 | 77.8 | N/A |

Against its own predecessor, the jump is massive — a 29% improvement on Terminal-Bench 2.0, and a 39% gain on CursorBench compared to Composer 1.5. That’s not incremental polish. That’s a generational leap within a single product line.
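Those percentages follow directly from the benchmark table above; a quick sanity check:

```python
# Percent improvement of Composer 2 over Composer 1.5, computed from
# the scores in the benchmark table above (as published by Cursor).
def pct_gain(new: float, old: float) -> float:
    return (new - old) / old * 100

terminal_bench = pct_gain(61.7, 47.9)  # Terminal-Bench 2.0
cursorbench = pct_gain(61.3, 44.2)     # CursorBench

print(f"Terminal-Bench 2.0: +{terminal_bench:.0f}%")  # +29%
print(f"CursorBench:        +{cursorbench:.0f}%")     # +39%
```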

Against the big labs, the picture is more nuanced. Composer 2 beats Claude Opus 4.6 on two of three benchmarks but falls short on SWE-bench Multilingual (73.7 vs. 77.8). It beats Opus on Terminal-Bench 2.0 by a healthy margin (61.7 vs. 58.0) but trails GPT-5.4 significantly on the same benchmark (61.7 vs. 75.1).

The takeaway: Composer 2 is genuinely competitive with frontier models on coding tasks. It’s not the best on every metric, but it’s close enough that the price difference becomes the deciding factor — and on price, it’s not even a contest.

The price gap is enormous

This is where Cursor’s pitch gets hard to ignore.

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| Composer 2 | $0.50 | $2.50 |
| Composer 2 Fast | $1.50 | $7.50 |
| Claude Opus 4.6 | $5.00 | $25.00 |
| GPT-5.4 (short context) | $2.50 | $15.00 |
| GPT-5.4 (long context) | $5.00 | $22.50 |

Composer 2’s standard tier is 10x cheaper than Opus 4.6 on input and output. Even the fast variant — which Cursor says delivers identical intelligence at higher speed and ships as the default — comes in at $1.50/$7.50, still roughly 3x cheaper than Anthropic and 2x cheaper than OpenAI’s short-context pricing.

For Cursor’s subscription users, this translates to more requests per dollar. For API consumers, it means coding tasks that were previously expensive to automate become viable at scale. Co-founder Aman Sanger told Bloomberg that training exclusively on code data made it possible to build a smaller, more efficient model — the narrow focus is the competitive advantage, not a limitation.
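To make the gap concrete, here is a back-of-the-envelope cost calculation using the published per-token rates. The token counts are hypothetical, chosen only to stand in for one long agentic coding task; they are not measured usage figures.

```python
# Back-of-the-envelope cost per task, using the per-million-token rates
# from the pricing table. Token counts below are hypothetical.
PRICING = {  # model: (input $/1M tokens, output $/1M tokens)
    "Composer 2": (0.50, 2.50),
    "Composer 2 Fast": (1.50, 7.50),
    "Claude Opus 4.6": (5.00, 25.00),
    "GPT-5.4 (short context)": (2.50, 15.00),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICING[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Assume a long multi-file task: 400k input tokens, 50k output tokens.
for model in PRICING:
    print(f"{model}: ${task_cost(model, 400_000, 50_000):.2f}")
```

On these assumed token counts, the hypothetical task costs about $0.33 on Composer 2 versus $3.25 on Opus 4.6 — the 10x ratio holds because both the input and output rates differ by the same factor.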

How Cursor trained a frontier-class coding model

Composer 2 represents Anysphere’s first continued pretraining run — meaning they took a base model and continued training it on a code-specific dataset before layering on reinforcement learning. This two-stage approach is what distinguishes Composer 2 from its predecessors, which relied more heavily on prompting and fine-tuning existing third-party models.

The reinforcement learning phase targets long-horizon coding tasks — the kind that require hundreds of individual actions to complete. Think multi-file refactors, complex debugging sessions that span entire modules, or scaffolding a new feature that touches dozens of files. These aren’t tasks you solve in a single prompt-response cycle. They demand sustained reasoning, context tracking, and the ability to recover from mistakes mid-stream.

The most technically interesting piece is what Cursor calls “compaction-in-the-loop RL” — a self-summarization technique where the model learns to compress its own context when approaching token limits. Instead of losing track of earlier work as the context window fills up, Composer 2 summarizes 5,000+ tokens down to roughly 1,000 while preserving the critical details. This isn’t a post-processing step bolted on after training; it’s integrated directly into the reinforcement learning loop, so the model learns what information matters and what can be safely compressed. Cursor reports this reduces compaction errors by 50% compared to their previous approach.
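Cursor has not published implementation details, but the described behavior can be sketched as a loop that watches context size and replaces older turns with a summary once a threshold is crossed. Everything below — the names, the thresholds, the placeholder summarizer — is illustrative, not Cursor's actual system; in Composer 2 the compaction policy is learned inside the RL loop rather than hand-coded.

```python
# Illustrative sketch of context compaction during a long agent run.
# All names and thresholds are hypothetical, not Cursor's real system.
CONTEXT_LIMIT = 200_000       # assumed context window, in tokens
COMPACT_AT = 0.9              # compact when the window is 90% full
TARGET_SUMMARY_TOKENS = 1_000

def count_tokens(messages):
    # Crude stand-in for a real tokenizer: ~1 token per 4 characters.
    return sum(len(m["text"]) for m in messages) // 4

def summarize(messages, target_tokens):
    # Placeholder: a real system would have the model compress earlier
    # turns while preserving file paths, decisions, and error states.
    text = " ".join(m["text"] for m in messages)
    return {"role": "summary", "text": text[: target_tokens * 4]}

def maybe_compact(messages):
    if count_tokens(messages) < CONTEXT_LIMIT * COMPACT_AT:
        return messages
    # Keep the most recent turns verbatim; compress everything older.
    recent, older = messages[-10:], messages[:-10]
    return [summarize(older, TARGET_SUMMARY_TOKENS)] + recent
```

The interesting claim in Cursor's write-up is precisely that this decision — what to keep verbatim and what to compress — is trained via reinforcement learning rather than fixed by heuristics like the ones above.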

Three Composer releases in five months — Composer 1 in October 2025, Composer 1.5 in February 2026, Composer 2 in March 2026 — show an iteration speed that most AI labs would envy. The velocity suggests that Anysphere’s training infrastructure and data pipelines are maturing fast.

Why Cursor had to build its own model

The strategic logic behind Composer 2 goes deeper than benchmarks and pricing. Cursor faces what analysts have called a structural dilemma: its product depends on models from companies that are increasingly becoming competitors.

Anthropic launched Claude Code. OpenAI shipped Codex as a standalone app. Google has Gemini CLI. Every major model provider is building or acquiring AI coding experiences. Cursor’s moat — the editor, the UX, the developer workflow — is real, but it’s vulnerable if the underlying models can be pulled away or priced unfavorably.

Reports suggest Cursor’s consumer subscriptions operate at negative margins, subsidized by enterprise contracts. Building a proprietary model that matches frontier performance at a fraction of the cost isn’t just about differentiation. It’s about survival economics. If Cursor can serve the majority of coding requests on its own model, it dramatically changes its cost structure and reduces dependency on providers who are also rivals.

The timing aligns with Anysphere’s reported fundraising talks at a $50 billion valuation. With over 1 million daily active users, 50,000 business customers — including Stripe and Figma — and a $2 billion annual revenue run rate as of February 2026, Cursor has the scale to justify its own model infrastructure. The question was never “can they afford to try?” It was “can they afford not to?”

Community reception: impressed but cautious

Developer reaction has been split along predictable lines. On the Cursor community forum, early adopters report that Composer 2 handles multi-file editing noticeably better than its predecessor. The consensus on long-horizon tasks — complex refactors, large feature implementations — is that the improvement is tangible and not just benchmark theater.

On Reddit’s r/cursor and in broader developer communities, the reception is more measured. Cursor has had a rough stretch with reliability: a confirmed code reversion bug in March 2026, recurring stability issues, and costs that some developers report climbing to $40-50/month with heavy usage. The .cursorrules system still gets praise for persistent project context, but complaints about the AI failing to understand full codebase context haven’t gone away.

Some developers on Hacker News and X have also pointed out an inherent tension in benchmarking: CursorBench is Cursor’s own benchmark, and Terminal-Bench 2.0 is designed around the kind of agentic terminal tasks that Cursor’s model is specifically trained on. SWE-bench Multilingual is the most neutral of the three, and that’s the one where Opus 4.6 still leads.

Independent testing from outlets like The New Stack found that Claude Code uses significantly fewer tokens than Cursor for identical tasks, with one benchmark showing a 5.5x token efficiency advantage for Claude. Token efficiency matters because it directly affects how far a subscription or API budget stretches in practice.
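Combining the two headline figures in this article — a roughly 10x per-token price gap and a reported 5.5x token-efficiency edge for Claude — gives a rough sense of the effective cost ratio. This is napkin math on published numbers, not a measured comparison:

```python
# Napkin math: does Claude's token efficiency offset its higher price?
# Both figures are the headline numbers cited in this article.
price_ratio = 10.0       # Opus 4.6 costs ~10x more per token
efficiency_ratio = 5.5   # Claude reportedly uses ~5.5x fewer tokens

# Effective cost ratio for the same task (Opus cost / Composer 2 cost):
effective = price_ratio / efficiency_ratio
print(f"Opus 4.6 still costs ~{effective:.1f}x more per task")  # ~1.8x
```

If both numbers hold, Composer 2 remains cheaper per task, but by closer to 2x than 10x — which is why token efficiency, not just the rate card, should factor into any cost comparison.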

How Composer 2 fits the AI coding landscape in March 2026

The AI coding market in March 2026 looks fundamentally different from a year ago. The era some are calling “Agentic Engineering” — where developers orchestrate AI agents rather than write every line themselves — has shifted the competitive axis from “which model is smartest” to “which system delivers the most useful work per dollar.”

Cursor with Composer 2 occupies a unique position: it’s the first major AI coding tool to ship its own frontier-class model while still offering third-party models (Claude, GPT) as options within the same editor. Users can switch between Composer 2 and Opus 4.6 depending on the task. That flexibility, combined with Cursor’s established editor experience and multi-file editing capabilities, is a genuine differentiator.

But the competition is fierce. GitHub Agent HQ orchestrates multiple agents in parallel. JetBrains Air launched its multi-agent IDE. Cline CLI 2.0 brought AI coding to the terminal. The market is saturated with options, and having a strong model alone won’t be enough — the entire developer experience needs to hold up.

FAQ

How much does Cursor Composer 2 cost?

Composer 2 is available through Cursor’s subscription plans, which include a usage pool. For API-level pricing, the standard tier costs $0.50 per million input tokens and $2.50 per million output tokens. The fast variant (same intelligence, higher speed) costs $1.50/$7.50 per million tokens. Individual plan subscribers get Composer 2 access as part of their plan’s usage allocation.

Is Cursor Composer 2 better than Claude Opus 4.6 for coding?

It depends on the task. Composer 2 beats Opus 4.6 on CursorBench (61.3 vs. 58.2) and Terminal-Bench 2.0 (61.7 vs. 58.0), but trails on SWE-bench Multilingual (73.7 vs. 77.8). Opus 4.6 still leads on reasoning depth and long-context coherence, and independent tests show Claude Code is significantly more token-efficient. However, Composer 2 costs roughly 10x less per token, which makes it the better value for high-volume coding tasks.

What is Cursor’s self-summarization technique?

Cursor calls it “compaction-in-the-loop RL.” When Composer 2 approaches its context window limit during a long coding session, it self-summarizes earlier context — compressing thousands of tokens into a fraction of the size while preserving critical information. This is trained as part of the reinforcement learning process, not applied after the fact. Cursor reports it reduces compaction errors by 50%.

Can I still use Claude or GPT inside Cursor?

Yes. Cursor continues to support third-party models including Claude Opus 4.6, GPT-5.4, and others. Composer 2 is an additional option, not a replacement. Users can choose which model to use on a per-task basis.

Who uses Cursor?

Cursor reports over 1 million daily active users and 50,000 business customers, including companies like Stripe and Figma. Anysphere, the company behind Cursor, has reached a $2 billion annual revenue run rate as of February 2026 and is reportedly in talks to raise funding at a $50 billion valuation.

