MiniMax released the Highspeed variant of its M2.7 coding model on May 18. It is a latency-tuned version that delivers roughly 100 tokens per second, versus 60 for standard M2.7, with identical output behavior. On the hardest coding and agentic benchmarks it matches or approaches Claude Opus 4.6 and GPT-5, while running 3x faster than those models at a fraction of the price.
## What “highspeed” buys you
Same model weights, same output quality, just optimized for lower latency and higher throughput. The sweet spot is interactive coding agents, tool-calling pipelines, and office-automation flows where responsiveness matters more than squeezing out the last benchmark point. When an agent makes 50 tool calls in a session, the gap between 60 and 100 tokens per second adds up to minutes saved per task.
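To make the "minutes saved" claim concrete, here is a back-of-the-envelope sketch. The 50 tool calls and the 60 and 100 tokens-per-second figures come from the text above; the 500 output tokens per call is purely an illustrative assumption, and the model counts only decode time (no time-to-first-token, network latency, or tool execution).

```python
# Back-of-the-envelope decode-time comparison.
# TOKENS_PER_CALL is an illustrative assumption, not a MiniMax measurement.
TOOL_CALLS = 50            # tool calls per agent session (from the article)
TOKENS_PER_CALL = 500      # ASSUMED output tokens generated per call
STANDARD_TPS = 60          # standard M2.7, tokens per second
HIGHSPEED_TPS = 100        # M2.7 Highspeed, tokens per second


def generation_seconds(calls: int, tokens_per_call: int, tps: float) -> float:
    """Pure decode time: total output tokens divided by throughput.

    Ignores time-to-first-token, network latency, and tool execution time.
    """
    return calls * tokens_per_call / tps


standard = generation_seconds(TOOL_CALLS, TOKENS_PER_CALL, STANDARD_TPS)
highspeed = generation_seconds(TOOL_CALLS, TOKENS_PER_CALL, HIGHSPEED_TPS)
saved = standard - highspeed

print(f"standard:  {standard / 60:.1f} min")   # 25,000 tokens / 60 tok/s
print(f"highspeed: {highspeed / 60:.1f} min")  # 25,000 tokens / 100 tok/s
print(f"saved:     {saved / 60:.1f} min per session")
```

Under these assumptions the session drops from about 6.9 to about 4.2 minutes of generation, saving roughly 2.8 minutes. The per-token speedup is only 100/60 ≈ 1.67x, but the absolute saving grows linearly with both the number of calls and the tokens each call produces, which is why it matters most for long agent sessions.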
## Where it fits
M2.7 has been pitched as frontier-adjacent coding performance at a fraction of the cost since its March debut. The Highspeed variant is the deployment-ready version for production agent workloads: the one you’d actually wire into a CI pipeline or a customer-facing coding assistant.
## Why it matters
The frontier-versus-cost tradeoff is collapsing, specifically for coding. A model that matches or approaches Opus 4.6 on coding benchmarks at 3x the speed and a fraction of the price changes which workloads are economically viable to automate. MiniMax is making the same bet as Cursor’s Composer 2.5: own the coding vertical with cheap, fast, good-enough models.
