There’s a new model making noise on [Hacker News](https://news.ycombinator.com/item?id=47144464) right now, and for once, the hype might actually be warranted. [Mercury 2](https://www.inceptionlabs.ai/blog/introducing-mercury-2), built by a startup called Inception Labs, is claiming 1,000+ tokens per second throughput — roughly 5x faster than the speediest LLMs out there. But the really interesting part isn’t the raw numbers. It’s how they got there.
Mercury 2 is what Inception calls a “diffusion large language model,” or dLLM. If you’ve followed image generation at all, diffusion should ring a bell. Instead of the classic autoregressive approach where the model predicts one token at a time left to right, Mercury 2 generates a rough draft of the entire output and then refines it through multiple denoising passes, improving many tokens in parallel. Think of it like an editor reworking an entire paragraph at once rather than writing it word by word. It’s a fundamentally different way to think about text generation, and honestly, it’s kind of wild that it works this well.
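To make the idea concrete, here's a toy sketch of parallel iterative refinement. This is not Mercury 2's actual algorithm (Inception hasn't published those details); it just illustrates the shape of the process: start from a fully masked draft, and on each pass commit the positions a scorer is most confident about, many at once, rather than generating strictly left to right.

```python
import random

# Toy illustration only -- the vocabulary, "confidence" scores, and schedule
# are all made up. A real dLLM would use a trained network here.
random.seed(0)

VOCAB = ["the", "quick", "brown", "fox", "jumps"]
MASK = "<mask>"

def denoise_step(draft):
    """Propose a (word, confidence) pair for every still-masked position."""
    return {
        i: (random.choice(VOCAB), random.random())
        for i, tok in enumerate(draft)
        if tok == MASK
    }

def generate(length=8, steps=4):
    draft = [MASK] * length
    for _ in range(steps):
        proposals = denoise_step(draft)
        if not proposals:
            break
        # Commit the top half of proposals by confidence -- refining many
        # tokens in parallel instead of one at a time.
        keep = sorted(proposals.items(), key=lambda kv: -kv[1][1])
        for i, (word, _) in keep[: max(1, len(keep) // 2)]:
            draft[i] = word
    # Fill any positions still masked after the scheduled passes.
    for i, (word, _) in denoise_step(draft).items():
        draft[i] = word
    return draft

print(" ".join(generate()))
```

The output here is gibberish by design; the point is the control flow: each pass touches many positions at once, which is where the throughput win over token-by-token decoding comes from.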
On quality benchmarks, [Inception says](https://www.inceptionlabs.ai/) Mercury 2 sits in the same ballpark as Claude 4.5 Haiku and GPT 5.2 Mini — solid models for everyday tasks. Where it pulls ahead is cost: pricing comes in at $0.25 per million input tokens and $0.75 per million output tokens, which undercuts most competitors by a wide margin. For latency-sensitive applications like coding assistants, browser agents, or anything involving lots of back-and-forth API calls, that speed advantage is genuinely meaningful. As [one HN commenter put it](https://news.ycombinator.com/item?id=47144464), a lot of practical AI workflows right now are simply “tok/s bottlenecked.”
The model is available today through the [Inception API](https://docs.inceptionlabs.ai/get-started/models), which is OpenAI-compatible, so swapping it in shouldn’t be painful. The backing is serious too — Menlo Ventures, M12 (Microsoft’s VC arm), and angel investors like Andrew Ng and Andrej Karpathy are behind Inception.
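Because the API follows the standard OpenAI chat-completions schema, calling it is just a matter of pointing a request at a different base URL. Here's a minimal stdlib-only sketch; the endpoint URL and model id below are assumptions, so check the docs linked above for the real values.

```python
import json
import urllib.request

# Assumed endpoint -- confirm against Inception's API docs.
API_URL = "https://api.inceptionlabs.ai/v1/chat/completions"

def build_request(prompt: str, model: str = "mercury-2") -> dict:
    """Build a standard OpenAI-style chat completion payload.
    The model id "mercury-2" is a guess; use the id from the docs."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def complete(prompt: str, api_key: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# complete("Write a haiku about fast inference.", api_key="...")  # needs a real key
```

If you already use the official OpenAI SDK, the equivalent move is setting `base_url` on the client and changing the model name, with the rest of your code untouched.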
Whether diffusion models can scale to match the smartest autoregressive reasoning models remains an open question. But Mercury 2 makes a strong case that left-to-right autoregressive decoding isn’t the only path forward. If you care about inference speed and cost — and who doesn’t — this one’s worth keeping an eye on.
