There’s a new model making noise on [Hacker News](https://news.ycombinator.com/item?id=47144464) right now, and for once, the hype might actually be warranted. [Mercury 2](https://www.inceptionlabs.ai/blog/introducing-mercury-2), built by a startup called Inception Labs, is claiming 1,000+ tokens per second throughput — roughly 5x faster than the speediest LLMs out there. But the really interesting part isn’t the raw numbers. It’s how they got there.
Mercury 2 is what Inception calls a “diffusion large language model,” or dLLM. If you’ve followed image generation at all, diffusion should ring a bell. Instead of the classic autoregressive approach where the model predicts one token at a time left to right, Mercury 2 generates a rough draft of the entire output and then refines it through multiple denoising passes, improving many tokens in parallel. Think of it like an editor reworking an entire paragraph at once rather than writing it word by word. It’s a fundamentally different way to think about text generation, and honestly, it’s kind of wild that it works this well.
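To make the idea concrete, here's a toy sketch of parallel iterative refinement. This is not Mercury 2's actual algorithm (Inception hasn't published those details); it just illustrates the shape of the process: start from a fully masked draft, and on each pass commit the positions a scorer is most confident about, many at once, rather than generating strictly left to right.

```python
import random

# Toy illustration only -- the vocabulary, "confidence" scores, and schedule
# are all made up. A real dLLM would use a trained network here.
random.seed(0)

VOCAB = ["the", "quick", "brown", "fox", "jumps"]
MASK = "<mask>"

def denoise_step(draft):
    """Propose a (word, confidence) pair for every still-masked position."""
    return {
        i: (random.choice(VOCAB), random.random())
        for i, tok in enumerate(draft)
        if tok == MASK
    }

def generate(length=8, steps=4):
    draft = [MASK] * length
    for _ in range(steps):
        proposals = denoise_step(draft)
        if not proposals:
            break
        # Commit the top half of proposals by confidence -- refining many
        # tokens in parallel instead of one at a time.
        keep = sorted(proposals.items(), key=lambda kv: -kv[1][1])
        for i, (word, _) in keep[: max(1, len(keep) // 2)]:
            draft[i] = word
    # Fill any positions still masked after the scheduled passes.
    for i, (word, _) in denoise_step(draft).items():
        draft[i] = word
    return draft

print(" ".join(generate()))
```

The output here is gibberish by design; the point is the control flow: each pass touches many positions at once, which is where the throughput win over token-by-token decoding comes from.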
On quality benchmarks, [Inception says](https://www.inceptionlabs.ai/) Mercury 2 sits in the same ballpark as Claude 4.5 Haiku and GPT 5.2 Mini — solid models for everyday tasks. Where it pulls ahead is cost: pricing comes in at $0.25 per million input tokens and $0.75 per million output tokens, which undercuts most competitors by a wide margin. For latency-sensitive applications like coding assistants, browser agents, or anything involving lots of back-and-forth API calls, that speed advantage is genuinely meaningful. As [one HN commenter put it](https://news.ycombinator.com/item?id=47144464), a lot of practical AI workflows right now are simply “tok/s bottlenecked.”
The model is available today through the [Inception API](https://docs.inceptionlabs.ai/get-started/models), which is OpenAI-compatible, so swapping it in shouldn’t be painful. The backing is serious too — Menlo Ventures, M12 (Microsoft’s VC arm), and angel investors like Andrew Ng and Andrej Karpathy are behind Inception.
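Because the API follows the standard OpenAI chat-completions schema, calling it is just a matter of pointing a request at a different base URL. Here's a minimal stdlib-only sketch; the endpoint URL and model id below are assumptions, so check the docs linked above for the real values.

```python
import json
import urllib.request

# Assumed endpoint -- confirm against Inception's API docs.
API_URL = "https://api.inceptionlabs.ai/v1/chat/completions"

def build_request(prompt: str, model: str = "mercury-2") -> dict:
    """Build a standard OpenAI-style chat completion payload.
    The model id "mercury-2" is a guess; use the id from the docs."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def complete(prompt: str, api_key: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# complete("Write a haiku about fast inference.", api_key="...")  # needs a real key
```

If you already use the official OpenAI SDK, the equivalent move is setting `base_url` on the client and changing the model name, with the rest of your code untouched.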
Whether diffusion models can scale to match the smartest autoregressive reasoning models remains an open question. But Mercury 2 makes a strong case that left-to-right autoregressive decoding isn’t the only path forward. If you care about inference speed and cost — and who doesn’t — this one’s worth keeping an eye on.
