Orthrus is a dual-architecture framework that wraps a frozen Qwen3-8B base model with a lightweight trainable diffusion module. It delivers up to 7.8x more tokens per forward pass while exactly preserving the base model's zero-shot accuracy: no sampling drift, no quality regression.
## How it works
Most speculative decoding methods (EAGLE-3, DFlash) trade some output drift for speed. Orthrus instead uses an exact intra-model consensus mechanism: its output matches the base model's predictive distribution token for token. In head-to-head comparisons it verifies significantly more tokens per forward pass than EAGLE-3 or DFlash, and the published benchmarks bear this out across reasoning and coding tasks.
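The post doesn't spell out how the consensus mechanism achieves exactness, but the standard way speculative methods guarantee a distribution-exact match to the base model is the accept/reject rule from speculative sampling: accept a drafted token with probability min(1, p_target/p_draft), and on rejection resample from the normalized residual max(0, p_target − p_draft). A minimal sketch of that rule (toy distributions, not Orthrus's actual code):

```python
import random

def speculative_accept(draft_token, p_target, p_draft, rng):
    """Standard speculative-sampling verification step (hypothetical sketch,
    not Orthrus's implementation). Guarantees the returned token is an exact
    sample from p_target, regardless of the draft distribution p_draft."""
    pt, pd = p_target[draft_token], p_draft[draft_token]
    # Accept the drafted token with probability min(1, pt / pd).
    if pd == 0 or rng.random() < min(1.0, pt / pd):
        return draft_token
    # On rejection, resample from the normalized residual max(0, pt - pd).
    residual = [max(0.0, t - d) for t, d in zip(p_target, p_draft)]
    return rng.choices(range(len(p_target)), weights=residual)[0]

# Empirical check: even with a mismatched draft distribution,
# the accepted/resampled tokens follow p_target exactly.
rng = random.Random(0)
p_target = [0.6, 0.3, 0.1]
p_draft = [0.3, 0.5, 0.2]
N = 100_000
counts = [0, 0, 0]
for _ in range(N):
    drafted = rng.choices(range(3), weights=p_draft)[0]
    counts[speculative_accept(drafted, p_target, p_draft, rng)] += 1
freqs = [c / N for c in counts]
print(freqs)  # close to [0.6, 0.3, 0.1]
```

This is why "exact" is achievable in principle: correctness comes from the verification rule, while speed comes from how many drafted tokens survive it per forward pass, which is the number Orthrus claims to improve.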
## The release
The code is open source on GitHub at chiennv2000/orthrus, with pretrained weights on Hugging Face as chiennv/Orthrus-Qwen3-8B and Orthrus-Qwen3-4B. The project ships a complete training and inference codebase plus benchmark scripts. At the time of writing it sits at #2 on Hacker News with 211+ points.
## Why it matters
The persistent tradeoff in LLM inference has been: fast output or exact output, pick one. Orthrus is the first method credibly claiming both at this magnitude. If your serving bill scales with tokens per second per GPU, a 7.8x effective throughput gain on the same hardware rewrites your unit economics overnight.