Sina Weibo’s AI team put out a claim that set off another benchmark fight: a 3-billion-parameter model reasoning at the level of systems hundreds of times its size. VibeThinker-3B is open, MIT-licensed, and runs on a single consumer GPU.
## The benchmarks
On AIME 2026, a US invitational math competition, VibeThinker-3B scores 94.3, level with DeepSeek V3.2 at 671B parameters and ahead of Gemini 3 Pro’s 91.7. It posts 80.2 Pass@1 on LiveCodeBench v6 and a 96.1% acceptance rate on unseen LeetCode contests from late April through May. On IFBench it hits 74.5, above Claude Opus 4.5’s 58.0. The viral “Opus performance at 3B” line is shorthand: the real claim is parity on verifiable reasoning benchmarks, not general-purpose parity.
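For readers unfamiliar with the metric, here is a minimal sketch of how a Pass@1 number is commonly computed, assuming the standard unbiased pass@k estimator used in code-generation evaluations. The article does not say how many samples per problem were drawn or which harness was used, so the `per_problem` data below is purely hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed all tests, k: budget.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical results: (samples drawn, samples passing) per problem.
per_problem = [(8, 8), (8, 4), (8, 0)]

# The benchmark score is the mean over problems, reported as a percentage.
score = 100 * sum(pass_at_k(n, c, 1) for n, c in per_problem) / len(per_problem)
print(f"Pass@1 = {score:.1f}")
```

With k = 1 the estimator reduces to the fraction of samples that pass, so an 80.2 Pass@1 reads as roughly four in five single attempts succeeding, averaged across problems.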
## How it was trained
VibeThinker is a dense model built on Qwen2.5-Coder-3B and post-trained with a “Spectrum-to-Signal” pipeline that first maximizes answer diversity, then reinforces the correct reasoning paths. The weights are on Hugging Face and fit in roughly 6.7GB of VRAM, small enough to run where a frontier model can’t, which is the entire point of the release.
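As a rough sanity check on that footprint, the sketch below estimates weight memory at a few common precisions. It assumes exactly 3 billion parameters and counts weights only; the article does not state the exact parameter count or the precision behind the 6.7GB figure.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

N_PARAMS = 3e9  # assumption: the "3B" in the name taken literally

# Bytes per parameter for common storage formats.
PRECISIONS = [("fp32", 4), ("bf16/fp16", 2), ("int8", 1), ("4-bit", 0.5)]

for name, nbytes in PRECISIONS:
    print(f"{name:>10}: {weight_memory_gb(N_PARAMS, nbytes):.1f} GB")
```

At 16-bit precision the weights come to about 6 GB, in the same range as the quoted figure. The remainder could plausibly be runtime overhead, activations and KV cache, or a parameter count slightly above 3 billion, but the source does not break it down.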
