So Google DeepMind quietly dropped something pretty wild on February 12th. It’s called [Aletheia](https://deepmind.google/blog/accelerating-mathematical-and-scientific-discovery-with-gemini-deep-think/), and it’s not just another AI that can ace math competitions. This thing is actually doing *research-level* mathematics on its own — the kind of work that used to require PhD-level expertise and years of effort.
Here’s the deal. Aletheia is built on top of Gemini Deep Think and uses a three-part architecture: a Generator that proposes candidate solutions, a Verifier that checks for errors and hallucinations, and a Reviser that patches things up when the proof isn’t quite right. The whole pipeline runs in a loop — generate, verify, revise, repeat — until it either nails the answer or honestly admits it can’t solve the problem. That last part is surprisingly important. An AI that knows when it’s wrong is way more useful than one that confidently spits out garbage.
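If it helps to see the shape of that loop, here's a minimal Python sketch. To be clear: DeepMind hasn't published Aletheia's internals, so every name and signature below is my own illustration, not the real API.

```python
# A minimal sketch of the generate-verify-revise loop described above.
# Everything here is illustrative -- DeepMind hasn't published Aletheia's
# actual interfaces, so the names and shapes are assumptions.

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool
    issues: list[str] = field(default_factory=list)  # errors/hallucinations flagged by the Verifier


def solve(
    problem: str,
    generate: Callable[[str], str],                # Generator: proposes a candidate proof
    verify: Callable[[str, str], Verdict],         # Verifier: checks it for errors and hallucinations
    revise: Callable[[str, str, list[str]], str],  # Reviser: patches the proof using the flagged issues
    max_rounds: int = 8,
) -> Optional[str]:
    """Loop: generate, verify, revise, repeat -- until a candidate
    passes verification or the system gives up and returns None."""
    candidate = generate(problem)
    for _ in range(max_rounds):
        verdict = verify(problem, candidate)
        if verdict.ok:
            return candidate  # proof survived all checks
        candidate = revise(problem, candidate, verdict.issues)
    return None  # an honest "can't solve it" beats confident garbage
```

The `max_rounds` cap and the `None` return encode exactly the property praised above: when revision stops converging, the system bails out explicitly instead of shipping a confident wrong proof.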
The numbers speak for themselves. Aletheia hit around 91.9% accuracy on [IMO-ProofBench Advanced](https://www.marktechpost.com/2026/02/12/google-deepmind-introduces-aletheia-the-ai-agent-moving-from-math-competitions-to-fully-autonomous-professional-research-discoveries/), which tests Olympiad-level proof writing. But the really interesting part? DeepMind threw it at 700 open problems from the Erdős Problems database. Out of those, it produced 63 technically correct solutions and autonomously resolved four genuinely open questions, ones nobody had managed to crack before. It even generated a full research paper on arithmetic geometry eigenweights with zero human intervention. You can check out the [prompts and outputs on GitHub](https://github.com/google-deepmind/superhuman/tree/main/aletheia) and the [full paper on arXiv](https://arxiv.org/abs/2602.10177).
Now, it’s not perfect. A closer look at the evaluation shows that most of its attempts still miss the mark: about 68% of responses were fundamentally wrong, and the system has a tendency to quietly reinterpret hard questions as easier ones it can actually answer. So we’re not in “fire all the mathematicians” territory. But the fact that it can occasionally crack problems that humans haven’t solved yet is genuinely remarkable.
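For a rough sense of how those figures fit together (assuming the 68% is measured over the same 700 Erdős attempts, which the write-ups don't pin down):

```latex
% Back-of-the-envelope on the Erdős-database numbers, assuming the 68%
% "fundamentally wrong" figure covers the same 700 attempts (an assumption).
\[
  \underbrace{\tfrac{63}{700} \approx 9\%}_{\text{technically correct}}
  \quad\text{and}\quad
  \underbrace{68\%}_{\text{fundamentally wrong}}
  \quad\Longrightarrow\quad
  \text{roughly } 23\% \text{ left in between}
\]
```

That middle slice is presumably where the question-reinterpretation issue lives: answers that aren't nonsense, but don't solve the problem as posed.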
What makes Aletheia different from previous math AI efforts is the jump from competition tricks to actual research. Solving an IMO problem is impressive, but those are problems *designed* to be solvable. Open conjectures are a different beast entirely. The [Hacker News threads](https://news.ycombinator.com/item?id=46984749) have been buzzing about this, and for good reason: this feels like a real inflection point for AI-assisted scientific discovery. Whether Aletheia becomes a full-on research partner or just a very fancy proof-checking assistant, it’s hard to look at this and not feel like something fundamental just shifted.