Top AI Product


OpenAI GeneBench-Pro: top AI models fail 70% of real biology tasks

OpenAI dropped a number that stings. On July 1 it released GeneBench-Pro, a benchmark for computational biology agents, and the best model on it, GPT-5.6 Sol at max reasoning, passes only 28.7% of the problems (31.5% in Pro mode). The strongest non-OpenAI model, Claude Opus 4.8, gets 16%. Everyone else is worse.

What it actually tests

Not trivia. 129 problems across genomics, quantitative biology, and translational medicine. Each hands the model a real-shaped dataset plus experimental context and says: pick your own method, run the analysis, give a conclusion. Every problem is synthetically generated so there’s a known ground truth to grade against, and 82 of the 129 were vetted by outside professors, postdocs, and industry scientists. This isn’t recall; it’s research judgment.
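To make the "synthetic data, known ground truth" idea concrete, here is a toy sketch in Python. It is not OpenAI's generator or grader. Every name in it (make_problem, naive_agent, grade, the planted gene "G3") is hypothetical, and it assumes a far simpler task than anything in GeneBench-Pro: spot the one gene whose expression was shifted on purpose.

```python
import random
import statistics

def make_problem(seed, planted_gene="G3", n_genes=5, n_samples=20, effect=3.0):
    """Build a toy dataset where exactly one gene is shifted in the treated group.

    Hypothetical generator: because we plant the effect ourselves,
    the correct answer is known before any model sees the data.
    """
    rng = random.Random(seed)
    data = {}
    for i in range(n_genes):
        gene = f"G{i}"
        control = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        shift = effect if gene == planted_gene else 0.0
        treated = [rng.gauss(shift, 1.0) for _ in range(n_samples)]
        data[gene] = (control, treated)
    return data, planted_gene

def naive_agent(data):
    """Stand-in for a model: choose the gene with the largest mean difference."""
    def mean_gap(gene):
        control, treated = data[gene]
        return abs(statistics.mean(treated) - statistics.mean(control))
    return max(data, key=mean_gap)

def grade(answer, truth):
    """Pass/fail grading against the planted ground truth."""
    return answer == truth

# Score the stand-in agent on ten generated problems.
problems = [make_problem(seed) for seed in range(10)]
passed = sum(grade(naive_agent(data), truth) for data, truth in problems)
pass_rate = passed / len(problems)
print(f"pass rate: {pass_rate:.1%}")
```

The point of the design is that grading needs no human judge: the generator planted the answer, so a conclusion is either right or wrong. The real problems are open-ended analyses where the model must choose its own method, which is where the low pass rates come from.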

Why it matters

Frontier models write code and pass med exams, yet even the best of them flubs roughly 70% of messy biology work. That gap is the whole point. It’s also a shot at Anthropic’s AI-for-Science pitch. Representative problems are open-sourced, so anyone can see how high the bar really is.

