Google quietly shipped [Gemini 3.1 Pro](https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/) on February 19th, and honestly, the numbers caught me off guard. On ARC-AGI-2, which is one of the tougher reasoning benchmarks out there, it scored 77.1%. That’s more than double what Gemini 3 Pro managed (31.1%). For a “.1” update, that’s not a minor bump — that’s a massive leap.
The release immediately blew up online. It hit [the top of Hacker News](https://news.ycombinator.com/item?id=47074735) with 591 points and over 730 comments, making it the most-discussed AI story of the day. [VentureBeat ran a piece](https://venturebeat.com/technology/google-launches-gemini-3-1-pro-retaking-ai-crown-with-2x-reasoning) saying Google was “retaking the AI crown,” while [TechCrunch](https://techcrunch.com/2026/02/19/googles-new-gemini-pro-model-has-record-benchmark-scores-again/) and [MarkTechPost](https://www.marktechpost.com/2026/02/19/google-ai-releases-gemini-3-1-pro-with-1-million-token-context-and-77-1-percent-arc-agi-2-reasoning-for-ai-agents/) both covered it within hours.
What’s interesting here is Google’s approach. This is their first “.1” mid-generation update for Gemini. Instead of waiting for a full new generation, they pushed out a meaningful upgrade while keeping the pricing exactly the same — $2 per million input tokens, $12 per million output. The 1 million token input window is still there too, along with 65K tokens of output. So you’re getting a significantly smarter model without paying a cent more.
The rollout is broad. It’s already live in the [Gemini App](https://gemini.google.com/), NotebookLM, [Google AI Studio](https://aistudio.google.com/), Vertex AI, Gemini CLI, and Android Studio. If you’re a developer, you can try it right now through the [Gemini API](https://ai.google.dev/gemini-api/docs/models/gemini-3.1-pro-preview) — there’s even a separate endpoint for custom tool usage, which is handy if you’re building agents.
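If you want to kick the tires, here's a minimal sketch of what a call might look like using the google-genai Python SDK. The model ID `gemini-3.1-pro-preview` is my assumption based on the API docs URL above, and the prompt is just a placeholder, so treat this as illustrative rather than copy-paste gospel.

```python
# Minimal sketch: calling Gemini 3.1 Pro through the google-genai Python SDK.
# Assumes `pip install google-genai` and a GEMINI_API_KEY set in your environment.
# The model ID below is inferred from the API docs URL and may differ in practice.
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Summarize the key differences between Gemini 3 Pro and Gemini 3.1 Pro.",
)

print(response.text)
```

Swapping the model string is all it should take if you're already on an earlier Gemini model, since the pricing and context window haven't changed.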
Now, the HN comments tell a more nuanced story. Some developers are still frustrated with Gemini’s coding habits — things like unwanted refactors and adding unsolicited comments. Fair points. But the raw reasoning improvement is hard to ignore. Doubling your ARC-AGI-2 score in a single update is the kind of progress that makes you pay attention, whether or not the model nails every coding task perfectly yet. This one’s worth keeping an eye on.