Top AI Product

We track trending AI tools across Product Hunt, Hacker News, GitHub, and more  — then write honest, opinionated takes on the ones that actually matter. No press releases, no sponsored content. Just real picks, published daily.  Subscribe to stay ahead without drowning in hype.


Gemini 3.1 Flash-Lite: Google’s Cheapest Model Just Got Surprisingly Good

Google quietly dropped [Gemini 3.1 Flash-Lite](https://deepmind.google/models/model-cards/gemini-3-1-flash-lite/) on March 3rd, and honestly, the pricing alone made me do a double-take. We’re talking $0.25 per million input tokens and $1.50 per million output tokens. That’s one-eighth the cost of Gemini 3.1 Pro. For bulk workloads like translation, content moderation, or e-commerce processing, those numbers start to matter a lot.

But cheap doesn’t mean slow. Google claims 2.5x faster time-to-first-token compared to Gemini 2.5 Flash, plus a 45% bump in output speed. I’ve been poking around with it in [Google AI Studio](https://ai.google.dev/gemini-api/docs/changelog), and the response times genuinely feel snappy — even with reasoning turned on.

Speaking of reasoning, the standout feature here is adjustable thinking levels. You get four options: minimal, low, medium, and high. So if you’re running a simple classification task, you can dial it down to minimal and save on latency. Need something more thoughtful? Crank it up. [Simon Willison tested this out](https://simonwillison.net/2026/Mar/3/gemini-31-flash-lite/) on his blog by having the model generate pelican drawings at each thinking level — a fun way to visualize the difference.

The buzz has been real. It picked up [158 upvotes on Product Hunt](https://www.producthunt.com/products/gemini-6) on March 4th, [VentureBeat ran a piece](https://venturebeat.com/technology/google-releases-gemini-3-1-flash-lite-at-1-8th-the-cost-of-pro) highlighting the 1/8th cost angle, and [MarkTechPost](https://www.marktechpost.com/2026/03/03/google-drops-gemini-3-1-flash-lite-a-cost-efficient-powerhouse-with-adjustable-thinking-levels-designed-for-high-scale-production-ai/) did a deep breakdown of the architecture. Over on [Hacker News](https://news.ycombinator.com/item?id=47234962), folks were already comparing it against other budget models, with one commenter noting that “Gemini is slowly making $15/month voice apps obsolete.”

The benchmarks back up the hype too. Google reports an Elo score of 1432 on the Arena.ai leaderboard and 86.9% on GPQA Diamond, which puts it ahead of its predecessor across key reasoning and multimodal tasks. For a model at this price point, that’s impressive.

If you’re running anything at scale where cost-per-token keeps you up at night, Flash-Lite is worth a serious look. It’s currently available in preview through the [Gemini API](https://ai.google.dev/gemini-api/docs/changelog) and [Vertex AI](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-1-flash-lite). Not the flashiest launch Google has ever done, but maybe the most practical one in a while.


Discover more from Top AI Product

Subscribe to get the latest posts sent to your email.



Leave a comment