Top AI Product

We track trending AI tools across Product Hunt, Hacker News, GitHub, and more — then write honest, opinionated takes on the ones that actually matter. No press releases, no sponsored content. Just real picks, published daily. Subscribe to stay ahead without drowning in hype.

March 4, 2026

Gemini 3.1 Flash-Lite: Google’s Cheapest Model Just Got Surprisingly Good

Google quietly dropped [Gemini 3.1 Flash-Lite](https://deepmind.google/models/model-cards/gemini-3-1-flash-lite/) on March 3rd, and honestly, the pricing alone made me do a double-take. We’re talking $0.25 per million input tokens and $1.50 per million output tokens. That’s one-eighth the cost of Gemini 3.1 Pro. For bulk workloads like translation, content moderation, or e-commerce processing, those numbers start to matter a lot.

But cheap doesn’t mean slow. Google claims 2.5x faster time-to-first-token compared to Gemini 2.5 Flash, plus a 45% bump in output speed. I’ve been poking around with it in [Google AI Studio](https://ai.google.dev/gemini-api/docs/changelog), and the response times genuinely feel snappy — even with reasoning turned on.

Speaking of reasoning, the standout feature here is adjustable thinking levels. You get four options: minimal, low, medium, and high. So if you’re running a simple classification task, you can dial it down to minimal and save on latency. Need something more thoughtful? Crank it up. [Simon Willison tested this out](https://simonwillison.net/2026/Mar/3/gemini-31-flash-lite/) on his blog by having the model generate pelican drawings at each thinking level — a fun way to visualize the difference.

The buzz has been real. It picked up [158 upvotes on Product Hunt](https://www.producthunt.com/products/gemini-6) on March 4th, [VentureBeat ran a piece](https://venturebeat.com/technology/google-releases-gemini-3-1-flash-lite-at-1-8th-the-cost-of-pro) highlighting the 1/8th cost angle, and [MarkTechPost](https://www.marktechpost.com/2026/03/03/google-drops-gemini-3-1-flash-lite-a-cost-efficient-powerhouse-with-adjustable-thinking-levels-designed-for-high-scale-production-ai/) did a deep breakdown of the architecture. Over on [Hacker News](https://news.ycombinator.com/item?id=47234962), folks were already comparing it against other budget models, with one commenter noting that “Gemini is slowly making $15/month voice apps obsolete.”

The benchmarks back up the hype too. Google reports an Elo score of 1432 on the Arena.ai leaderboard and 86.9% on GPQA Diamond, which puts it ahead of its predecessor across key reasoning and multimodal tasks. For a model at this price point, that’s impressive.

If you’re running anything at scale where cost-per-token keeps you up at night, Flash-Lite is worth a serious look. It’s currently available in preview through the [Gemini API](https://ai.google.dev/gemini-api/docs/changelog) and [Vertex AI](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-1-flash-lite). Not the flashiest launch Google has ever done, but maybe the most practical one in a while.

Discover more from Top AI Product

Subscribe to get the latest posts sent to your email.

Uncategorized

Posted by:

agent

About Me

Hi. I’m a builder who’s obsessed with what AI can actually do — not the hype, but the real tools people ship every day. I use AI to help me find, research, and write about the most interesting AI products launching across Product Hunt, Hacker News, GitHub, and everywhere else. The articles are AI-assisted. The curiosity is mine. I started this site because I was already spending hours every day digging through launches and repos. Figured I might as well share what I find. If something shows up here, it’s because I thought it was genuinely worth your time.

Gemini 3.1 Flash-Lite: Google’s Cheapest Model Just Got Surprisingly Good

Share this:

Discover more from Top AI Product

Leave a comment Cancel reply