Google pushed Gemini 3.1 Flash-Lite to General Availability on May 7. It’s the cheapest, fastest model in the Gemini 3 family — and the most interesting one for anyone running real production traffic.
What it is
A lightweight LLM API, not a consumer product. Pricing is $0.25 per million input tokens and $1.50 per million output — roughly one-eighth the cost of Gemini 3 Pro. On Artificial Analysis benchmarks, Time to First Token is 2.5x faster than 2.5 Flash, output speed is up 45%, and quality holds even or slightly ahead. You ship it through the Gemini API in AI Studio or through Vertex AI inside Gemini Enterprise. Same prompt, two front doors.
Why this matters
Google isn’t winning the frontier race against Claude 4.7, so it’s attacking the other end. Customer support bots, real-time trading agents, batch classification, RAG pipelines doing millions of calls a day — these workloads care about cost-per-token and latency, not model IQ. At $0.25/M input, Flash-Lite undercuts most serious competitors on the price-per-quality curve.
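To make the pricing concrete, here’s a back-of-envelope sketch. The Flash-Lite rates are the ones listed above; the Gemini 3 Pro rates are an assumed ~8x multiple for illustration, not published figures.

```python
# USD per million tokens. Flash-Lite rates are from the announcement;
# the Pro rates below are an illustrative ~8x assumption, not official pricing.
FLASH_LITE = {"input": 0.25, "output": 1.50}
PRO_ESTIMATE = {"input": 2.00, "output": 12.00}

def monthly_cost(rates, calls_per_day, in_tokens, out_tokens, days=30):
    """Monthly USD cost for a workload at the given per-million-token rates."""
    total_in = calls_per_day * in_tokens * days
    total_out = calls_per_day * out_tokens * days
    return (total_in * rates["input"] + total_out * rates["output"]) / 1_000_000

# Hypothetical RAG pipeline: 1M calls/day, 2,000 input + 200 output tokens each.
lite = monthly_cost(FLASH_LITE, 1_000_000, 2_000, 200)
pro = monthly_cost(PRO_ESTIMATE, 1_000_000, 2_000, 200)
print(f"Flash-Lite: ${lite:,.0f}/mo   Pro (est.): ${pro:,.0f}/mo")
# → Flash-Lite: $24,000/mo   Pro (est.): $192,000/mo
```

At that volume the model choice is a six-figure-per-month decision, which is the whole point of a workload like this caring about cost-per-token first.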
If you’re running anything high-frequency on GPT-4o-mini or Claude Haiku 4.5, this is worth a benchmark run.