Google opened Google I/O 2026 yesterday with Gemini 3.5 Flash, a frontier model that combines reasoning with agentic task execution. The headline: 4x the output tokens per second of other frontier models, while beating Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks. Gemini 3.5 Pro is in internal testing now, with public availability next month.
## What “agentic-first” means here
Gemini 3.5 Flash was tuned specifically for long, tool-heavy sessions in which the model has to plan, call APIs, observe results, and iterate across many steps. The 4x throughput claim matters more here than for chat: agentic loops burn tokens roughly linearly with steps, so when generation time dominates each step, doubling output speed roughly doubles the number of agent steps you can run per unit time. Time spent waiting on tool calls does not shrink with faster decoding, so tool-bound loops see a smaller gain.
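The throughput argument above can be checked with back-of-the-envelope arithmetic. The sketch below is a simplified model, not a benchmark: it assumes a strictly sequential agent loop, and the tokens per step, baseline tokens per second, and per-step tool latency are all made-up illustrative numbers rather than figures from Google.

```python
def steps_per_minute(
    tokens_per_step: float,
    tokens_per_second: float,
    overhead_seconds: float = 0.0,
) -> float:
    """Agent steps completed per minute in a strictly sequential loop.

    overhead_seconds is fixed per-step time that faster decoding does not
    reduce (tool calls, network round trips). All inputs are hypothetical.
    """
    seconds_per_step = tokens_per_step / tokens_per_second + overhead_seconds
    return 60.0 / seconds_per_step


# Hypothetical workload: 500 output tokens per step, 100 tokens/s baseline.
TOKENS_PER_STEP = 500
BASE_TPS = 100

for overhead in (0.0, 5.0):
    base = steps_per_minute(TOKENS_PER_STEP, BASE_TPS, overhead)
    fast = steps_per_minute(TOKENS_PER_STEP, 4 * BASE_TPS, overhead)
    print(
        f"overhead={overhead:.0f}s: {base:.1f} -> {fast:.1f} steps/min "
        f"({fast / base:.2f}x)"
    )
```

Under these assumptions, a loop with no tool latency gets the full 4x (12 to 48 steps per minute), while a loop that waits 5 seconds on tools each step gets only 1.6x (6 to 9.6). The speedup carries over to step count only to the extent that generation dominates step time.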
## The Flash-Pro lineup
Flash today, Pro next month. Google has historically used Flash for cost-sensitive deployment and Pro for premium reasoning workloads. Both 3.5 versions share the same agentic-first training emphasis — meaning even the cost-efficient tier ships with tool-use and long-horizon competence baked in.
## Why it matters
The big AI labs are all explicitly tuning for agentic workloads, not just chat. Gemini 3.5 Flash is Google’s clearest answer to Claude Opus 4.7 and GPT-5.5 — and shipping it with 4x throughput is the lever Google is pulling to differentiate on price-performance.
