OpenAI will serve its flagship GPT-5.6 Sol on Cerebras infrastructure starting in July, at up to 750 tokens per second, roughly 15x the ~50 tokens/s baseline of today’s API tiers. No new model, no new benchmark. Just speed. And that’s the point.
Why 750 tokens/s changes the game for agents
Frontier models have always forced a trade: smart but slow, or fast but dumb. Sol is OpenAI’s strongest model — top-tier on coding, reasoning, and agentic tasks — and latency was the tax you paid for it. At 750 tokens/s, a 20-step agent loop that took minutes finishes in seconds. Real-time voice with frontier-level reasoning stops being a demo. Coding agents iterate faster than you can review.
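To put rough numbers on the "minutes to seconds" claim, here is a back-of-the-envelope sketch. The 20 steps and the two throughput figures come from the article; the 500 output tokens per step is an assumed value picked for illustration, and the helper name `loop_seconds` is hypothetical. It counts only token generation time and ignores time-to-first-token, tool calls, and network latency.

```python
def loop_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Generation time for a sequential agent loop, counting output tokens only."""
    return steps * tokens_per_step / tokens_per_second


STEPS = 20              # loop length used in the article
TOKENS_PER_STEP = 500   # assumption: average output tokens per step

baseline = loop_seconds(STEPS, TOKENS_PER_STEP, 50)   # ~50 tokens/s baseline
fast = loop_seconds(STEPS, TOKENS_PER_STEP, 750)      # up to 750 tokens/s

print(f"baseline: {baseline:.1f} s, fast: {fast:.1f} s, speedup: {baseline / fast:.0f}x")
```

Under these assumptions the loop drops from a little over three minutes to under fifteen seconds of generation time. Real loops will see a smaller end-to-end gain, because the time spent outside generation does not shrink.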
The deal behind it is reportedly a $20 billion cloud agreement between OpenAI and Cerebras. OpenAI is betting that inference speed, not just raw intelligence, is the next competitive axis.
How to get access
Through the OpenAI API and Codex — but GPT-5.6 is still in limited preview for trusted partners, and Cerebras capacity rolls out to select customers first. Sol pricing sits at $5 input / $30 output per 1M tokens. If you’re building agent loops or real-time voice apps, this is the access worth waiting for.
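For budgeting, the quoted Sol prices translate into a per-run cost as in the sketch below. The prices are the ones stated above; the token counts in the example call are assumptions chosen for illustration, and `run_cost_usd` is a hypothetical helper, not part of any OpenAI SDK.

```python
INPUT_USD_PER_M = 5.0    # quoted price per 1M input tokens
OUTPUT_USD_PER_M = 30.0  # quoted price per 1M output tokens


def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one run at the quoted per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Assumed agent run: 200k input tokens (context re-sent across steps), 10k output tokens.
cost = run_cost_usd(200_000, 10_000)
print(f"estimated cost: ${cost:.2f}")
```

With these assumed counts, input tokens account for most of the bill even though output is priced six times higher, because agent loops tend to re-send context on every step.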