OpenAI shipped GPT-5.5 on April 23, 2026, with a Pro variant a day later. It's the biggest model bump in six weeks, and the benchmarks aren't subtle: 84.9% on GDPval, 78.7% on OSWorld-Verified, 98.0% on Tau2-bench Telecom (no prompt tuning), 82.7% on Terminal-Bench 2.0, 51.7% on FrontierMath 1-3. Translation: best-in-class coding, computer use, and agent workflows, ahead of both Opus 4.7 and Gemini 3.1 Pro on the lines that matter for product work.
What it actually is
A frontier text model, not a new product surface. It's available through the OpenAI API and inside Codex, with a 1M-token context window via the API and 400K in Codex. Per-token pricing is higher than GPT-5.4's, but the model burns fewer tokens on the same task, so total bills often come out flat or lower.
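To see why a pricier per-token rate can still produce a flat or lower bill, here's a toy cost comparison. The prices and token counts below are illustrative placeholders, not OpenAI's actual rates; the point is only the arithmetic.

```python
def task_cost(input_tokens: int, output_tokens: int,
              in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one task, given per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

# Hypothetical numbers: the newer model charges more per token
# but finishes the same task with fewer tokens.
old_model = task_cost(40_000, 12_000, in_price_per_m=1.25, out_price_per_m=10.00)
new_model = task_cost(30_000, 7_000, in_price_per_m=1.75, out_price_per_m=14.00)

print(f"old: ${old_model:.4f}  new: ${new_model:.4f}")
# Higher unit price, fewer tokens: the per-task total can still drop.
```

Whether the tradeoff nets out in your favor depends entirely on how much the token count actually shrinks on your workload, so measure on your own tasks before assuming the bill stays flat.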
Why developers should care
The OSWorld and Terminal-Bench numbers are the real story. Computer-use agents that actually finish multi-step browser and shell tasks were the soft spot of GPT-5.4; they're now the strongest pitch. If you're building a coding agent, a deep-research agent, or anything that drives a real OS, this is the default model to swap in and benchmark. Same SDK, no migration work: just change the model string.
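"Just change the model string" really is the whole migration. A stdlib-only sketch of the chat-completions request makes that concrete; note that the exact API model identifiers (`gpt-5.4`, `gpt-5.5`) are assumptions here, so check OpenAI's model list for the released names.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completions request; only the model string differs per model."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Migrating is a one-line diff: swap the model identifier, nothing else.
old_req = build_request("gpt-5.4", "Summarize this diff")  # assumed identifier
new_req = build_request("gpt-5.5", "Summarize this diff")  # assumed identifier
```

If you're on the official `openai` SDK instead of raw HTTP, the same rule holds: the `model=` argument is the only thing that changes.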