Mistral just shipped Medium 3.5, a 128B dense model that hits 77.6% on SWE-Bench Verified. For context: Gemini 3.1 Pro Preview leads the board at 78.8%. An open-weight model from a European lab is now within rounding distance of Google’s flagship on real coding work.
It’s a single set of weights doing instruction-following, reasoning, and agentic coding. Reasoning mode is a per-request toggle — quick chat or long-horizon agent run, same model, you decide how much test-time compute to burn. 256K context window. 91.4% on τ³-Telecom, the agentic benchmark Mistral cares about most.
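In practice the toggle would be a flag on an otherwise ordinary chat-completions request. A minimal sketch, with the caveat that the `reasoning` field name and the `mistral-medium-3.5` model identifier are assumptions, not confirmed API schema:

```python
import json

# Sketch of a per-request reasoning toggle on an OpenAI-style
# chat-completions payload. The "reasoning" key and model name are
# assumed for illustration, not taken from Mistral's published docs.
def build_request(prompt: str, reasoning: bool) -> dict:
    """Build a chat-completions payload with reasoning on or off."""
    return {
        "model": "mistral-medium-3.5",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": reasoning,  # assumed toggle for test-time compute
    }

# Quick chat: skip the reasoning budget.
quick = build_request("Rename this variable.", reasoning=False)

# Long-horizon agent turn: burn the compute.
deep = build_request("Refactor the auth module across the repo.", reasoning=True)

print(json.dumps(deep, indent=2))
```

The point is that routing is a request-level decision, not a model-level one: the same weights serve both calls.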
What the API does
$1.50 per million input tokens, $7.50 per million output. Available through Mistral’s API, Le Chat, NVIDIA NIM containers, and Vibe — Mistral’s coding agent, where Medium 3.5 just replaced Devstral 2 to power remote agents. Native function calling and JSON output ship in the box. Typical use cases: long-horizon coding agents, multi-step tool use, RAG over large codebases.
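Those list prices make agent-run costs easy to estimate. A back-of-envelope helper using the rates above (the 200K-in / 8K-out example turn is illustrative, not from the announcement):

```python
# List prices from the post: $1.50 per 1M input tokens, $7.50 per 1M output.
INPUT_PER_M = 1.50
OUTPUT_PER_M = 7.50

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at list pricing."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A hefty agent turn: 200K tokens of context in, 8K tokens out.
print(f"${run_cost(200_000, 8_000):.2f}")  # → $0.36
```

At these rates, even a turn that nearly fills the 256K window stays well under a dollar; the output side dominates only on generation-heavy workloads.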
Why it matters
Open weights under a modified MIT license. Self-hostable. Most models scoring near Gemini ship as closed APIs — Mistral is the rare lab handing you the actual weights at this performance tier.