Not every job in an AI pipeline needs a frontier model. JetBrains is betting on the other end of the size spectrum with Mellum2, an open Mixture-of-Experts model released in early June and built to be fast and cheap at the high-frequency tasks that sit inside bigger systems.
## What Mellum2 is
Mellum2 has 12B total parameters but activates only about 2.5B per token, and it’s trained specifically on natural language and code. Released under the Apache 2.0 license, it’s pitched as a “focal” model (narrow and well-scoped rather than general) that cuts inference time to less than half that of comparable models while staying competitive on quality. JetBrains, the company behind IntelliJ and the original Mellum, knows the software-engineering workload it’s targeting.
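To put the 12B-total, 2.5B-active split in perspective, here is a back-of-the-envelope sketch. Only the two parameter counts come from the figures above. The "2 FLOPs per active parameter per token" rule of thumb and the 2-bytes-per-weight (bf16) storage are assumptions for illustration, not published Mellum2 numbers, and the function names are made up for this example.

```python
# Rough arithmetic for a sparse Mixture-of-Experts model.
# Parameter counts are the figures quoted for Mellum2; everything else
# here is an illustrative assumption, not a measured property of the model.

TOTAL_PARAMS = 12e9      # all experts combined
ACTIVE_PARAMS = 2.5e9    # approximate parameters used per token
BYTES_PER_PARAM = 2      # assumption: weights stored in bf16/fp16


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights that participate in one token."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Common rule of thumb: about 2 FLOPs per active parameter per token.

    This ignores attention cost and routing overhead, so treat it as a
    rough lower bound rather than a benchmark.
    """
    return 2.0 * active


def weight_memory_gb(total: float, bytes_per_param: int) -> float:
    """All experts must be resident, so memory follows the total count."""
    return total * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    print(f"forward FLOPs per token: {forward_flops_per_token(ACTIVE_PARAMS):.2e}")
    print(f"weight memory: {weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM):.0f} GB")
```

The takeaway is the asymmetry: per-token compute tracks the roughly 2.5B active parameters, while memory still tracks the full 12B, so the speed benefit comes with a larger footprint than a dense 2.5B model would have.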
## A “focal” model
The use cases are the tell: routing and orchestrating AI workloads, powering low-latency RAG pipelines, running fast sub-agents inside complex flows, and enabling private local deployment. It’s not trying to out-reason Claude or GPT; it’s trying to be the cheap, quick component you call thousands of times. As multi-model pipelines become normal, small specialized models like Mellum2 are how teams keep latency and cost under control.
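The "cheap component you call thousands of times" idea can be sketched as a tiny dispatcher. This is a hypothetical illustration, not JetBrains' design or any real API: the `Task` type, the task kinds, and the two backend callables are all invented here, and in practice the backends would wrap whatever inference clients a team already uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical sketch: send narrow, high-frequency work to a small model
# and reserve the expensive frontier model for everything else.


@dataclass(frozen=True)
class Task:
    kind: str      # e.g. "classify", "rerank", "plan_refactor"
    prompt: str


# Assumption: these task kinds are narrow enough for a small focal model.
SMALL_MODEL_KINDS = {"classify", "rerank", "route", "summarize_chunk"}


def pick_tier(task: Task) -> str:
    """Return which model tier should handle the task."""
    return "small" if task.kind in SMALL_MODEL_KINDS else "frontier"


def dispatch(task: Task, backends: Dict[str, Callable[[str], str]]) -> str:
    """Call the backend registered for the task's tier."""
    return backends[pick_tier(task)](task.prompt)


if __name__ == "__main__":
    # Stub backends standing in for real model clients.
    stubs = {
        "small": lambda p: f"[small model] {p}",
        "frontier": lambda p: f"[frontier model] {p}",
    }
    print(dispatch(Task("rerank", "order these search hits"), stubs))
    print(dispatch(Task("plan_refactor", "redesign the module"), stubs))
```

A static lookup like this is the simplest possible policy. A real pipeline might instead let the small model itself decide when to escalate, which is one of the routing roles described above.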
