Sakana AI just open-sourced KAME, a tandem speech-to-speech architecture that splits voice AI in two: a fast S2S model handles the mouth, a slow LLM handles the brain. The S2S responds instantly while the LLM reasons in the background and injects “oracle” signals as they arrive. The model talks while it’s still thinking.
Why the numbers matter
Voice models have lived with one trade-off for years: fast and dumb, or smart and laggy. KAME breaks it. MT-Bench jumped from 2.05 to 6.43 — roughly 3x — with latency still near zero. The front-end keeps Moshi’s 80ms audio token cycle, so responses start before you finish your sentence. The training trick is Simulated Oracle Augmentation: a simulator LLM generates 56,582 synthetic dialogues across six hint levels, teaching the front-end how to integrate partial reasoning at any moment.
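The shape of the tandem loop is easy to see in miniature. The sketch below is illustrative only (the function names, timings, and token strings are invented, not KAME's actual API): a fast front-end emits a token every cycle, a slow backend reasons concurrently, and the moment the backend's “oracle” hint resolves, the front-end folds it into the response already in flight.

```python
import asyncio

# Hypothetical tandem loop: fast front-end talks on a fixed cycle
# while a slow backend LLM reasons in the background.

async def slow_backend(question: str, oracle: asyncio.Future) -> None:
    await asyncio.sleep(0.02)  # stands in for the LLM's long latency
    oracle.set_result(f"[oracle hint for: {question}]")

async def fast_frontend(oracle: asyncio.Future) -> list:
    spoken = []
    for step in range(20):  # one step ~ one 80 ms audio-token cycle
        if oracle.done():
            spoken.append(oracle.result())  # integrate partial reasoning
            break
        spoken.append(f"filler-{step}")  # keep talking while thinking
        await asyncio.sleep(0.01)
    return spoken

async def tandem(question: str) -> list:
    oracle = asyncio.get_running_loop().create_future()
    backend = asyncio.create_task(slow_backend(question, oracle))
    spoken = await fast_frontend(oracle)  # starts speaking immediately
    await backend
    return spoken

result = asyncio.run(tandem("why is the sky blue?"))
print(result)
```

The Simulated Oracle Augmentation data teaches exactly this behavior: because the synthetic dialogues vary when and how much of the hint is available, the front-end learns to splice in the oracle at any point in its utterance rather than waiting for it.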
The hot-swap part is the real headline
The backend LLM is fully replaceable. Train once with gpt-4.1-nano, deploy with Claude Opus 4.1 or Gemini 2.5 Flash without retraining a single parameter. KAME is open source on Hugging Face as SakanaAI/kame: self-host the front-end and point it at any LLM API. Customer service bots, voice tutors, and real-time translators all become viable now that “smart but slow” is no longer the only option for voice agents.
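Mechanically, the swap works because the front-end consumes plain text from the backend, not backend internals, so any model behind any API fits the same slot. A toy sketch of that decoupling (all names are hypothetical; a real deployment would wrap HTTP API clients):

```python
from typing import Callable

# Any LLM, wrapped as text -> text, satisfies the backend contract.
Backend = Callable[[str], str]

def make_frontend(backend: Backend) -> Callable[[str], str]:
    """Front-end depends only on the backend interface, not the model."""
    def respond(transcript: str) -> str:
        hint = backend(transcript)       # slow oracle call
        return f"(speaking) {hint}"      # front-end renders it as speech
    return respond

# Two interchangeable stand-in backends.
nano = lambda q: f"nano answer to {q!r}"
opus = lambda q: f"opus answer to {q!r}"

say = make_frontend(nano)
print(say("hello"))
say = make_frontend(opus)  # hot-swap: config change, no retraining
print(say("hello"))
```

The interesting claim is that the trained front-end generalizes across backends it never saw during training; the sketch only shows why nothing structural prevents the swap.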