So [nineninesix.ai](https://www.nineninesix.ai/) quietly released Kani-TTS-2 today, and honestly, I think this one deserves way more attention than it's getting. It's a 400M-parameter open-source text-to-speech model that runs on as little as 3GB of VRAM. That means your old RTX 3060 sitting in a drawer? Yeah, that works. No cloud GPU rental, no waiting in API queues: just install it and go.
The model is built on [LiquidAI's LFM2 architecture](https://huggingface.co/nineninesix/kani-tts-2-en) with Nvidia's NanoCodec handling the audio side, and the results are surprisingly good for something this small. It hits a real-time factor (RTF) of about 0.2, meaning generating 10 seconds of speech takes roughly 2 seconds. For local inference on consumer hardware, that's wild. The whole thing was trained on around 10,000 hours of speech data using 8 H100 GPUs, and the training run itself took only 6 hours. The efficiency here is kind of absurd.
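If you want to sanity-check that number on your own hardware, RTF is just wall-clock generation time divided by the duration of the audio produced. Here's a minimal measurement harness; the synthesis step is a placeholder for whatever model call you're timing, and `soundfile` is assumed to be installed:

```python
# Minimal RTF harness: time a local TTS run, then divide by audio length.
# The synthesis step below is a placeholder; swap in your actual model call.
import time

import soundfile as sf  # assumed installed: pip install soundfile

start = time.perf_counter()
# ... run your TTS model here, writing its output to "out.wav" ...
elapsed = time.perf_counter() - start

audio, sample_rate = sf.read("out.wav")
duration_s = len(audio) / sample_rate
print(f"RTF = {elapsed / duration_s:.2f}")  # ~0.2 means 10s of speech in ~2s
```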
But the feature that really caught my eye is zero-shot voice cloning. You feed it a short reference audio clip, and it can synthesize new speech in that voice without any fine-tuning. The [API is dead simple](https://github.com/nineninesix-ai/kani-tts): a few lines of Python and you're cloning voices locally (see the sketch below). It currently supports English and Portuguese, with more languages likely on the way given the team's track record with the original KaniTTS multilingual models.
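To give a feel for the workflow, here's a rough sketch of a zero-shot cloning call. Fair warning: the class and method names below are my own illustration, not the repo's actual interface, so check the kani-tts README for the real API. The model ID comes from the Hugging Face page linked above.

```python
# Illustrative only: KaniTTS, from_pretrained, and generate are hypothetical
# names standing in for the real kani-tts interface (see the GitHub repo).
import soundfile as sf

from kani_tts import KaniTTS  # hypothetical import path

# Load the 400M English checkpoint; fits in ~3GB of VRAM per the model card.
model = KaniTTS.from_pretrained("nineninesix/kani-tts-2-en")

# Zero-shot cloning: a short reference clip conditions the output voice.
# No fine-tuning step involved.
audio, sample_rate = model.generate(
    text="Hello from a locally cloned voice.",
    reference_audio="speaker_sample.wav",  # a few seconds of the target voice
)

sf.write("cloned.wav", audio, sample_rate)
```

The exact names will differ, but the shape of the workflow is the point: load the model once, then pass a reference clip alongside the text for each request.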
The project is already making the rounds on [MarkTechPost](https://www.marktechpost.com/2026/02/15/meet-kani-tts-2-a-400m-param-open-source-text-to-speech-model-that-runs-in-3gb-vram-with-voice-cloning-support/), [Hugging Face](https://huggingface.co/nineninesix/kani-tts-2-en), and [Hacker News](https://news.ycombinator.com/item?id=45440904), and for good reason. The combination of a tiny hardware footprint, solid generation speed, and voice cloning, all under an Apache 2.0 license, makes this genuinely accessible in a way most TTS models aren't. You can grab the [pretraining code](https://github.com/nineninesix-ai/kani-tts-2-pretrain) too if you want to train your own variant. Or just try the [online demo at kanitts.com](https://kanitts.com/) before committing to a local setup.
I've been following open-source TTS for a while now, and the gap between these community models and the big commercial APIs is shrinking fast. Kani-TTS-2 isn't perfect: it can struggle with inputs longer than about 40 seconds, and non-English support is still early. But at this price point (free) and this hardware requirement (basically anything), it's hard to complain.