There’s a certain thrill when you find an open-source project that makes you rethink what’s possible on cheap hardware. [Moonshine Open-Weights STT](https://github.com/moonshine-ai/moonshine) is exactly that kind of project. Built by [Useful Sensors](https://usefulsensors.com/), the company led by Pete Warden, who previously led TensorFlow’s mobile and embedded team at Google, Moonshine is a family of speech-to-text models designed to run locally on just about anything: phones, Raspberry Pis, IoT gadgets, even wearables.
The numbers are hard to ignore. At the top end, Moonshine claims higher accuracy than Whisper Large V3, and at the bottom end you’re looking at a 26 MB model that still holds its own. For context, Whisper Large V3 has about 1.5 billion parameters, which works out to roughly 3 GB of weights at 16-bit precision. That’s not a typo. The trick is a variable-length encoder that scales computation to the actual length of your audio input instead of padding everything out to 30-second chunks like Whisper does. The reported result is roughly a 5x reduction in compute compared to Whisper Tiny when transcribing a 10-second clip, with no increase in word error rate, and a claimed 1.7x speed-up overall.
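To get a feel for why padding hurts so much on short clips, here is a back-of-the-envelope sketch in Python. It rests on a loud simplifying assumption: encoder cost is treated as proportional to the seconds of audio processed, which ignores attention's non-linear cost and everything else about the real architectures. The function names are made up for illustration, and the ratios it prints are not Moonshine's benchmark figures; they only show the direction of the effect.

```python
import math

# Whisper pads (or splits) every input to fixed 30-second windows.
WHISPER_WINDOW_S = 30.0


def padded_seconds(clip_s: float, window_s: float = WHISPER_WINDOW_S) -> float:
    """Seconds of audio a fixed-window encoder processes for one clip."""
    windows = max(1, math.ceil(clip_s / window_s))
    return windows * window_s


def padding_overhead(clip_s: float) -> float:
    """How much more audio a fixed-window encoder processes than a
    variable-length encoder that only sees the real clip.

    Assumption: cost scales linearly with seconds processed (toy model).
    """
    return padded_seconds(clip_s) / clip_s


if __name__ == "__main__":
    for clip_s in (2.0, 10.0, 30.0):
        ratio = padding_overhead(clip_s)
        print(f"{clip_s:4.0f} s clip: fixed window processes {ratio:.1f}x the audio")
```

Even in this crude model, a two-second voice command is where a fixed 30-second window wastes the most work, and that is exactly the kind of input an edge device sees most often.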
What caught my attention is that Moonshine just [showed up on Hacker News](https://news.ycombinator.com/item?id=47143755) as a Show HN post and racked up 269 points with 59 comments in a single day — the top-scoring AI Show HN that day. The discussion was genuinely useful, with people sharing benchmarks and comparing it against other local STT options.
The latest version, [Moonshine v2](https://arxiv.org/abs/2602.12241), introduces a streaming encoder with sliding-window attention, which means you get bounded latency regardless of how long the audio clip runs. That’s a big deal for live transcription and voice command use cases where waiting for the full utterance to finish isn’t acceptable.
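The bounded-latency claim is easier to see with a toy example. The sketch below is not Moonshine's implementation: it assumes a purely causal window (each frame looks only at itself and a fixed number of earlier frames), and the window size of 16 is a hypothetical value picked for illustration. It simply counts how many frames each new frame has to attend to, with and without a window.

```python
from typing import Optional


def visible_frames(t: int, window: int) -> range:
    """Frame indices that frame t may attend to under a causal sliding window.

    Assumption: the window covers frame t and the (window - 1) frames before it.
    """
    return range(max(0, t - window + 1), t + 1)


def attention_pairs(num_frames: int, window: Optional[int] = None) -> int:
    """Total query-key pairs for a clip of num_frames frames.

    window=None means full causal attention: frame t sees frames 0..t.
    """
    if window is None:
        return sum(t + 1 for t in range(num_frames))
    return sum(len(visible_frames(t, window)) for t in range(num_frames))


if __name__ == "__main__":
    WINDOW = 16  # hypothetical window size, not taken from the paper
    for n in (100, 1000, 10000):
        newest = len(visible_frames(n - 1, WINDOW))
        print(
            f"{n:6d} frames: newest frame attends to {newest} frames, "
            f"windowed pairs={attention_pairs(n, WINDOW)}, "
            f"full pairs={attention_pairs(n)}"
        )
```

The thing to watch is the per-frame count: with a window it stays at 16 no matter how long the clip gets, so the work needed to handle the newest chunk of audio does not grow with the length of the recording. With full attention that count keeps climbing as the clip gets longer.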
Models are available on [Hugging Face](https://huggingface.co/UsefulSensors/moonshine) and run via Keras with Torch, TensorFlow, or JAX backends, and there’s ONNX Runtime support for edge devices. Platform coverage is broad: Python, iOS, Android, macOS, Linux, and Windows are all supported. If you’ve been looking for a private, offline-capable speech recognition solution that doesn’t require a beefy GPU, Moonshine is worth a serious look.
