AI Voice & Audio
-
LocalClicky runs a voice assistant entirely on your Mac with no cloud and no API keys
LocalClicky is a Mac menubar app that lets you talk to your computer and have it actually do things, with nothing leaving the machine. Say “Computer” to start a session, chain commands back to back, and say “goodbye” when you’re done.

## Fully local by design

The pitch is a privacy inversion. Most voice…
-
Sun is a voice API built for rooms with more than one human and more than one agent
Every realtime voice API today (OpenAI Realtime, Gemini Live, Hume) assumes one person talking to one AI. Sun, a new Product Hunt launch, is built for the case that breaks that assumption: a room with multiple humans and multiple agents all sharing the same audio channel.

## What it actually does

Sun is a voice-first…
-
MisoTTS is an 8B open-weights voice model built to out-emote humans
## What it is

MisoTTS is an 8-billion-parameter text-to-speech model from Miso Labs, released with open weights and a claim of being the most emotive voice model around. It generates expressive speech from text plus audio context, using residual vector quantization to widen its sonic range, and it clones a voice from a short sample…
-
Voiser AI hits 140 languages with 3,000 voices in one TTS + voice cloning + AI video platform
Voiser AI shipped a unified voice platform: text-to-speech, voice cloning, speech-to-text, and AI video generation in one product, covering 140+ languages and 3,000+ voice options. It launched on Product Hunt this week.

## What’s actually in the box

3,000 voices spanning male, female, and child variants across multiple accents and emotional styles. Custom voice instruction…
-
Supertonic v3: 99M-parameter on-device TTS covers 31 languages with expression tags
Supertone Inc shipped Supertonic v3: 99M parameters, 31 languages, running entirely on-device via ONNX Runtime with zero cloud calls. GitHub trending lit up this week with 745+ daily stars as the broader dev community discovered the release.

## The size argument

At 99M parameters, Supertonic v3 is roughly 7-20x smaller than competing open TTS…
-
Amazon Alexa for Shopping replaces Rufus — and it’ll buy from non-Amazon sites for you
Amazon retired Rufus on May 13. The 300M users who tested Rufus inside the search bar now get Alexa for Shopping instead: the same shopping agent, fused with Alexa+ and rebranded under one name.

## What it actually does

It’s a conversational shopping agent that lives in three places: the Amazon app, amazon.com, and Echo…
-
OpenAI GPT-Realtime-2 + Translate + Whisper: three voice models, one API, several startups erased
OpenAI shipped three Realtime API models on May 7. Read the spec sheet and you can hear a half-dozen voice startups quietly rewriting their decks.

## What actually launched

GPT-Realtime-2 is the first voice model with GPT-5-class reasoning baked in. 128K context (up from 32K), five-level reasoning effort, tone control, parallel tool calls, clean recovery from…
-
OpenAI Realtime Voice WebRTC Stack: the infra blueprint every voice agent startup now has to compete with
OpenAI published an engineering deep dive on May 4 on how it serves real-time voice to 900M+ weekly users. It hit the Hacker News front page with 324 points, and it is the first time OpenAI has formally walked through the architecture behind ChatGPT Voice and the Realtime API.

## What they rebuilt

They rewrote the WebRTC stack from scratch.…
-
Sakana AI KAME hits 6.43 on MT-Bench by giving voice models two brains
Sakana AI just open-sourced KAME, a tandem speech-to-speech architecture that splits voice AI in two: a fast S2S model handles the mouth, a slow LLM handles the brain. The S2S responds instantly while the LLM reasons in the background and injects “oracle” signals as they arrive. The model talks while it’s still thinking. Why the…
-
AssemblyAI’s Voice Agent API undercuts Vapi and Retell with 307ms STT latency
AssemblyAI just planted a flag in the voice agent market. Their new Voice Agent API is one endpoint that takes speech in and gives speech out, with STT, dialogue orchestration, and the rest of the pipeline managed for you. It’s the same job Vapi, Retell, and LiveKit Agents have been splitting up between them, but this one comes from the…
