AI Voice & Audio
-
AgentPhone gives every AI agent its own phone number for calls and texts
AI agents are getting good at acting online, but the phone network is still mostly closed to them. AgentPhone, a Y Combinator Spring 2026 startup, fixes that by giving every AI agent its own phone number for both voice and messaging through a single API.
## What AgentPhone does
Developers provision a number in seconds,…
-
Microsoft MAI-Voice-2 brings cloning and emotional speech to Azure Copilot in 15 languages
Microsoft is leaning harder on its own models, and voice is a clear example. At Build 2026 on June 2, the company introduced MAI-Voice-2, the second generation of its in-house text-to-speech model, built to make speech a native interface for Azure Copilot.
## What MAI-Voice-2 does
The model delivers expressive speech synthesis across 15 languages,…
-
ElevenLabs adds Avatars to ElevenCreative for AI talking-head videos with built-in lip-sync
ElevenLabs built its name on AI voices; now it is putting a face to them. The company has added Avatars to ElevenCreative, letting you generate talking-head videos that combine an AI voice with a customizable on-screen presenter in a single workflow.
## How Avatars works
An avatar is a persistent visual identity you build from…
-
KugelAudio is a self-hostable text-to-speech model with 39ms time-to-first-audio
Voice agents in regulated industries face a problem most TTS vendors ignore: you cannot ship customer data to a third-party cloud. KugelAudio, built by a four-person Berlin team and accepted into Y Combinator’s Spring 2026 batch, is a real-time text-to-speech model designed to run on your own infrastructure.
## Low latency, self-hosted
KugelAudio reports a…
-
xAI Grok Voice Agent API builds voice assistants that speak 100+ languages at $0.05 a minute
xAI has opened up the voice stack behind Grok. The Grok Voice Agent API lets developers build real-time, multilingual voice assistants that call tools and pull in live data, priced at $0.05 per audio minute.
## What the API does
The headline is multilingual range: support for over 100 languages, including…
-
Asmi AI Makes the Real Phone Calls You Have Been Putting Off
Plenty of personal admin still happens over the phone: booking an appointment, chasing a refund, calling a restaurant. Asmi AI is an agent built for exactly that: it handles your personal chores in the physical world by making real phone calls on your behalf.
## What Asmi AI does
Rather than living inside a…
-
Krisp Voice Translation API Brings Real-Time Speech-to-Speech to Developers
Krisp, best known for AI noise cancellation, has opened its enterprise voice translation engine to developers. The new Krisp Voice Translation API, launched alongside Voice Translation v3, performs real-time, bidirectional speech-to-speech translation across 60+ languages, using the same engine that hit 96% accuracy in a live healthcare deployment.
## What the Krisp Voice Translation…
-
dots.tts Is a New Open-Source TTS Model With No Discrete Tokens
Most text-to-speech systems convert audio into discrete tokens at some stage of the pipeline. dots.tts, a new open-source TTS model from RedNote’s Hilab team, throws that step out entirely. It’s a 2B-parameter, fully continuous, end-to-end autoregressive system: a semantic encoder, an LLM, and a flow-matching acoustic head running over a 48kHz AudioVAE, with no…
-
Vaani Dubs Video Into 40+ Languages With Lip-Sync
Dubbing a video into another language usually means a studio, a slow turnaround, and a result where the mouth never matches the words. Vaani, an AI dubbing platform built for creators, broadcasters, and studios, aims to make dubbed output broadcast-ready rather than just “done.” It translates and re-voices video into 40+ languages with cloned voices…
-
Mina Meeting Assistant Talks and Works During Your Calls
Most AI meeting tools sit quietly and hand you notes afterward. Mina Meeting Assistant, whose 1.0 launch reached the top of Product Hunt, is built to do the opposite: it participates while the call is still happening. Mina can speak during a meeting, pull context from your other tools, generate outputs, and push…
