xAI has opened up the voice stack behind Grok. The Grok Voice Agent API lets developers build real-time voice assistants that speak more than 100 languages, call tools, and pull in live data, at a price of $0.05 per minute of audio.
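At a flat per-minute rate, budgeting a deployment comes down to simple arithmetic. The sketch below estimates cost from total audio seconds. It assumes usage is prorated per second and that "audio minutes" means total session audio. Neither billing granularity nor rounding rules are stated in the announcement, so treat `estimate_cost` as a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate from the announcement: $0.05 per audio minute.
RATE_PER_MINUTE = Decimal("0.05")

def estimate_cost(audio_seconds: float) -> Decimal:
    """Estimate USD cost for a given amount of audio.

    Assumption: billing is prorated per second and rounded to the cent;
    xAI's actual billing granularity is not specified here.
    """
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    return (minutes * RATE_PER_MINUTE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Example: 1,000 calls averaging 3 minutes each = 3,000 audio minutes.
monthly = estimate_cost(1_000 * 3 * 60)
print(f"${monthly}")  # $150.00
```

Using `Decimal` rather than floats avoids rounding drift when summing many small per-call charges.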
## What the API does
The headline feature is multilingual range: support for more than 100 languages, including native-level handling of Mandarin with its regional accents and idioms. The API detects the user’s language in real time and switches its response language automatically, with no developer configuration required. xAI says the model ranks first on the Big Bench Audio benchmark and delivers an average time-to-first-audio under one second, which the company claims is about five times faster than its closest competitor.
## Building on it
Developers extend the API by connecting custom functions, so a voice agent can take actions rather than just talk. It can also tap xAI’s integrated real-time search, which pulls from the broader web and the live X data stream to keep answers current. Tesla already uses Grok for in-car voice control, reaching navigation, route planning, and vehicle insights through privileged APIs. That integration offers a preview of where tool-calling voice agents are headed.
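The announcement doesn't spell out the wire format for custom functions, so here is a hypothetical sketch of the developer-side half of tool calling. It has two parts: a function description the agent can be given, and a dispatcher that runs the matching local handler when the model requests it. The JSON-Schema-style tool shape, the `get_weather` tool, and `handle_tool_call` are illustrative assumptions, not xAI's documented API. Sending the result back into the voice session is not shown.

```python
import json
from typing import Any, Callable

# Hypothetical tool description in the JSON-Schema style common to
# tool-calling APIs; the exact format this API expects is an assumption.
GET_WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> dict[str, Any]:
    # Stub handler; a real agent would query a weather service here.
    return {"city": city, "temp_c": 21, "conditions": "clear"}

# Registry mapping tool names to local Python handlers.
HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {"get_weather": get_weather}

def handle_tool_call(name: str, arguments_json: str) -> str:
    """Run the requested tool and return its result as a JSON string.

    Assumes the model sends a tool name plus JSON-encoded arguments,
    a common convention that is not confirmed for this API.
    """
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps(handler(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Bad JSON or arguments that don't match the handler's signature.
        return json.dumps({"error": str(exc)})

print(handle_tool_call("get_weather", '{"city": "Austin"}'))
```

Returning errors as JSON instead of raising keeps the conversation alive: the model can hear that a call failed and recover verbally.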
