OpenAI published an engineering deep dive on May 4 explaining how it serves real-time voice to 900M+ weekly users. The post hit the Hacker News front page with 324 points, and it's the first time OpenAI has formally walked through the architecture behind ChatGPT Voice and the Realtime API.
What they rebuilt
They rewrote the WebRTC stack from scratch. A single Go service built on Pion handles both signaling and media termination, replacing the one-port-per-session SFU pattern that breaks down at this scale. ICE and DTLS state ownership was redesigned, and media is routed globally: cross-continent turn-taking doesn't add awkward pauses, barge-in works mid-sentence, and the model can reason while you're still talking.
What developers actually get
The same stack powers the Realtime API's WebRTC endpoint. Connect to it and you get the low-latency path ChatGPT Voice runs on, built for voice agents, live tutors, and call-center automation, where turn-taking detection and barge-in are the whole product.
For voice agent startups like Vapi, Retell, LiveKit, and Pipecat, this is the new bar: OpenAI just spelled out the infrastructure you're competing with for low-latency turn-taking at planetary scale. Most teams will route through the Realtime API instead of fighting it.
You Might Also Like
- OpenAI OAuth Turns Your ChatGPT Subscription Into a Free OpenAI API, but Should You Use It?
- AssemblyAI's Voice Agent API Undercuts Vapi and Retell With 307ms STT Latency
- OpenAI Codex Pets Turn Your AI Coding Agent Into a Desktop Tamagotchi
- Shuo: Sub-500ms Voice Agent, 600 Lines of Python That Make Voice AI Feel Instant
- ChatGPT Interactive Visuals Just Dropped: OpenAI Wants 140 Million Weekly Learners to Ditch Static Explanations
