Remember that GPT-4o voice demo? Camera on, talking naturally, AI responding in real time. Impressive — except it runs on OpenAI’s servers, costs money per minute, and every frame of your face goes to the cloud.
Parlor does the same thing on an M3 Pro. Entirely local. 266 points on Hacker News in a day.
One Model Does Three Jobs
The trick is Gemma 4 E2B. Google’s 2-billion parameter edge model handles speech recognition, visual understanding, and language generation in a single pass — work that used to require three separate models stitched together. Kokoro-82M handles text-to-speech via MLX on Mac. A FastAPI server ties everything together over WebSocket, streaming PCM audio and JPEG frames between your browser and the models.
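Running both media types over one WebSocket means the server has to tell audio chunks apart from camera frames. Parlor's exact wire format isn't published, so this is a minimal sketch of one plausible approach: JPEG data always begins with the FF D8 start-of-image marker, so anything else can be treated as raw 16-bit PCM. The function names here are illustrative, not Parlor's.

```python
# Hypothetical demultiplexer for a single WebSocket stream carrying both
# 16-bit mono PCM audio chunks and JPEG camera frames. The routing rule
# (JPEG magic bytes vs. everything else) is an assumption for illustration.

def classify_frame(payload: bytes) -> str:
    """Route a binary WebSocket message to the right pipeline."""
    if payload[:2] == b"\xff\xd8":   # JPEG start-of-image marker
        return "vision"              # -> image understanding
    return "audio"                   # -> speech-recognition buffer

def pcm_duration_ms(payload: bytes, sample_rate: int = 16_000) -> float:
    """Duration of a mono 16-bit PCM chunk, e.g. for VAD windowing."""
    samples = len(payload) // 2      # 2 bytes per sample
    return samples / sample_rate * 1000
```

A real handler would call `classify_frame` on each incoming binary message inside the FastAPI WebSocket loop and append audio to a rolling buffer while keeping only the most recent camera frame.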
Total download: about 2.6 GB. End-to-end latency on an M3 Pro: 2.5–3 seconds. Decoding runs at 83 tokens per second on GPU. Not instant, but fast enough that conversations feel natural.
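The decode rate explains where much of that latency goes. At the quoted 83 tokens per second, generation alone takes over a second for any non-trivial reply (the reply length below is an assumption for illustration; the rest of the budget is transcription, prefill, and TTS):

```python
# Back-of-the-envelope decode time at the quoted throughput.
TOKENS_PER_SECOND = 83
reply_tokens = 120                            # illustrative reply length
decode_s = reply_tokens / TOKENS_PER_SECOND   # ~1.45 s of the 2.5-3 s budget
```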
No push-to-talk. You just talk, and Parlor listens through voice activity detection. You can interrupt mid-sentence — it stops and responds to whatever you just said. Hands-free, like an actual conversation.
Why It Blew Up This Week
Timing. Gemma 4 launched April 2 under Apache 2.0. Parlor showed up on Show HN three days later as one of the first real demos proving E2B actually works on consumer hardware. The r/LocalLLaMA crowd and privacy-first developers have been waiting for exactly this: a multimodal model small enough to run on a laptop that actually does something useful.
The backstory is interesting too. The creator built Parlor to eliminate server costs for Bule AI, a free language-learning platform. Self-hosting an RTX 5090 at home wasn’t scaling. A 2.6 GB local model that runs on every Mac? That scales.
Cloud Voice Assistants Have a Problem
OpenAI’s voice mode, Gemini Live, Apple Intelligence — all cloud-dependent, all metered, all sending your data somewhere else. Pipecat is open-source but still relies on cloud LLMs for inference. Home Assistant has voice but no vision.
Parlor is the only project shipping real-time voice + vision + voice output in a single local package. It’s a research preview with rough edges — but 777 GitHub stars in 48 hours says something about demand for AI that doesn’t phone home.