Remember when OpenAI demo’d GPT-4o voice mode and everyone lost their minds? Camera on, voice flowing, AI responding in real time. Cool — except it runs on OpenAI’s servers, costs money, and sends your data to the cloud.
Parlor does the same thing on your MacBook. Entirely local.
What Parlor Actually Does
You open a browser tab, grant mic and camera access, and start talking. Parlor sees what your camera sees, hears what you say, and talks back — all processed on your machine. No push-to-talk button. No waiting for a full sentence before processing. You can even interrupt the AI mid-sentence, just like a real conversation.
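The interruption behavior is the technically interesting part. Here's a rough sketch of how barge-in can be handled server-side, assuming a crude energy-based voice-activity check and an asyncio task for the assistant's playback; none of these names come from Parlor's actual code:

```python
import array
import asyncio

# Sketch of barge-in handling (not Parlor's code). The idea: if the user starts
# speaking while the assistant's TTS playback task is still running, cancel it
# and treat the incoming audio as the start of the next turn.

SPEECH_RMS_THRESHOLD = 500  # assumed tuning value for 16-bit PCM


def is_speech(pcm_chunk: bytes) -> bool:
    """Crude energy-based voice-activity check on 16-bit little-endian mono PCM."""
    samples = array.array("h", pcm_chunk)  # assumes an even number of bytes
    if not samples:
        return False
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return rms > SPEECH_RMS_THRESHOLD


def on_mic_chunk(pcm_chunk: bytes, tts_task: asyncio.Task | None) -> None:
    """Called for every audio chunk arriving from the browser."""
    if tts_task is not None and not tts_task.done() and is_speech(pcm_chunk):
        tts_task.cancel()  # stop talking mid-sentence; the user has the floor
```

A real implementation would use a proper VAD model rather than an RMS threshold, but the control flow is the same: detect speech, cancel playback, keep listening.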
The stack is surprisingly lean. Google’s Gemma 4 E2B handles both speech recognition and visual understanding: a single 2-billion-parameter model doing work that used to require three separate models. Kokoro handles text-to-speech via MLX on Mac. A FastAPI server ties everything together over WebSocket, feeding PCM audio and JPEG frames from your browser to the models. Total download: about 2.6 GB. Runs in real time on an M3 Pro.
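If you're curious what that server loop might look like, here's a minimal sketch, not Parlor's actual code. It assumes a made-up framing convention (a one-byte type tag on each binary WebSocket message) and uses stubs standing in for the Gemma and Kokoro calls:

```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

AUDIO_TAG = 0x01   # assumed: 16 kHz 16-bit mono PCM follows
FRAME_TAG = 0x02   # assumed: a JPEG-encoded camera frame follows


def transcribe_and_reply(audio_pcm: bytes, image_jpeg: bytes | None) -> str:
    """Stub standing in for the Gemma call: audio plus optional image in, reply text out."""
    return "stub reply"


def synthesize_speech(text: str) -> bytes:
    """Stub standing in for the Kokoro/MLX TTS call: text in, PCM audio out."""
    return b"\x00\x00" * 1600  # 100 ms of silence at 16 kHz, 16-bit mono


@app.websocket("/ws")
async def conversation(ws: WebSocket) -> None:
    await ws.accept()
    latest_frame: bytes | None = None
    try:
        while True:
            message = await ws.receive_bytes()
            if not message:
                continue
            tag, payload = message[0], message[1:]
            if tag == FRAME_TAG:
                # Keep only the newest camera frame; the model sees it
                # alongside whatever the user says next.
                latest_frame = payload
            elif tag == AUDIO_TAG:
                reply_text = transcribe_and_reply(payload, latest_frame)
                await ws.send_bytes(synthesize_speech(reply_text))
    except WebSocketDisconnect:
        pass
```

Parlor's real wire format and model plumbing will differ; the point is how little scaffolding sits between the browser and the models.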
Why This Matters Right Now
Gemma 4 E2B launched under Apache 2.0 in early 2026, and Parlor is one of the first real applications proving what the model can do on consumer hardware. Google designed E2B specifically for edge deployment: native audio input, native vision, and a memory footprint under 1.5 GB with quantization. But specs on a model card don’t move people. A working demo where you talk to your laptop and it talks back? That moves people.
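That footprint is about what you'd expect from the parameter count. A quick back-of-envelope check, using my own assumed numbers rather than anything from the model card:

```python
# Back-of-envelope estimate of the quantized footprint (assumed numbers).
params = 2e9           # "E2B" ~ 2 billion effective parameters
bytes_per_param = 0.5  # 4-bit quantization ~ half a byte per weight

weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.1f} GB")  # ~1.0 GB
# Activations, KV cache, and runtime overhead push the total
# toward the quoted 1.5 GB ceiling.
```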
265 points on Show HN. 31 comments. A top-six spot in bestofshowhn.com’s April rankings. The local-AI and privacy-first communities ate this up.
How It Compares
OpenAI’s voice mode and Google’s Gemini Live are the obvious benchmarks — but both are cloud-only, proprietary, and metered. Pipecat is an open-source framework for multimodal conversational AI, but it still depends on cloud LLMs for the heavy lifting. Home Assistant has voice, but no vision.
Parlor is the only project right now shipping real-time voice + vision + voice output in a single local package. It’s labeled a “research preview” and the rough edges are real — but as a proof of concept for what edge multimodal AI looks like in 2026, it’s hard to beat.
