The voice AI race right now looks like a bar fight in a crowded room. ElevenLabs has the brand. OpenAI has the distribution. Cartesia has the latency story. Microsoft just shipped MAI-Voice-1. Mistral open-sourced Voxtral TTS. And somehow, a 16-person startup from Pune, India, with $8 million in total funding, just posted the highest conversational voice quality score among all of them.
Smallest.ai launched Lightning V3 on March 27, and it landed at number 2 on Product Hunt on April 2 with 334 upvotes. Tribune India, The Wire, Editorji, and BusinessReviewLive all ran coverage. The attention isn’t coming from hype — it’s coming from a very specific set of numbers that are hard to ignore if you’re building voice agents in production.
Why 3.89 MOS Matters More Than You Think
MOS — Mean Opinion Score — is the standard way to measure how natural a synthetic voice sounds to human listeners. It’s scored on a 1-to-5 scale, and in conversational contexts, anything above 3.5 is considered good. Most production TTS models cluster between 3.2 and 3.7 in real conversational settings, not the cherry-picked studio-quality demos you see on landing pages.
Lightning V3 scored 3.89 MOS in conversational evaluations. That’s not a single-utterance benchmark where the model gets a clean sentence and plenty of context. This is multi-turn, chunk-based, streaming generation — the kind of output a voice agent actually produces when talking to a real person. OpenAI, Cartesia, and ElevenLabs all scored lower on the same evaluation.
The breakdown gets more interesting when you look at the sub-scores. Intonation: 3.33 out of 5. Prosody: 3.07 out of 5. These measure whether the voice rises and falls naturally, whether it pauses in the right places, whether it sounds like it’s actually thinking before it speaks rather than reading off a teleprompter. In a head-to-head listener preference test, Lightning V3 was preferred over OpenAI’s GPT-4o-mini-TTS 76.2% of the time.
Smallest.ai also publishes a broader benchmark where their overall platform scores 4.14 MOS versus ElevenLabs’ 3.83. That gap — 0.31 points — sounds small until you realize how compressed the top of the MOS scale is. Going from 3.8 to 4.1 in perceived naturalness is the difference between “clearly AI” and “wait, is that a person?”
100ms Latency at 20 Concurrent Requests
Here’s where Lightning V3 separates from the pack in a way that benchmarks alone can’t capture. The model is built specifically for streaming, chunked speech generation — the exact pattern that production voice agents use.
When a voice agent responds to a user, it doesn’t wait for the full response to be generated before speaking. It streams audio chunks as they’re produced. The challenge is maintaining voice consistency across chunks, keeping prosody natural when the model doesn’t know what the next sentence will be, and doing all of this fast enough that the user doesn’t notice any gap.
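The pattern is easier to see in code. Below is a minimal, self-contained sketch of chunked streaming playback, with a fake generator standing in for the model — the function names, chunk sizes, and timings are illustrative stand-ins, not any vendor's actual API:

```python
import time
from typing import Iterator

def fake_tts_stream(text: str, chunk_ms: int = 40) -> Iterator[bytes]:
    """Simulate a streaming TTS backend: yield small PCM chunks as they
    are 'generated' rather than returning one final buffer at the end."""
    n_chunks = max(1, len(text) // 10)
    for _ in range(n_chunks):
        time.sleep(0.005)  # stand-in for per-chunk model inference
        # 16 kHz, 16-bit mono silence as placeholder audio
        yield b"\x00" * (16_000 * 2 * chunk_ms // 1000)

def speak(text: str) -> float:
    """Consume the stream, 'playing' each chunk as it arrives.
    Returns time-to-first-audio (TTFA) in milliseconds."""
    start = time.perf_counter()
    ttfa_ms = 0.0
    for i, chunk in enumerate(fake_tts_stream(text)):
        if i == 0:
            ttfa_ms = (time.perf_counter() - start) * 1000
        # in a real agent, this chunk would go straight to the audio device
        _ = len(chunk)
    return ttfa_ms

if __name__ == "__main__":
    print(f"time to first audio: {speak('Hello, how can I help you today?'):.1f} ms")
```

The point of the structure: what the user perceives is time-to-first-audio, not total generation time, so the consumer must start playback on the first chunk rather than waiting for the stream to finish.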
Lightning V3 delivers first-audio latency under 100 milliseconds at 20 concurrent requests. For context, Cartesia’s Sonic 3 claims 40-90ms latency, but that number gets fuzzy under load. OpenAI doesn’t even publish official latency specifications for its TTS API — a telling omission for anyone building latency-sensitive applications. Microsoft’s MAI-Voice-1 generates 60 seconds of audio in under one second, but that’s a different architecture — batch generation, not streaming.
The 100ms number matters because it’s the threshold for human conversational perception. Below 100ms, a voice agent’s response feels instantaneous — like talking to a person who just happens to think really fast. Above 200ms, users start noticing the gap. Above 500ms, the conversation feels broken. Every voice AI company knows this, but very few publish latency numbers under concurrent load. Smallest.ai does, and that’s either confidence or recklessness. Given the benchmark results, it looks like confidence.
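Those thresholds only mean something if you measure them the way the claim is made: under concurrent load. Here is a minimal harness for that, with a stubbed request standing in for a real streaming client — the 20ms sleep is an arbitrary placeholder, not a measured figure:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def first_audio_latency_ms() -> float:
    """Stand-in for one streaming TTS request. A real client would start
    the timer at the request and stop it at the first received chunk."""
    start = time.perf_counter()
    time.sleep(0.02)  # placeholder: pretend the first chunk lands after ~20 ms
    return (time.perf_counter() - start) * 1000

def p95_under_load(concurrency: int = 20, requests: int = 100) -> float:
    """Fire `requests` calls with `concurrency` in flight and return the
    95th-percentile time-to-first-audio across all of them."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: first_audio_latency_ms(),
                                  range(requests)))
    return statistics.quantiles(latencies, n=100)[94]  # 95th percentile

if __name__ == "__main__":
    print(f"p95 TTFA at 20 concurrent: {p95_under_load():.1f} ms")
```

Reporting a percentile under concurrency, rather than a single best-case number, is exactly the distinction between a landing-page latency claim and a production one.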
15 Languages and Mid-Sentence Switching
Lightning V3 supports 15 languages out of the box: English, Spanish, French, Italian, Dutch, Swedish, Portuguese, German, Hindi, Tamil, Kannada, Telugu, Malayalam, Marathi, and Gujarati. That’s not an ElevenLabs-level language count — ElevenLabs V3 supports over 70 — but look at the selection more carefully.
Six of those fifteen languages are Indian regional languages. Tamil, Kannada, Telugu, Malayalam, Marathi, Gujarati — these are the languages spoken by hundreds of millions of people who are currently underserved by every major TTS provider. ElevenLabs’ 70-language coverage skews heavily toward European languages. OpenAI’s voice capabilities are strongest in English. Smallest.ai built its language support around where voice agents are actually being deployed at scale right now: India’s booming call center automation and customer service market.
The mid-sentence language switching is the feature that turns heads in demos. A single voice can start a sentence in English and finish it in Hindi — or any combination of the supported languages — without prompt engineering or API gymnastics. The model detects language switches on the fly. For anyone who’s tried to build a multilingual voice agent from separate per-language TTS models, this is the difference between a weekend project and a six-month integration nightmare.
This matters commercially because India’s voice agent deployment is scaling faster than anywhere else. Customer service automation, healthcare triage, banking support, government helplines — these are high-volume, multilingual use cases where a caller might switch between English, Hindi, and Tamil within a single interaction. Building that with ElevenLabs means chaining multiple API calls, handling language detection separately, and praying the voice stays consistent. Lightning V3 handles it natively in a single stream.
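The contrast can be made concrete with a hedged sketch. Everything below — `synthesize`, `model_for`, the model names, and the segment format — is invented for illustration; this is neither the Smallest.ai nor the ElevenLabs API, just the shape of chained-per-language versus single-stream integration:

```python
# Hypothetical names for illustration only -- not a real SDK.

def synthesize(model: str, text: str) -> bytes:
    """Stub TTS call; a real client would return audio bytes."""
    return f"[{model}] {text}".encode()

def model_for(lang: str) -> str:
    """Pick a per-language model, as a chained integration must."""
    return {"en": "tts-en", "hi": "tts-hi", "ta": "tts-ta"}[lang]

def chained_tts(segments: list) -> list:
    """Chained approach: detect language per segment upstream, issue one
    request per segment, and hope the voice matches at the seams."""
    return [synthesize(model_for(lang), text) for lang, text in segments]

def single_stream_tts(text: str) -> bytes:
    """Single-stream approach: hand the whole mixed-language utterance
    to one model and let it detect switches internally."""
    return synthesize("tts-multilingual", text)

# One caller turn that crosses two languages:
mixed = [("en", "Your balance is"), ("hi", "पाँच सौ रुपये"), ("en", "as of today.")]

if __name__ == "__main__":
    print(len(chained_tts(mixed)), "requests chained vs. 1 single-stream")
```

The chained version multiplies requests, adds a language-detection dependency, and breaks prosody at every seam; the single-stream version pushes all of that inside the model.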
Voice cloning works from 5 to 15 seconds of reference audio. Feed the model a short clip of someone speaking, and it generates new speech in that voice across all 15 supported languages. The cloned voice maintains consistency across conversation turns and language switches. This isn’t unique — voice cloning has become table stakes — but doing it reliably in streaming mode across 15 languages with sub-100ms latency is a different engineering challenge entirely.
The Voice Agent Infrastructure War
The real story behind Lightning V3 isn’t the model. It’s the market timing.
Voice agents are the fastest-growing category in AI infrastructure right now. The global voice AI market crossed $22 billion in 2026 and is projected to hit $47.5 billion by 2034. Every major tech company is stacking capabilities: Microsoft shipped MAI-Voice-1 and VibeVoice. Mistral open-sourced Voxtral TTS with a 68.4% win rate over ElevenLabs Flash at 73% lower cost. ElevenLabs dropped its conversational AI pricing to $0.10 per minute. Cartesia is pitching itself as the low-latency infrastructure layer.
Into this bloodbath walks Smallest.ai — founded in 2023 by Akshat Mandloi and Sudarshan Kamath, seed round led by Sierra Ventures with participation from 3one4 Capital and Better Capital, 16 employees, operating out of Pune. They recently partnered with AI Grants India to bring voice AI tools to grassroots builders across the country. On paper, they shouldn’t be competitive with companies that have 10x to 100x their resources.
But here’s the thing about TTS models: they don’t follow the same scaling laws as large language models. You don’t need 10,000 H100s and a billion-dollar training budget to build a world-class voice model. The quality is determined more by architecture choices, training data curation, and how well the model handles the specific constraints of streaming generation than by raw compute. This is one of the few AI categories where a small, focused team can genuinely compete with Big Tech on output quality.
Smallest.ai’s positioning is deliberate. They’re not trying to be the everything-voice-platform that ElevenLabs is becoming. They’re not chasing 70 languages or audio tags or dialogue mode. They’re building the fastest, most natural-sounding TTS layer for production voice agents — specifically optimized for the streaming, chunked, multi-turn pattern that every voice agent framework uses. It’s a narrow bet, but it’s the bet that matters most as the industry shifts from “voice as a feature” to “voice as the primary interface.”
The pay-as-you-go pricing with no seat licenses, no minimum usage, and no upfront commitments is the other quiet advantage. When you’re a startup building a voice agent and you’re choosing between ElevenLabs’ tiered plans and Smallest.ai’s pure usage-based model, the calculus is simple. You pay for what you use. You scale when you’re ready. No procurement calls.
The Product Hunt performance — 334 upvotes, number 2 on April 2 — suggests the developer community is picking up on this. The questions in the Product Hunt comments were notably technical: people asking about regional accent support, emotion control, performance under production load. These aren’t curiosity clicks. These are developers evaluating a production dependency.
Whether Smallest.ai can sustain this against competitors with orders of magnitude more funding is the open question. ElevenLabs reportedly raised over $200 million and has 70+ language support. OpenAI can bundle TTS into every ChatGPT interaction. Microsoft has Azure’s distribution machine. Smallest.ai has none of that.
But right now, in April 2026, they have the highest MOS score in conversational evaluation, sub-100ms streaming latency, the best Indian language support in the market, and a team small enough to ship fast without committee meetings. The voice agent stack is still being assembled — and the TTS layer is the one component where quality perception directly determines whether users trust the agent or hang up. Smallest.ai is betting that winning on voice quality and production reliability matters more than winning on feature count. Given how the voice agent space is evolving, that bet looks increasingly smart.