Foundation Models & LLM Research
-
Cohere Transcribe Tops the Open ASR Leaderboard With a 5.42% Word Error Rate — and It’s Fully Open Source
Speech recognition has been one of those AI fields where open-source models consistently trailed behind proprietary offerings. OpenAI’s Whisper changed the game in 2022, but even Whisper Large v3 couldn’t match the accuracy of closed-source APIs from the likes of Google and Deepgram. That gap just narrowed significantly. Cohere dropped Transcribe on March 26, 2026…
-
Mistral Voxtral TTS scores 63% listener preference over ElevenLabs — and the weights are free
One day after ElevenLabs locked in a partnership with IBM to power enterprise voice agents through watsonx Orchestrate, Mistral dropped the opposite play: a frontier-quality text-to-speech model with full open weights under Apache 2.0. No API lock-in, no per-character fees if you self-host, and a footprint small enough to run on a phone. Voxtral TTS…
-
Sora Is Dead. LTX 2.3 (Lightricks) Ships 22B Open-Source Video + Audio in a Single Forward Pass.
The timing is almost poetic. On March 24, OpenAI announced it’s killing Sora — the app, the API, and the billion-dollar Disney partnership that was supposed to define AI video. One day later, Lightricks drops LTX 2.3: a 22-billion-parameter open-source model that generates synchronized video and audio in a single forward pass, at up to…
-
ARC-AGI-3 Turns AI Testing Into a Video Game — And Every Frontier Model Is Losing
For seven years, the ARC benchmark has been the one test that AI couldn’t brute-force its way through. While GPT-series models saturated MMLU and climbed SWE-bench leaderboards, ARC remained stubbornly unsolved — a set of abstract puzzles designed to measure genuine reasoning rather than pattern recall. Now, the ARC Prize Foundation has thrown out the…
-
AI2’s MolmoWeb Outscores GPT-4o on Web Tasks — With Just 8 Billion Parameters
The web agent race has a new open-source contender, and the benchmarks are hard to ignore. On March 24, the Allen Institute for AI (AI2) released MolmoWeb, a fully open-source visual web agent that navigates browsers by looking at screenshots — the same way a human would. The kicker: its 8B-parameter model outperforms agents built…
-
Hypura Runs a 31GB Model on a 32GB Mac at 2.2 tok/s — llama.cpp Just OOMs
There’s a frustrating ceiling that every Apple Silicon user running local LLMs hits eventually: your model is slightly too big for your RAM, and everything falls apart. llama.cpp crashes. MLX refuses to load it. The OS starts swapping so aggressively that your entire machine grinds to a halt. You either buy a more expensive Mac…
-
Google TurboQuant Squeezes LLM Cache to 3 Bits — 6x Less Memory, 8x Faster, Zero Accuracy Loss
Every large language model running today has the same dirty secret: the longer the conversation goes, the more memory the Key-Value cache eats. For models like Gemini handling 100k+ token contexts, the KV cache can balloon to consume more memory than the model weights themselves. Google Research just published a direct answer to this problem.…
-
SentrySearch Turns Hours of Video Into a Searchable Index for $2.50 — Using Google’s New Multimodal Embeddings
Searching through video footage has always been painful. Whether it’s dashcam recordings, security cameras, or raw production clips, the standard approach involves either scrubbing through hours of footage manually or relying on transcription-based pipelines that miss everything visual. SentrySearch, an open-source CLI tool that appeared on Hacker News this week, takes a fundamentally different approach:…
-
Hermes 4 (Nous Research) Scores 96.3% on MATH-500 — and Refuses Almost Nothing
Most open-source models pick a lane: either they chase benchmark scores, or they minimize content restrictions. Nous Research is betting it can do both at the same time. Hermes 4, their latest open-weight model family spanning 14B to 405B parameters, posts competitive math and reasoning scores while achieving the lowest refusal rate of any high-performance…
-
A Single API String Exposed Cursor’s Secret: Composer 2 Runs on Moonshot AI’s Kimi K2.5
On March 19, a $29.3 billion coding startup launched what it called a breakthrough proprietary model. Within 24 hours, a developer found this string in the API response: kimi-k2p5-rl-0317-s515-fast. That one line of text unraveled the entire narrative. Cursor’s Composer 2 — the model that supposedly beat Claude Opus 4.6 on coding benchmarks at one-tenth…
