Foundation Models & LLM Research
-
Ollama MLX on Apple Silicon: 1,810 Tokens/Sec Prefill and the End of llama.cpp on Mac
Ollama just ripped out its entire inference backend on Mac and replaced it with Apple’s MLX framework. That sentence alone would have been unthinkable a year ago, when Ollama was synonymous with llama.cpp and GGUF quantization. But the numbers from their March 30 blog post make the decision look obvious in hindsight: prefill jumped from… Continue reading
-
Gemini 3 Deep Think Scores 84.6% on ARC-AGI-2 — Google’s Reasoning Bet Is Paying Off
Google’s been quietly building something serious. When they launched Gemini 3 with Deep Think mode back in November 2025, the initial numbers were respectable — 45.1% on ARC-AGI-2, the best any AI model had scored at the time. Solid, not earth-shattering. Then on February 12, 2026, Google dropped a major upgrade. The new Deep Think… Continue reading
-
A-Evolve (Amazon Agentic AI Framework) Tops MCP-Atlas at 79.4% — with Zero Human Tuning
Every AI agent framework in 2026 asks you to do the same thing: hand-tune your prompts, manually wire up tools, tweak your system instructions, run the benchmark, stare at the score, and do it all over again. It’s tedious. It doesn’t scale. And if you’ve spent any time building agents with LangChain or CrewAI, you… Continue reading
-
Google Search Live Now Covers 200+ Countries — and It Changes What “Searching” Means
Google just made its boldest move in the AI search war. On March 26, the company flipped the switch on Search Live globally, expanding from just the U.S. and India to over 200 countries and territories. The feature, powered by a brand-new Gemini 3.1 Flash Live model, lets you talk to Google Search out loud… Continue reading
-
Microsoft VibeVoice Scores 24K GitHub Stars Doing What ElevenLabs Charges $99/Month For
Microsoft did something unusual in December 2025. They open-sourced a 1.5-billion-parameter text-to-speech model that generates up to 90 minutes of multi-speaker conversational audio in a single pass. No API key. No subscription. No per-character billing. Just download the weights and run it. Four months later, VibeVoice sits at 24.7K GitHub stars and 2.7K forks. A… Continue reading
-
Stanford AI Sycophancy Study: All 11 Chatbots Tell You What You Want to Hear
College students are asking ChatGPT to draft their breakup texts. Nearly a third of U.S. teens say they use AI for “serious conversations” instead of talking to actual people. And according to a major new study published in Science, the chatbots they’re turning to have a very specific problem: they almost never tell you you’re… Continue reading
-
Google’s Gemini 3.1 Flash Live Scores 90.8% on Audio Benchmarks — Real-Time Voice AI Gets Serious
Google dropped Gemini 3.1 Flash Live on March 26, and within 24 hours it had 329 upvotes on Product Hunt and coverage from nearly every major tech outlet. The model is pitched as Google’s “highest-quality audio model” for real-time conversation, and the benchmark numbers back up the claim. But the more interesting story is what… Continue reading
-
Chroma Context-1 Outscores GPT-5.2 on BrowseComp-Plus With a 20B Parameter Model
A vector database company just trained a search agent that beats models 10x its size — and OpenAI co-founder John Schulman is publicly praising the work. Chroma, the company behind the popular open-source embedding database ChromaDB, released Context-1 on March 27, 2026. It’s a 20-billion parameter model specifically trained to be an agentic search agent… Continue reading
-
HyperAgents: How Meta Built AI Agents That Improve the Way They Improve
Self-improving AI agents are not new. Systems like Sakana AI’s Darwin Gödel Machine (DGM) already showed that agents could rewrite their own code to get better at coding tasks. But they all hit the same wall: the meta-agent, the part that decides how to improve, was fixed. Hand-crafted by researchers. Frozen in place.… Continue reading
-
Anthropic’s Next AI Model Was Never Supposed to Go Public — Here’s What the Claude Mythos Leak Tells Us
A misconfigured CMS, nearly 3,000 exposed assets, and a model Anthropic calls “a step change” in capability. The Claude Mythos leak is the biggest unforced error in the AI race this year, and possibly the most revealing.
How a CMS Misconfiguration Blew the Lid Off Anthropic’s Roadmap
On March 26, 2026, Fortune broke an… Continue reading
