-
Mistral Voxtral TTS scores 63% listener preference over ElevenLabs — and the weights are free
One day after ElevenLabs locked in a partnership with IBM to power enterprise voice agents through watsonx Orchestrate, Mistral dropped the opposite play: a frontier-quality text-to-speech model with full open weights under Apache 2.0. No API lock-in, no per-character fees if you self-host, and a footprint small enough to run on a phone. Voxtral TTS… Continue reading
-
Sora Is Dead. LTX 2.3 (Lightricks) Ships 22B Open-Source Video + Audio in a Single Forward Pass.
The timing is almost poetic. On March 24, OpenAI announced it’s killing Sora — the app, the API, and the billion-dollar Disney partnership that was supposed to define AI video. One day later, Lightricks drops LTX 2.3: a 22-billion-parameter open-source model that generates synchronized video and audio in a single forward pass, at up to… Continue reading
-
Isara Raises $94M to Coordinate Thousands of AI Agents — OpenAI Is an Investor
Two 23-year-old founders, a $650 million valuation, and a bet that the future of AI isn’t about single agents — it’s about orchestrating armies of them. That’s the pitch behind Isara, the San Francisco startup that just landed one of the most talked-about funding rounds in the agent infrastructure space. The Wall Street Journal broke… Continue reading
-
Harvey AI Raises $200M at $11 Billion Valuation — Sequoia Triples Down on Legal AI
A legal AI startup just crossed the $10 billion mark. Not OpenAI. Not Anthropic. Not a foundation model company at all. Harvey, a platform that helps lawyers draft contracts, run due diligence, and review thousands of documents, closed a $200 million round on March 25, 2026, at an $11 billion valuation. The round was co-led… Continue reading
-
Ensu Got 328 Points on Hacker News — The Privacy Crowd Wants AI That Never Phones Home
Every major AI assistant sends your conversations to a server. ChatGPT, Gemini, Claude, Copilot — they all require an internet connection and a user account, and your prompts travel through infrastructure you don’t control. For most people, that tradeoff is fine. For a growing subset of users, it’s a dealbreaker. Ensu is a new local… Continue reading
-
ARC-AGI-3 Turns AI Testing Into a Video Game — And Every Frontier Model Is Losing
For seven years, the ARC benchmark has been the one test that AI couldn’t brute-force its way through. While GPT-series models saturated MMLU and climbed SWE-bench leaderboards, ARC remained stubbornly unsolved — a set of abstract puzzles designed to measure genuine reasoning rather than pattern recall. Now, the ARC Prize Foundation has thrown out the… Continue reading
-
AI2’s MolmoWeb Outscores GPT-4o on Web Tasks — With Just 8 Billion Parameters
The web agent race has a new open-source contender, and the benchmarks are hard to ignore. On March 24, the Allen Institute for AI (AI2) released MolmoWeb, a fully open-source visual web agent that navigates browsers by looking at screenshots — the same way a human would. The kicker: its 8B-parameter model outperforms agents built… Continue reading
-
A 20-Year-Old Dropout Built Supermemory — Now It Has 18K GitHub Stars and Google’s Jeff Dean as an Investor
Every AI agent today has the same problem: amnesia. End the conversation, and the context vanishes. Start a new session, and you’re re-explaining everything from scratch. Supermemory is a bet that persistent, time-aware memory will become as essential to AI infrastructure as databases are to web apps — and the bet is attracting serious attention.… Continue reading
-
Hypura Runs a 31GB Model on a 32GB Mac at 2.2 tok/s — llama.cpp Just OOMs
There’s a frustrating ceiling that every Apple Silicon user running local LLMs hits eventually: your model is slightly too big for your RAM, and everything falls apart. llama.cpp crashes. MLX refuses to load it. The OS starts swapping so aggressively that your entire machine grinds to a halt. You either buy a more expensive Mac… Continue reading
-
Google TurboQuant Squeezes LLM Cache to 3 Bits — 6x Less Memory, 8x Faster, Zero Accuracy Loss
Every large language model running today has the same dirty secret: the longer the conversation goes, the more memory the Key-Value cache eats. For models like Gemini handling 100k+ token contexts, the KV cache can balloon to consume more memory than the model weights themselves. Google Research just published a direct answer to this problem.… Continue reading
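The ballooning the teaser describes is easy to quantify with back-of-envelope arithmetic. A minimal sketch, using hypothetical model dimensions (80 layers, grouped-query attention with 8 KV heads, head dimension 128 — not the actual configuration of Gemini or the TurboQuant paper): the cache stores a key and a value vector per token per layer, so its size scales linearly with context length and with bits per stored value.

```python
# Back-of-envelope KV cache size.
# Per token: 2 tensors (K and V) x num_layers x num_kv_heads x head_dim values.
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bits_per_value):
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len
    return values * bits_per_value / 8  # bits -> bytes

# Hypothetical mid-size model: 80 layers, 8 KV heads (GQA), head_dim 128,
# at a 100k-token context.
fp16 = kv_cache_bytes(100_000, 80, 8, 128, 16)
q3 = kv_cache_bytes(100_000, 80, 8, 128, 3)

print(f"FP16 KV cache:  {fp16 / 2**30:.1f} GiB")   # ~30.5 GiB
print(f"3-bit KV cache: {q3 / 2**30:.1f} GiB")     # ~5.7 GiB
print(f"reduction:      {fp16 / q3:.1f}x")         # 5.3x from bit-width alone
```

Bit-width alone gives 16/3 ≈ 5.3x; a quoted 6x figure would also depend on how scales and other quantization metadata are stored, which this sketch ignores.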
