Foundation Models & LLM Research
-
Google Releases Gemini Embedding 2 — One Vector Space for Text, Images, Video, and Audio
Building a search system that handles text is straightforward enough. Adding images makes it harder. Throw in video and audio, and suddenly you’re maintaining three or four separate embedding pipelines, each with its own model, its own vector index, and its own set of headaches. Google’s answer to this problem is Gemini Embedding 2, its…
-
Hume AI open-sources TADA — an LLM-based TTS with zero hallucinations and 0.09 RTF
LLM-based text-to-speech systems have a dirty secret: they hallucinate. Words get skipped, phrases get invented, and entire sentences sometimes come out garbled. The root cause is a fundamental mismatch: text and audio operate on completely different timescales, and when you force a language model to bridge that gap with hundreds of audio tokens per…
-
Fish Audio Just Open-Sourced S2 — and It Beats GPT-4o-mini-tts With an 81.88% Win Rate
Text-to-speech has been good enough to read your emails aloud for years. But getting AI voices to actually sound like they mean what they’re saying? That’s been the frustrating part. You want a whisper here, a confident tone there, maybe a laugh mid-sentence, and most TTS tools either ignore you or make you jump…
-
Hermes Agent by Nous Research Might Be the Open-Source AI Agent That Finally Remembers Everything
Every AI agent framework makes the same promise: it’ll handle your tasks autonomously. But ask any developer who’s actually tried building with agents, and they’ll tell you about the same frustrating loop: the agent completes a task, you close the session, and next time it starts from scratch with zero memory of what it…
-
AMI Labs: Yann LeCun Just Raised $1.03 Billion to Prove LLMs Are a Dead End
Yann LeCun has spent the last two years telling anyone who would listen that large language models are fundamentally limited. Now he has $1.03 billion to prove it. The Turing Award winner’s new startup, AMI Labs (Advanced Machine Intelligence), just closed one of the largest seed rounds in AI history, valued at $3.5 billion pre-money. …
-
Phi-4-reasoning-vision-15B: Microsoft’s 15B Model Just Embarrassed GPT-4o on Vision Tasks
If you’ve been paying attention to AI Twitter or the [Hacker News](https://news.ycombinator.com/) front page this past week, you’ve probably seen people losing their minds over Microsoft’s latest release. [Phi-4-reasoning-vision-15B](https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B) dropped on March 4th, and it’s one of those models that makes you rethink everything you assumed about model size and capability. Here’s the deal: this…
-
SWE-CI Exposes What AI Coding Agents Still Can’t Do
There’s been a lot of chest-thumping lately about AI coding agents solving real-world GitHub issues. SWE-bench scores keep climbing, and every new model launch comes with claims about “state-of-the-art” issue resolution rates. But here’s the thing: fixing a single bug in isolation is very different from maintaining a codebase over months. [SWE-CI](https://arxiv.org/abs/2603.03823) is a…
-
Your Anonymous Posts Aren’t Anonymous Anymore: Inside the Large-Scale LLM Deanonymization Study
So here’s something that should make you uncomfortable: a group of researchers just proved that LLMs can figure out who you are from your “anonymous” online posts, and they can do it at scale for about four bucks per person. The paper, [“Large-scale online deanonymization with LLMs”](https://arxiv.org/abs/2602.16800), comes from [MATS Research](https://www.matsprogram.org/research/large-scale-online-deanonymization-with-llms), authored by Simon…
