Foundation Models & LLM Research
-
Qwen 3.6-Plus vs Claude Opus 4.6: 3x the speed, 1/17th the price, and the benchmarks are uncomfortably close
Alibaba dropped Qwen 3.6-Plus on April 2nd, and the numbers are hard to ignore. On SWE-bench Verified — the benchmark that actually matters for coding — it scores 78.8%. Claude Opus 4.6 scores 80.9%. That’s a 2.1-point gap. On Terminal-Bench 2.0, Qwen 3.6-Plus flips the script entirely: 61.6% vs Claude’s 59.3%. And the pricing? Input… Continue reading
-
Embarrassingly Simple Self-Distillation (SSD) Boosts Qwen3-30B Code Scores by 30% — No Teachers, No RL, No Tricks
Apple researchers just published a paper that made Hacker News lose its mind. 596 points, 180 comments, top AI post of the day. The title alone tells you why: “Embarrassingly Simple Self-Distillation Improves Code Generation.” The pitch is almost too good to believe. Take a model. Have it generate its own code solutions. Filter out… Continue reading
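The paper's exact recipe is cut off in this excerpt, but the generate → filter → fine-tune loop it describes can be sketched in a few lines. This is a minimal illustration, not Apple's implementation: `generate_solutions` and `fine_tune` are stand-in callables for real model sampling and training, and only the test-based filter step is concrete here.

```python
def passes_tests(code: str, tests) -> bool:
    """Keep a candidate only if it execs cleanly and satisfies its unit tests."""
    env: dict = {}
    try:
        exec(code, env)
        return all(t(env) for t in tests)
    except Exception:
        return False

def self_distill(problems, generate_solutions, fine_tune, k=8):
    """One self-distillation round: sample k candidates per problem,
    keep the first one that passes the tests, train on the survivors."""
    dataset = []
    for prob in problems:
        for cand in generate_solutions(prob, n=k):
            if passes_tests(cand, prob["tests"]):
                dataset.append((prob["prompt"], cand))
                break  # one verified solution per problem
    fine_tune(dataset)
    return dataset
```

The filter is the whole trick: no teacher model, no reward model — just the model's own outputs gated by executable tests.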
-
Generalist GEN-1 Scores 99% Success Rate on Robot Tasks — With Just 1 Hour of Robot Data
Half a million hours of humans grabbing, folding, and stacking things. That’s what Generalist AI fed into its new foundation model before it ever touched a robot. The result is GEN-1, and the numbers are hard to argue with: 99% success rate on tasks where previous models managed 64%, roughly 3x faster execution than the… Continue reading
-
Netflix VOID Beats Runway 3.5-to-1 in Blind Tests — Netflix’s First Open-Source AI Model

Remove a person from a video. Easy enough — plenty of tools can do that in 2026. Now remove a person who’s holding a guitar, and have the guitar fall to the ground because nobody’s holding it anymore. That’s what Netflix VOID does. And it’s the reason the AI community spent the past 48 hours… Continue reading
-
Gemma 4 Scores 89% on AIME With Just 4B Active Parameters — Google’s Open Model Bet Gets Real
Google has been playing defense in the open model race for months. Llama 4 grabbed headlines. Qwen 3.5 dominated coding benchmarks. Gemma 3, despite solid performance, kept losing enterprise deals over one thing that had nothing to do with intelligence: its license. That changed on April 2. Gemma 4 dropped with four model sizes, vision… Continue reading
-
Microsoft MAI Models (MAI-Transcribe-1, MAI-Voice-1, MAI-Image-2) Are Live — Redmond’s AI Independence Starts Now
Five months. That’s how long it took from the formation of Microsoft’s MAI Superintelligence team to shipping three foundation models that directly compete with OpenAI, Google, and every major AI provider in the market. On April 2nd, Microsoft AI — the division led by Mustafa Suleyman — released MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 into public preview… Continue reading
-
StepFun Step 3.5 Flash Activates Only 11B of 196B Parameters — and Still Matches GPT-5.2
A Chinese AI startup just dropped a 196-billion-parameter model under Apache 2.0, and the kicker: it only uses 11 billion of those parameters at any given moment. StepFun’s Step 3.5 Flash hit the top of Hacker News this week with a simple claim — it’s the most cost-effective model for OpenClaw tasks, beating… Continue reading
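Activating 11B of 196B parameters is classic sparse Mixture-of-Experts behavior: a router scores every expert, but only the top-k actually run for a given token. The excerpt doesn't give StepFun's layer layout, so this is a generic top-k routing sketch with made-up expert counts, not Step 3.5 Flash's published architecture.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits, k=2):
    """Pick the top-k experts by router score, renormalize their gates."""
    topk = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in topk])
    return list(zip(topk, gates))

def moe_layer(x, experts, router_logits, k=2):
    """Gate-weighted sum over only the selected experts — the rest never run."""
    return sum(g * experts[i](x) for i, g in route(router_logits, k))
```

With, say, 64 experts per layer and k=2, each token pays the compute cost of 2 experts while the model keeps the capacity of all 64 — which is how total and active parameter counts diverge so sharply.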
-
Liquid AI LFM2.5-350M: How a 350-Million-Parameter Model Trained on 28 Trillion Tokens Outruns Models Twice Its Size
There’s a number that should make every AI engineer stop and think: 80,000 to 1. That’s the token-to-parameter ratio of Liquid AI’s new LFM2.5-350M — a model with just 350 million parameters that was trained on 28 trillion tokens. For context, most models see maybe 20 to 100 tokens per parameter during training. Liquid AI… Continue reading
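The 80,000-to-1 figure checks out with a quick back-of-the-envelope calculation, using only the numbers quoted above:

```python
# Token-to-parameter ratio of LFM2.5-350M, per the figures in the article.
params = 350_000_000          # 350M parameters
tokens = 28_000_000_000_000   # 28T training tokens
ratio = tokens / params
assert ratio == 80_000
# For comparison, the article's "20 to 100 tokens per parameter" range
# brackets the Chinchilla-style ~20:1 heuristic.
```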
-
PrismML Exits Stealth With $16M and a 1-Bit Model That Rivals Llama 3 at 1/16th the Memory
An 8-billion-parameter model that fits in 1 GB of memory. Not a quantized approximation of a bigger model. Not a research paper that’ll never ship. A production-ready LLM, trained from scratch with 1-bit weights, running at 368 tokens per second on an RTX 4090 and 44 tokens per second on an iPhone. PrismML came out… Continue reading
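The 1 GB and 1/16th claims are internally consistent, assuming the fp16 baseline implied by the headline. A quick check with the article's own numbers:

```python
# Memory footprint of 8B weights at 1 bit each vs. the same model in fp16.
params = 8_000_000_000
one_bit_bytes = params / 8     # 1 bit per weight
fp16_bytes = params * 2        # 16 bits per weight
gb = 1_000_000_000             # decimal gigabyte

assert one_bit_bytes / gb == 1.0        # fits in 1 GB, as claimed
assert fp16_bytes / one_bit_bytes == 16 # 1/16th the memory of fp16
```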
-
Google TimesFM Turns BigQuery into a Forecasting Engine — 200M Parameters, Zero Training, 16K Context
Time series forecasting is one of those problems that sounds simple and is absolutely not. You want to predict next week’s sales, next month’s server load, next quarter’s energy demand. Traditional approach: hire a data scientist, collect historical data, pick a model (ARIMA, Prophet, maybe a custom LSTM), train it, tune it, deploy it, and… Continue reading
