Foundation Models & LLM Research
-
From 35% to 80% on WebVoyager: Holotron-12B Sets a New Bar for Open-Source Computer-Use Agents
The race to build AI agents that can actually operate a computer — clicking buttons, filling forms, navigating websites — has been dominated by closed-source giants. Anthropic’s Claude Computer Use and OpenAI’s Operator have set the pace, but they come with API costs, usage limits, and zero visibility into the model weights. That changed on… Continue reading
-
397 Billion Parameters on a 48GB MacBook: Flash-MoE Turns Apple’s 2023 Research into Reality
A 397-billion-parameter model running at 4.4 tokens per second on a laptop with 48GB of RAM. No cloud API. No multi-GPU server. Just a MacBook Pro, an NVMe SSD, and about 7,000 lines of C and Metal code that nobody wrote by hand. Flash-MoE landed on Hacker News and GitHub Trending this week, and the… Continue reading
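The headline numbers only make sense because of MoE sparsity: a dense 397B model could not fit, but a mixture-of-experts only needs its active experts in RAM while the rest live on the SSD. A back-of-envelope sketch below shows the arithmetic; the 4-bit quantization and the ~5% active-parameter fraction are illustrative assumptions, not Flash-MoE's published configuration.

```python
# Back-of-envelope memory math for why a 397B-parameter MoE can run on a
# 48GB machine at all. The active-expert fraction and 4-bit quantization
# below are assumptions for illustration, not Flash-MoE's actual config.

TOTAL_PARAMS = 397e9          # total parameters (from the article)
ACTIVE_FRACTION = 0.05        # assumed: ~5% of params active per token (MoE routing)
BITS_PER_PARAM = 4            # assumed: 4-bit quantized weights

def gib(n_bytes: float) -> float:
    return n_bytes / 2**30

full_model_bytes = TOTAL_PARAMS * BITS_PER_PARAM / 8
active_bytes = full_model_bytes * ACTIVE_FRACTION

print(f"full model on disk: {gib(full_model_bytes):.0f} GiB")  # ~185 GiB -> lives on NVMe
print(f"active per token:   {gib(active_bytes):.1f} GiB")      # ~9 GiB  -> fits in 48GB RAM
```

Under those assumptions the full model streams from NVMe while the per-token working set stays well under 48GB, which suggests SSD read bandwidth, not compute, is the likely ceiling on those 4.4 tokens per second.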
-
Tiiny AI Pocket Lab Raised $1M in 5 Hours — But Can a 300g Device Really Replace the Cloud?
A pocket-sized box weighing less than two iPhones claims to run 120-billion-parameter AI models without an internet connection. Tiiny AI Pocket Lab hit $1 million on Kickstarter in five hours, earned a Guinness World Record, and drew coverage from TechRadar, WCCFtech, and TweakTown. It also attracted a wave of technical scrutiny that raises real questions… Continue reading
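Most of that scrutiny starts with simple memory math. The sketch below tabulates what 120 billion parameters require at standard weight precisions; the bit-widths are common quantization options, not anything Tiiny has confirmed about its hardware.

```python
# Rough memory requirements for a 120B-parameter model at common weight
# precisions: a quick way to see why a 300g pocket device running 120B
# models draws scrutiny. Bit-widths are standard options, not Tiiny's spec.

PARAMS = 120e9

for bits in (16, 8, 4, 2):
    gib = PARAMS * bits / 8 / 2**30
    print(f"{bits}-bit weights: {gib:.0f} GiB")
# 16-bit: ~224 GiB, 8-bit: ~112 GiB, 4-bit: ~56 GiB, 2-bit: ~28 GiB
```

Even at an aggressive 4-bit quantization, the weights alone need roughly 56 GiB of fast memory before accounting for activations or context, which is the crux of the skepticism.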
-
Meta Omnilingual MT Covers 1,600 Languages — 8x More Than Google Translate
Most AI translation tools top out around 200-250 languages. Google Translate, arguably the most widely used translation service on the planet, supports 249. Meta’s earlier NLLB (No Language Left Behind) project covered about 200. Beyond roughly that range, translation quality in traditional systems degrades rapidly, and most break down entirely past 300-400 languages. Meta’s new… Continue reading
-
Mamba-3 Scores 4% Higher Than Transformers at 7x the Speed — and It’s Fully Open Source
For nearly a decade, Transformers have been the unchallenged default architecture for language models. Challengers have come and gone — RNNs, LSTMs, various state space experiments — but none managed to beat the Transformer on both quality and speed at the same time. They’d win on efficiency but lose on accuracy, or match performance but… Continue reading
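The speed half of that claim comes from the basic shape of state space models: generation is a fixed-size recurrence, so each new token costs the same regardless of how long the context has grown, whereas attention pays more per token as its KV cache fills. Here is a toy linear SSM scan that illustrates the idea; it is a minimal sketch of the general recurrence, not Mamba-3's actual architecture, which adds input-dependent (selective) parameters among other things.

```python
import numpy as np

# Minimal linear state-space recurrence, the idea underneath Mamba-style
# models: h_t = A @ h_{t-1} + B @ x_t, then y_t = C @ h_t. Per-token cost
# is constant in sequence length, unlike attention. Toy sketch only.

d_state, d_model = 16, 8
rng = np.random.default_rng(0)
A = rng.normal(size=(d_state, d_state)) * 0.1   # state transition
B = rng.normal(size=(d_state, d_model))         # input projection
C = rng.normal(size=(d_model, d_state))         # output projection

def ssm_scan(xs: np.ndarray) -> np.ndarray:
    """Run the recurrence over a (seq_len, d_model) input, O(seq_len) total."""
    h = np.zeros(d_state)
    ys = []
    for x in xs:                  # one fixed-cost step per token
        h = A @ h + B @ x
        ys.append(C @ h)
    return np.stack(ys)

print(ssm_scan(rng.normal(size=(5, d_model))).shape)  # (5, 8)
```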
-
Kitten TTS: 15 Million Parameters, 25MB on Disk, and Zero GPU Required for Natural Speech
Most text-to-speech models worth using weigh hundreds of megabytes and expect a GPU. Kitten TTS, from the KittenML team, takes a different bet — what if you could get genuinely expressive voice synthesis from a model small enough to fit on a smartwatch? On March 19, KittenML dropped three new models on Hacker News and… Continue reading
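The 25MB figure lines up with plain parameter math, sketched below; the precision breakdown is an assumption about plausible serialization choices, not KittenML's documented format.

```python
# Why 15M parameters lands around 25MB on disk: simple parameter math.
# The precision options are assumptions; the repo's actual serialization
# format may differ.

PARAMS = 15e6
mb = lambda n_bytes: n_bytes / 1e6

print(f"fp32: {mb(PARAMS * 4):.0f} MB")   # 60 MB
print(f"fp16: {mb(PARAMS * 2):.0f} MB")   # 30 MB
print(f"int8: {mb(PARAMS * 1):.0f} MB")   # 15 MB
# 25MB sits between fp16 and int8, consistent with mixed precision
# or fp16 weights plus some auxiliary assets.
```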
-
Cursor Composer 2 Takes On Anthropic and OpenAI With a $0.50/M Token Coding Model — and the Benchmarks Back It Up
For the past two years, AI coding tools have lived and died by the models underneath them. Cursor rode Claude. GitHub Copilot ran on OpenAI. Windsurf mixed and matched. Everyone was a reseller with a nice UI on top. That dynamic shifted on March 19, 2026, when Cursor unveiled Composer 2 — a proprietary, code-only… Continue reading
-
MiniMax M2.7 Scores 56% on SWE-Pro — and It Helped Build Itself
An AI model that writes its own training code, debugs its own failures, and decides whether to keep or revert its own changes. That’s what MiniMax claims M2.7 actually does. Released on March 18, 2026, this is the Shanghai-based company’s follow-up to M2.5, and it introduces something the industry hasn’t seen before at this scale:… Continue reading
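The description maps onto a familiar pattern: propose a change, evaluate it, keep it only if a metric improves, revert otherwise. The sketch below shows that generic loop; the helpers (propose_patch, apply_patch, run_eval) are hypothetical stand-ins, since MiniMax has not published its actual pipeline.

```python
# A generic propose-evaluate-revert loop, the pattern the M2.7 description
# suggests: the model edits its own training code, a held-out evaluation
# scores the result, and the change is kept only if the score improves.
# propose_patch, apply_patch, and run_eval are hypothetical stand-ins.

def self_improve(codebase: str, steps: int, run_eval, propose_patch, apply_patch):
    best_score = run_eval(codebase)
    for _ in range(steps):
        patch = propose_patch(codebase)           # model writes a code change
        candidate = apply_patch(codebase, patch)  # apply it to a copy
        try:
            score = run_eval(candidate)           # train/eval with the change
        except Exception:
            continue  # "debugs its own failures": broken patch is discarded
        if score > best_score:                    # keep only measurable wins
            codebase, best_score = candidate, score
        # else: implicit revert, the candidate is thrown away
    return codebase, best_score
```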
-
Two Consumer GPUs, One Evening, and a 245% Reasoning Boost: How LLM Circuit Finder Works
Most approaches to improving LLM reasoning involve expensive fine-tuning, synthetic data pipelines, or reinforcement learning loops that eat GPU-weeks. LLM Circuit Finder throws all of that out. Instead, it copies three specific transformer layers, pastes them back into the forward pass, and watches logical deduction scores jump from 0.22 to 0.76 on BIG-Bench Hard. No… Continue reading
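Mechanically, the trick is easy to picture: deep-copy a few blocks from a pretrained transformer stack and splice the copies back in so they execute twice per forward pass. The sketch below shows that operation in PyTorch; the layer indices are placeholders, since which layers to duplicate and where to re-insert them is precisely what LLM Circuit Finder searches for.

```python
import copy
import torch.nn as nn

# Sketch of the layer-duplication idea: take an existing transformer stack,
# deep-copy a few layers, and splice the copies back in so they run twice
# per forward pass. The indices below are placeholders, not the layers the
# project actually found.

def duplicate_layers(layers: nn.ModuleList, copy_idx, insert_at) -> nn.ModuleList:
    """Return a new stack with layers[i] for i in copy_idx inserted at insert_at."""
    stack = list(layers)
    clones = [copy.deepcopy(stack[i]) for i in copy_idx]
    for offset, clone in enumerate(clones):
        stack.insert(insert_at + offset, clone)
    return nn.ModuleList(stack)

# e.g. with a HuggingFace GPT-2 model (assumed layout: model.transformer.h):
# model.transformer.h = duplicate_layers(model.transformer.h,
#                                        copy_idx=[4, 5, 6], insert_at=7)
```

No gradient updates are involved, which is presumably why the search reportedly fits in one evening on two consumer GPUs: each candidate configuration only needs a forward-pass evaluation.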
-
OpenAI Parameter Golf: $1M in Compute Credits for Squeezing a Language Model Into 16MB
OpenAI just launched a public competition that feels more like a hacker challenge than a corporate event. The premise is deceptively simple: build the best language model you can, but it has to fit — code, weights, and all — into 16 megabytes. That’s roughly the size of a few smartphone photos. And your total training budget? Ten… Continue reading
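Sixteen megabytes is a brutally tight envelope for weights. The sketch below works out what it buys at common precisions; the megabyte reserved for code and tokenizer is an assumed overhead, not a rule from the announcement.

```python
# What a 16MB all-in budget (code + weights) buys in parameters at common
# weight precisions. The code/tokenizer overhead is an assumption made
# for illustration, not a contest rule.

BUDGET_BYTES = 16 * 1024**2
CODE_OVERHEAD = 1 * 1024**2    # assumed: ~1 MiB for code and tokenizer

weight_budget = BUDGET_BYTES - CODE_OVERHEAD
for bits in (16, 8, 4):
    params = weight_budget * 8 / bits
    print(f"{bits}-bit weights: ~{params/1e6:.0f}M parameters")
# 16-bit: ~8M, 8-bit: ~16M, 4-bit: ~31M
```

Even at 4-bit quantization that caps entrants around the low tens of millions of parameters, far below even today's smallest mainstream LLMs.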
