Foundation Models & LLM Research
-
MegaTrain trains 100B-parameter LLMs on a single GPU — with 1.5TB of RAM
Training a 100-billion-parameter model usually means a cluster of expensive GPUs. MegaTrain flips the script: store everything in CPU memory, and treat the GPU as a temporary math worker.
How It Works
The core idea is dead simple. Parameters and optimizer states live in host RAM. During forward and backward passes, MegaTrain streams weights to… Continue reading
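To make the streaming concrete, here is a minimal PyTorch sketch of that pattern. It is an assumed illustration, not MegaTrain's actual code: master weights and optimizer state stay in host RAM, and the GPU only ever holds a transient working copy of the layer it is currently computing.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Illustrative sketch of CPU-offloaded training (not MegaTrain's code).
# Master weights and AdamW state live in host RAM; the GPU holds only a
# transient working copy of each layer while it does that layer's math.

device = "cuda"
cpu_layers = [nn.Linear(1024, 1024) for _ in range(4)]        # masters in RAM
opt = torch.optim.AdamW(
    (p for layer in cpu_layers for p in layer.parameters()), lr=1e-4
)

x, target = torch.randn(8, 1024), torch.randn(8, 1024)

opt.zero_grad()
h, work = x.to(device), []
for layer in cpu_layers:
    # Stream this layer's weights host -> device just before its compute.
    w = layer.weight.detach().to(device).requires_grad_(True)
    b = layer.bias.detach().to(device).requires_grad_(True)
    h = F.linear(h, w, b)
    work.append((layer, w, b))

loss = F.mse_loss(h, target.to(device))
loss.backward()

# Stream gradients device -> host, then step the CPU-resident optimizer.
for layer, w, b in work:
    layer.weight.grad = w.grad.to("cpu")
    layer.bias.grad = b.grad.to("cpu")
opt.step()
```

A real implementation would evict each working copy as soon as its backward pass finishes, re-stream on demand, and overlap transfers with compute; the sketch keeps everything alive for readability.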
-
Meta Muse Spark: $14B and 9 Months Later, Alexandr Wang Delivers — but It’s Closed Source
Meta Superintelligence Labs just shipped its first model. Muse Spark, code-named Avocado, was built in nine months by the team Mark Zuckerberg assembled after paying $14 billion to poach Alexandr Wang from Scale AI.
What Muse Spark Actually Does
Multimodal input — voice, text, images — but text-only output. It’s already live on Meta AI… Continue reading
-
Google’s Gemma 4 Now Runs Inside Your Browser — Gemma Gem Needs Zero API Keys
A Chrome extension just turned your browser into an AI runtime. Gemma Gem loads Google’s Gemma 4 model entirely on-device via WebGPU. No API keys. No cloud calls. Your data never leaves your machine. The extension hit Hacker News as a Show HN post and pulled 154 points — not because running local models is… Continue reading
-
Z.AI’s GLM-5.1 Tops SWE-Bench Pro at 58.4 — Trained on Zero Nvidia Hardware
What Is It
Z.AI (Zhipu AI) shipped GLM-5.1 — a 754B-parameter MoE model with 40B active parameters, open-sourced under MIT. It’s the first Chinese model to hit #1 on SWE-Bench Pro. Score: 58.4. Ahead of GPT-5.4 (57.7), Claude Opus 4.6 (57.3), and Gemini 3.1 Pro (54.2). The entire model was trained on 100,000 Huawei Ascend… Continue reading
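Quick arithmetic on those quoted specs shows why the active-parameter number is the one that matters for serving cost:

```python
# Back-of-envelope on the quoted MoE specs: only a small slice of the
# weights does work on any given token.
total_params, active_params = 754e9, 40e9
print(f"active fraction: {active_params / total_params:.1%}")   # 5.3%
# Per-token matmul FLOPs scale with the ~40B active parameters
# (roughly 2 * 40e9 ops per token), so inference cost looks closer to a
# 40B dense model than to a 754B one.
```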
-
26 Engineers, $20M — Arcee AI Trinity-Large-Thinking Scores Within 2 Points of Claude Opus
What Is It
Arcee AI, a 26-person US startup, shipped Trinity-Large-Thinking — a 399B-parameter open-source reasoning model under Apache 2.0. Built in 33 days on 2,048 NVIDIA Blackwell GPUs for just $20 million. It runs on a Mixture-of-Experts architecture: 399B total parameters, only 13B active per token. That means 2-3x faster inference than dense models… Continue reading
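For readers unfamiliar with that trade-off: in a top-k MoE layer, a learned router picks a few experts per token, so only those experts' weights are multiplied. The toy layer below sketches the generic mechanism; it is not Arcee's architecture, and the sizes and top-2 routing are illustrative.

```python
import torch
from torch import nn

# Toy top-k Mixture-of-Experts layer (generic illustration, not Arcee's
# design). Each token is routed to k experts, so per-token compute scales
# with k expert-sized matmuls rather than with the total expert count.

class TopKMoE(nn.Module):
    def __init__(self, dim: int = 64, n_experts: int = 16, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)                  # (tokens, k)
        out = torch.zeros_like(x)
        for slot in range(self.k):                         # only k experts fire
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 64)).shape)   # torch.Size([10, 64])
```

Trinity's quoted numbers fit this shape: all 399B parameters exist in memory, but each token only pays compute for roughly 13B of them.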
-
MemPalace scores 96.6% on LongMemEval — Milla Jovovich’s open-source AI memory beats paid rivals
An actress from Resident Evil building an AI tool sounds like a PR stunt. It’s not. Milla Jovovich and engineer Ben Sigman built MemPalace after months of frustration with AI forgetting everything between sessions. The core philosophy: don’t let the AI decide what’s worth remembering. Store everything, make it searchable.
How It Works
Conversations get… Continue reading
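As a toy illustration of that philosophy (a sketch of the idea, not MemPalace's implementation), a memory layer that stores everything and defers all judgment to search time fits in a few lines:

```python
import datetime
import re
from collections import Counter

# Sketch of "store everything, make it searchable" (an illustration of the
# philosophy, not MemPalace's actual code). Nothing is filtered at write
# time; relevance is decided at read time by search.

memory: list[dict] = []

def remember(role: str, text: str) -> None:
    memory.append({
        "role": role,
        "text": text,
        "time": datetime.datetime.now(datetime.timezone.utc),
    })

def recall(query: str, k: int = 3) -> list[str]:
    # Naive lexical overlap stands in for retrieval here; a production
    # system would use embeddings and a vector index.
    q = Counter(re.findall(r"\w+", query.lower()))
    scored = [
        (sum((q & Counter(re.findall(r"\w+", m["text"].lower()))).values()), m)
        for m in memory
    ]
    scored.sort(key=lambda pair: -pair[0])
    return [m["text"] for score, m in scored[:k] if score > 0]

remember("user", "My dog is called Biscuit and he hates thunderstorms.")
remember("user", "Book the dentist for Tuesday.")
print(recall("what is my dog's name?"))   # surfaces the Biscuit message
```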
-
Parlor Runs a Full Voice + Vision AI on Your MacBook — No API Key, No Cloud, 2.6 GB Total
Six months ago, running a real-time voice AI locally required an RTX 5090. Now a 2-billion-parameter model on an M3 Pro does voice, vision, and conversation at 83 tokens/sec. That’s the entire pitch behind Parlor — and it’s more impressive than it sounds. Parlor grabbed 265 points on Show HN, landed #6 on bestofshowhn.com’s April… Continue reading
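The 2.6 GB figure is roughly what you would expect if the 2B weights are quantized to about one byte each. The quantization level below is a guess, not something the post states:

```python
# Back-of-envelope on the quoted footprint. The ~8-bit quantization is an
# assumption; the post only gives the 2.6 GB total.
params = 2e9                    # "2-billion-parameter model"
bytes_per_weight = 1.0          # assume ~8-bit quantization
weights_gb = params * bytes_per_weight / 1024**3
print(f"weights alone: ~{weights_gb:.1f} GB")   # ~1.9 GB of the 2.6 GB bundle
```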
-
GuppyLM: 9 Million Parameters, 5 Minutes, One Free GPU
The AI industry burns billions training trillion-parameter models. GuppyLM goes the opposite direction: 8.7 million parameters, 6 transformer layers, a 4,096-token vocabulary. Train it from scratch in 5 minutes on a free Google Colab T4 GPU. The whole thing fits in a single Jupyter notebook. It hit Hacker News on April 6, 2026, pulled 150… Continue reading
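Those numbers check out with standard GPT-style parameter accounting. The model width below is a hypothetical value picked to land near the quoted total; GuppyLM's real hyperparameters may differ:

```python
# Back-of-envelope parameter count for a GPT-style model with the quoted
# specs (6 layers, 4,096-token vocab). d = 320 is a guessed width chosen
# to land near 8.7M; GuppyLM's actual config may differ.
d, n_layers, vocab = 320, 6, 4096

embed = vocab * d                    # token embeddings (assume tied output head)
per_layer = 4 * d * d + 8 * d * d    # attention QKVO + a 4x-wide MLP, biases ignored
total = embed + n_layers * per_layer
print(f"{total / 1e6:.1f}M parameters")   # -> 8.7M
```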
