Top AI Product

Every day, hundreds of new AI tools launch across Product Hunt, Hacker News, and GitHub. We dig through the noise so you don't have to — surfacing only the ones worth your attention with honest, no-fluff reviews. Explore our latest picks, deep dives, and curated collections to find your next favorite AI tool.


GuppyLM: 9 Million Parameters, 5 Minutes, One Free GPU

The AI industry burns billions training trillion-parameter models. GuppyLM goes the opposite direction: 8.7 million parameters, 6 transformer layers, a 4,096-token vocabulary. Train it from scratch in 5 minutes on a free Google Colab T4 GPU. The whole thing fits in a single Jupyter notebook.

It hit Hacker News on April 6, 2026, pulled 150 upvotes, and landed on bestofshowhn.com’s April rankings. For a solo project with zero marketing budget, that’s a loud signal about what developers actually care about right now.

What GuppyLM Actually Is

A teaching tool disguised as a chatbot. The model plays a character — a small fish named Guppy who thinks the meaning of life is food and calls you “my favorite big shape.” Ask it about politics and it’ll talk about water temperature instead.

The real product isn’t the fish. It’s the architecture you can see through. Six transformer layers. 384-dimensional hidden states. Six attention heads. No grouped-query attention, no RoPE, no SwiGLU. Just the vanilla transformer from the original 2017 “Attention Is All You Need” paper, with nothing bolted on.
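Those published numbers are enough for back-of-envelope parameter arithmetic. The sketch below assumes a 2x MLP expansion, learned positional embeddings over the 128-token window, no bias terms, and a final LayerNorm — none of which the article confirms; they're chosen only to show how a budget this small gets spent.

```python
# Back-of-envelope parameter count for a GuppyLM-sized transformer.
# Assumed (not confirmed): 2x MLP expansion, learned positional
# embeddings, no bias terms, and a final LayerNorm.
vocab_size = 4096
d_model    = 384
n_layers   = 6
n_ctx      = 128            # context window from the article
d_mlp      = 2 * d_model    # assumption; GPT-2 uses 4x

tok_emb = vocab_size * d_model      # 1,572,864 (tied with the output head)
pos_emb = n_ctx * d_model           #    49,152

attn  = 4 * d_model * d_model       # Wq, Wk, Wv, Wo projections
mlp   = 2 * d_model * d_mlp         # up- and down-projection
norms = 2 * 2 * d_model             # two LayerNorms (scale + shift)
per_layer = attn + mlp + norms

total = tok_emb + pos_emb + n_layers * per_layer + 2 * d_model  # + final LN
print(f"{total:,}")  # 8,709,888 — close to the reported 8.7M
```

Note what weight tying buys here: an untied output head would cost another 1.57M parameters, pushing the model past 10M under these assumptions.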

Most people learn about LLMs from blog posts explaining GPT-4’s trillion-parameter architecture. That’s like learning to drive by studying Formula 1 engines. GuppyLM hands you a go-kart and says: build it, drive it, take it apart.

The model weights are on Hugging Face (arman-bd/guppylm-9M) and the full 60,000-sample training dataset is published too. Everything is MIT-licensed. You can run the pre-trained model in your browser or retrain from zero — your call.

Every Design Decision Is a Lesson

This is where GuppyLM gets interesting. Every architectural choice is a deliberate trade-off that teaches you something about how LLMs work at a fundamental level.

No system prompt. At 9M parameters, the model can’t follow conditional instructions. Personality gets baked directly into the weights instead. That saves about 60 tokens per inference — a huge deal when your entire context window is only 128 tokens. This is the same reason early chatbots felt so rigid: they literally couldn’t hold both a persona definition and your question in memory at the same time.
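The context arithmetic is worth making concrete. A minimal sketch, using the article's 128-token window and ~60-token persona cost (the response budget is illustrative):

```python
CONTEXT_WINDOW  = 128   # GuppyLM's entire context, per the article
PERSONA_PROMPT  = 60    # approximate cost of an in-context persona
RESPONSE_BUDGET = 40    # tokens reserved for the reply (illustrative)

# With an in-context persona, little is left for the actual question:
user_budget = CONTEXT_WINDOW - PERSONA_PROMPT - RESPONSE_BUDGET
print(user_budget)        # 28 tokens for the user

# Baking the persona into the weights reclaims the whole prompt:
user_budget_baked = CONTEXT_WINDOW - RESPONSE_BUDGET
print(user_budget_baked)  # 88 tokens for the user
```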

Single-turn only. The creator tried multi-turn conversations and quality fell apart after 3-4 exchanges. Rather than faking capability, they cut the feature entirely. At 128 tokens of context, there’s simply no room for conversation history. Honest engineering beats feature inflation.

All 60,000 training samples are synthetic — generated from templates across 60 topics, 30 tank objects, 17 food types, and 25 activities. Sounds limiting, but this controlled dataset keeps the fish persona rock-solid across every response. Feed a 9M-parameter model real-world internet data and you’d get incoherent noise.
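The actual GuppyLM templates and category lists aren't reproduced here, but the generation approach is straightforward to sketch. Everything below — the templates, the sample items, the field names — is illustrative stand-in content, not the real dataset:

```python
import random

# Illustrative stand-ins for the article's 60 topics, 30 tank objects,
# 17 food types, and 25 activities (the real lists aren't published here).
TOPICS  = ["the weather", "politics", "music"]
OBJECTS = ["the castle", "the filter", "the plastic plant"]
FOODS   = ["flakes", "bloodworms", "algae wafers"]

TEMPLATES = [
    ("What do you think about {topic}?",
     "Hmm, {topic} sounds complicated. Is the water warm where you are? "
     "I mostly think about {food}."),
    ("What's near you right now?",
     "I'm swimming past {obj}! Do you have any {food}, my favorite big shape?"),
]

def generate(n, seed=0):
    """Fill dialogue templates with random category combinations."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        prompt_t, reply_t = rng.choice(TEMPLATES)
        fields = {"topic": rng.choice(TOPICS),
                  "obj": rng.choice(OBJECTS),
                  "food": rng.choice(FOODS)}
        samples.append({"prompt": prompt_t.format(**fields),
                        "reply": reply_t.format(**fields)})
    return samples

dataset = generate(5)
```

The design point: because every reply comes from a persona-consistent template, a 9M-parameter model never sees contradictory behavior to imitate.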

The embedding layer and output head share weights. GPT-2 does the same trick, but good luck visualizing what’s happening in a billion-parameter model. In GuppyLM, you can actually inspect the shared matrix, see how input tokens map to output probabilities, and understand why weight tying works.
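In PyTorch, weight tying is a one-line assignment. A minimal sketch (the class name and dimensions follow the article's numbers, but this is not GuppyLM's actual code):

```python
import torch.nn as nn

class TiedHead(nn.Module):
    """Embedding and output head sharing one (vocab, d_model) matrix."""
    def __init__(self, vocab_size=4096, d_model=384):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.tok_emb.weight  # the tie: one Parameter

    def forward(self, hidden):       # hidden: (batch, d_model)
        return self.lm_head(hidden)  # logits: (batch, vocab_size)

model = TiedHead()
# One inspectable matrix, readable both ways: row i is token i's input
# embedding AND the direction dotted against to score token i as output.
shared = model.tok_emb.weight
assert shared.data_ptr() == model.lm_head.weight.data_ptr()
```

At 4,096 × 384, that shared matrix is small enough to plot in full — which is exactly the kind of inspection a billion-parameter model makes impractical.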

Inference runs at 50-100ms per response. Not because of clever optimization — because there’s barely anything there. You’re watching a transformer think in real time.

How GuppyLM Stacks Up Against Other Tiny LLM Projects

The “build a tiny LLM to learn” space has some strong entries.

Karpathy’s nanoGPT is the gold standard — a minimal GPT-2 training script that became the default starting point for understanding transformers. We’ve covered Karpathy’s approach to autonomous AI research before, and nanoGPT reflects the same philosophy: strip away everything non-essential. llm.c takes it further, reimplementing the whole thing in pure C with no PyTorch dependency.

GuppyLM adds something neither project has: a complete character trained on structured dialogue data. Trained on Shakespeare or OpenWebText, nanoGPT gives you a text completion engine. GuppyLM gives you a chatbot with a consistent personality. That means you’re learning not just “how does a transformer generate tokens” but “how does training data shape model behavior” — which is arguably the more important question in 2026, when everyone’s debating alignment and persona control in much larger models.

Liquid AI’s LFM2-350M (350 million parameters) and TinyLlama (1.1 billion) sit in a completely different category. They’re optimized for production inference on edge devices — real tasks, real benchmarks. GuppyLM isn’t competing there. It’s not trying to be useful. It’s trying to be understandable.

The roughly 40x parameter gap between GuppyLM (9M) and even the smallest “production” tiny models (LFM2 at 350M) tells you how wide the gulf is between “educational” and “deployable.” That gap itself is a lesson worth understanding.

What 150 HN Upvotes for a Fish Chatbot Tells You

In a Hacker News feed full of Show HN posts trying to compete with Claude and GPT, a 9-million-parameter fish pulling 150 upvotes is telling. The HN comments reportedly went deep on what transformers actually learn at this scale — exactly the kind of discussion that doesn’t happen when someone launches yet another wrapper around a frontier API.

The gap between “I’ve read the theory” and “I’ve trained one myself” is wider than most people admit. Every ML course teaches attention mechanisms on whiteboards. Very few hand you a complete, trainable model that finishes before your coffee gets cold.

GuppyLM closes that gap in 5 minutes, for free. In a world where the median AI project requires a six-figure GPU budget just to get started, that’s not nothing.

