Top AI Product

We track trending AI tools across Product Hunt, Hacker News, GitHub, and more, then write honest, opinionated takes on the ones that actually matter. No press releases, no sponsored content. Just real picks, published daily. Subscribe to stay ahead without drowning in hype.


Autoresearch: Karpathy’s Overnight AI Researcher That Runs 100 Experiments While You Sleep

So Andrej Karpathy just dropped another open-source project, and honestly, this one feels different. [Autoresearch](https://github.com/karpathy/autoresearch) is a system that gives an AI agent a single GPU and a small but real LLM training setup, then lets it run experiments autonomously — all night long, no babysitting required. It showed up on [Hacker News](https://news.ycombinator.com/) today (61 points, 19 comments and climbing), and the repo already has nearly 3k stars on GitHub.

The idea is dead simple, which is probably why it works so well. The agent modifies the training code, kicks off a 5-minute training run, checks if the validation metric improved, keeps the change or throws it away, and repeats. That fixed 5-minute window is clever — it makes every experiment directly comparable regardless of what GPU you’re running on. You get roughly 12 experiments per hour. Go to bed, wake up, and you’ve got about 100 completed experiments waiting for you.
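The keep-or-revert cycle described above is essentially a greedy search over code changes. Here is a minimal, self-contained sketch of that loop; the training run is faked with a deterministic stand-in, and `propose_change` is a hypothetical placeholder for the LLM agent's actual code edits:

```python
import random
import zlib

def run_training(code_variant: str) -> float:
    """Stand-in for one fixed 5-minute training run. Returns a validation
    metric (lower is better), deterministic per variant so the sketch is
    reproducible. The real system trains an actual model here."""
    rng = random.Random(zlib.crc32(code_variant.encode()))
    return 1.0 + rng.uniform(-0.05, 0.05)

def propose_change(code: str, step: int) -> str:
    """Hypothetical mutation; in the real system the agent edits train.py."""
    return f"{code} | tweak-{step}"

def overnight_loop(hours: int = 8):
    """Greedy keep-or-revert search: each 5-minute slot tries one change
    and keeps it only if the validation metric improves."""
    best_code = "baseline"
    best_metric = run_training(best_code)
    history = []
    for step in range(hours * 12):  # ~12 five-minute runs per hour
        candidate = propose_change(best_code, step)
        metric = run_training(candidate)
        kept = metric < best_metric
        if kept:
            best_code, best_metric = candidate, metric
        history.append((metric, kept))
    return best_code, best_metric, history
```

Eight hours at twelve runs per hour is 96 experiments, which is where the "about 100 overnight" figure comes from.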

What I find most interesting is the “programming” model. Instead of writing Python scripts to orchestrate things, you write a Markdown file called `program.md` that instructs the agent on what to try. The agent then iterates on a single file — `train.py` — which contains the GPT model, optimizers, and the training loop. The success metric is `val_bpb` (validation bits per byte), which is vocabulary-size-independent, so even if the agent swaps out the tokenizer or changes the architecture, you can still compare results apples to apples.
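The reason bits per byte survives tokenizer swaps is that it normalizes total cross-entropy by raw bytes of text rather than by tokens. A quick sketch of the conversion (the exact accounting in the repo may differ):

```python
import math

def val_bpb(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed validation cross-entropy (in nats) to bits per byte.
    Dividing by bytes of raw text, not tokens, is what makes the metric
    independent of vocabulary size and tokenizer choice."""
    return total_loss_nats / math.log(2) / total_bytes
```

For example, a mean loss of `ln(4)` nats per token (2 bits) over 1,000 tokens spanning 4,000 bytes of text gives 2,000 bits total, or 0.5 bits per byte. A coarser tokenizer producing fewer, longer tokens would change the per-token loss but leave this number comparable.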

The whole thing is deliberately minimal. One GPU, one file, one metric. No distributed training, no complex configs, just PyTorch and a few small packages. It’s built on top of a simplified single-GPU version of nanochat, using Muon and AdamW optimizers with BPE tokenization. Currently you need an NVIDIA GPU (tested on H100), but the barrier to entry is about as low as it gets for this kind of work.
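Running two optimizers usually means partitioning parameters by shape: a common convention with Muon is to give it the 2D weight matrices while AdamW handles embeddings, norms, and other non-matrix parameters. The sketch below shows that split on shapes alone; the actual grouping in nanochat/autoresearch may differ, and the names here are made up:

```python
def split_param_groups(named_shapes: dict) -> tuple:
    """Assign each parameter to an optimizer group by shape: 2D weight
    matrices (excluding embeddings and the output head) go to Muon,
    everything else goes to AdamW. A sketch of a common convention,
    not the repo's exact rule."""
    muon, adamw = [], []
    for name, shape in named_shapes.items():
        if len(shape) == 2 and "embed" not in name and "lm_head" not in name:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw
```

In PyTorch this maps directly onto passing separate parameter groups to each optimizer and stepping both in the training loop.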

This feels like a glimpse of where ML research is heading. Instead of a researcher manually tweaking hyperparameters and waiting for results, you describe what you want explored and let the agent grind through possibilities overnight. It’s not replacing the thinking — you still need to write good instructions in `program.md` — but it’s automating the tedious cycle of change-train-evaluate that eats up so much time. Knowing Karpathy’s track record with projects like nanoGPT, I wouldn’t be surprised if this spawns a whole ecosystem of community-contributed experiment programs.

