So Andrej Karpathy just dropped another open-source project, and honestly, this one feels different. [Autoresearch](https://github.com/karpathy/autoresearch) is a system that gives an AI agent a single GPU and a small but real LLM training setup, then lets it run experiments autonomously, all night, with no babysitting required. It showed up on [Hacker News](https://news.ycombinator.com/) today (61 points, 19 comments and climbing), and the repo already has nearly 3k stars on GitHub.
The idea is dead simple, which is probably why it works so well. The agent modifies the training code, kicks off a 5-minute training run, checks whether the validation metric improved, keeps the change or throws it away, and repeats. That fixed 5-minute window is clever: it makes every experiment directly comparable regardless of which GPU you're running on. You get roughly 12 experiments per hour. Go to bed, wake up, and you've got about 100 completed experiments waiting for you.
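The keep-or-revert cycle described above is easy to picture in code. Here's a minimal, self-contained sketch of that greedy hill-climbing pattern; the function names and the toy `evaluate` below are mine, not the repo's actual implementation:

```python
def autoresearch_loop(baseline_metric, propose, evaluate, n_rounds=12):
    """Greedy keep-or-revert loop (a simplified sketch of the agent's cycle).
    `propose` yields a candidate change; `evaluate` returns the validation
    metric after running the fixed training budget with the kept changes
    plus the candidate applied (lower is better)."""
    best = baseline_metric
    kept = []
    for _ in range(n_rounds):
        change = propose()
        metric = evaluate(kept + [change])
        if metric < best:      # improvement: keep the change
            best = metric
            kept.append(change)
        # otherwise the change is simply discarded and we try again
    return best, kept
```

With `n_rounds=12` per hour and an overnight run, this is exactly the "wake up to ~100 experiments" arithmetic, just written out as a loop.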
What I find most interesting is the "programming" model. Instead of writing Python scripts to orchestrate things, you write a Markdown file called `program.md` that tells the agent what to try. The agent then iterates on a single file, `train.py`, which contains the GPT model, the optimizers, and the training loop. The success metric is `val_bpb` (validation bits per byte), which is vocabulary-size-independent: even if the agent swaps out the tokenizer or changes the architecture, you can still compare results apples to apples.
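Why does bits per byte stay comparable across tokenizers? Because the normalizer is the raw byte length of the validation text, not the token count. A coarser tokenizer produces fewer tokens but higher loss per token, and dividing by bytes cancels that out. A sketch of the arithmetic (my own helper; the repo's exact bookkeeping may differ):

```python
import math

def val_bpb(total_loss_nats: float, total_bytes: int) -> float:
    """Convert the summed cross-entropy over the validation split (in
    nats, as PyTorch reports it) into bits, then normalize by the raw
    byte count of the underlying text rather than the token count."""
    total_loss_bits = total_loss_nats / math.log(2)
    return total_loss_bits / total_bytes
```

For example, a model that spends exactly ln 2 nats (one bit) per byte of validation text scores 1.0 bpb, no matter how that text was tokenized.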
The whole thing is deliberately minimal. One GPU, one file, one metric. No distributed training, no complex configs, just PyTorch and a few small packages. It’s built on top of a simplified single-GPU version of nanochat, using Muon and AdamW optimizers with BPE tokenization. Currently you need an NVIDIA GPU (tested on H100), but the barrier to entry is about as low as it gets for this kind of work.
This feels like a glimpse of where ML research is heading. Instead of a researcher manually tweaking hyperparameters and waiting for results, you describe what you want explored and let the agent grind through possibilities overnight. It isn't replacing the thinking, since you still need to write good instructions in `program.md`, but it automates the tedious change-train-evaluate cycle that eats up so much research time. Knowing Karpathy's track record with projects like nanoGPT, I wouldn't be surprised if this spawns a whole ecosystem of community-contributed experiment programs.