Top AI Product

Every day, hundreds of new AI tools launch across Product Hunt, Hacker News, and GitHub. We dig through the noise so you don't have to — surfacing only the ones worth your attention with honest, no-fluff reviews. Explore our latest picks, deep dives, and curated collections to find your next favorite AI tool.


OpenGraviton Just Let Me Run a 140B Model on My Mac Mini — Here’s How

I’ve been chasing the dream of running truly massive language models locally for a while now. Not the 7B or 13B stuff — I mean the big ones, the 100B+ parameter beasts that usually demand a rack of A100s. So when [OpenGraviton](https://opengraviton.github.io) popped up on [Hacker News Show HN](https://news.ycombinator.com) and got picked up by bestofshowhn.com, I had to try it.

The pitch sounds almost too good: run 500B+ parameter models on consumer hardware. A Mac Mini. Your laptop. No cloud bills, no NVIDIA tax. And honestly? It mostly delivers. I tested it on an M1 Max with 64GB of RAM, loading up a 140B parameter model that would normally eat 280GB of memory. OpenGraviton’s ternary quantization crunched it down to about 35GB — small enough to actually fit. The trick is their 1.58-bit approach, where weights collapse to just {-1, 0, +1}. That works out to roughly 10x compression in theory (16 bits down to log2(3) ≈ 1.58 bits per weight), and about 8x in practice once the ternary values are packed two bits apiece — exactly the 280GB-to-35GB drop I saw. Still wild.
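OpenGraviton's internals aren't spelled out here, so the following is only a minimal sketch of what BitNet-b1.58-style ternary quantization looks like in general: scale by the mean absolute weight, round to {-1, 0, +1}, then pack the ternary values two bits apiece. The function names (`ternarize`, `pack_2bit`) are my own illustration, not the project's API.

```python
import numpy as np

def ternarize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Absmean ternary quantization (BitNet b1.58 style):
    divide by the mean |w|, then round each weight to -1, 0, or +1."""
    scale = float(np.mean(np.abs(w))) + 1e-8
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale

def pack_2bit(q: np.ndarray) -> np.ndarray:
    """Pack ternary values four per byte (2 bits each) after a +1 offset."""
    u = (q.ravel() + 1).astype(np.uint8)        # {-1,0,+1} -> {0,1,2}
    u = np.pad(u, (0, (-len(u)) % 4))           # pad to a multiple of 4
    u = u.reshape(-1, 4)
    return u[:, 0] | (u[:, 1] << 2) | (u[:, 2] << 4) | (u[:, 3] << 6)

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)
q, scale = ternarize(w)
packed = pack_2bit(q)

fp16_bytes = w.size * 2
print(f"fp16: {fp16_bytes} B, packed ternary: {packed.nbytes} B "
      f"({fp16_bytes / packed.nbytes:.0f}x smaller)")
# 2 bits/weight is 8x smaller than fp16; scaled to 140B parameters,
# that's the 280GB -> 35GB drop described above.
```

At inference time the dequantized weight is just `q * scale`, which is why ternary matmuls can be reduced to additions and subtractions — no multiplies against the weight values themselves.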

But quantization alone isn’t the whole story. The engine also does dynamic sparsity pruning, skipping over 70% of computations per token through Top-K zeroing and MoE routing. Then there’s the layer streaming via mmap — it pulls model layers directly from your NVMe SSD, so you’re not bottlenecked by RAM in the traditional sense. Stack speculative decoding on top of that and you get roughly 2-3x faster generation than you’d expect.
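The Top-K zeroing part of that stack is easy to illustrate. This is a generic sketch, not OpenGraviton's actual code: keep only the largest-magnitude 30% of activations and zero the rest, so the weight rows feeding the zeroed positions can be skipped entirely — roughly the "70% of computations" figure claimed above.

```python
import numpy as np

def topk_zero(x: np.ndarray, keep_frac: float = 0.3) -> np.ndarray:
    """Keep the largest-magnitude keep_frac of activations, zero the rest.
    Downstream matmuls can then skip columns where the activation is 0."""
    k = max(1, int(x.size * keep_frac))
    # k-th largest absolute value becomes the keep/zero threshold
    thresh = np.partition(np.abs(x).ravel(), -k)[-k]
    return np.where(np.abs(x) >= thresh, x, 0.0)

rng = np.random.default_rng(1)
h = rng.normal(size=4096).astype(np.float32)
sparse_h = topk_zero(h, keep_frac=0.3)
sparsity = float(np.mean(sparse_h == 0))
print(f"zeroed {sparsity:.0%} of activations")
```

MoE routing gives the same kind of saving one level up: instead of zeroing individual activations, the router picks a handful of expert blocks per token and the rest of the network is never touched.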

Setting it up from the [GitHub repo](https://github.com/opengraviton/graviton) is straightforward. Clone, run the hardware check with `python3 -m graviton.cli.main info`, and you’re off. It supports both macOS and Linux, ships under Apache 2.0, and handles models like Mixtral-8x22B out of the box.
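For reference, the setup boils down to a few commands. Only the `info` command is documented in the article; the install step is my assumption about a standard Python repo layout, so check the repo's README before copying this.

```shell
# Clone the repo and run the hardware check.
git clone https://github.com/opengraviton/graviton
cd graviton

# Assumed: the repo ships a standard pyproject/setup for an editable install.
pip install -e .

# Documented hardware check — reports what your machine can handle.
python3 -m graviton.cli.main info
```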

Is there a catch? Sure. Quality takes a hit at these extreme quantization levels — you’ll notice some degradation compared to full-precision inference, especially on nuanced reasoning tasks. And throughput on the M1 Max isn’t going to match a proper GPU cluster. But for local experimentation, privacy-sensitive workloads, or just the sheer satisfaction of running massive models without a cloud account, OpenGraviton is the most impressive thing I’ve seen in a while. The local AI inference space just got a lot more interesting.

