Top AI Product

Every day, hundreds of new AI tools launch across Product Hunt, Hacker News, and GitHub. We dig through the noise so you don't have to — surfacing only the ones worth your attention with honest, no-fluff reviews. Explore our latest picks, deep dives, and curated collections to find your next favorite AI tool.


Needle by Cactus Compute squeezes Gemini 3 tool calling into 26M parameters

Cactus Compute, the YC-backed on-device inference startup, open-sourced Needle today: a 26M-parameter Simple Attention Network that does single-shot function calling on phones and smartwatches. There are no MLP blocks, just attention and gating; the team argues feed-forward (FFN) parameters are wasteful at this scale, and that cross-attention is the right primitive for routing a query to the right tool.
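To make the routing claim concrete, here is a minimal sketch of cross-attention used as a tool router. Everything in it is invented for illustration (the weight names, dimensions, and tool list are not from the Needle repo): a query vector attends over one key per tool, and the softmaxed attention weights double as a routing distribution.

```python
import numpy as np

# Hypothetical sketch: cross-attention as a tool router.
# None of these names come from the Needle codebase.
rng = np.random.default_rng(0)
d = 32                                       # model width (illustrative)
tools = ["set_timer", "send_message", "get_weather"]

W_q = rng.normal(size=(d, d)) / np.sqrt(d)   # query projection
W_k = rng.normal(size=(d, d)) / np.sqrt(d)   # key projection
tool_emb = rng.normal(size=(len(tools), d))  # one learned embedding per tool

def route(query_vec):
    q = query_vec @ W_q                      # project the user query
    k = tool_emb @ W_k                       # project each tool embedding
    scores = k @ q / np.sqrt(d)              # scaled dot-product attention
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over tools
    return tools[int(weights.argmax())], weights

tool, w = route(rng.normal(size=d))
```

The point of the shape: routing needs only a (num_tools × width) key matrix and one attention step, which is why it stays cheap at 26M parameters; a per-token FFN would add parameters without sharpening the query-to-tool match.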

The training story is the punchline

Pre-training: 200B tokens in 27 hours on 16 TPU v6e chips. Post-training on 2B function-call tokens: 45 minutes. Reported throughput hits 6,000 tokens/sec prefill and 1,200 tokens/sec decode. On single-shot function calling, Needle beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, and LLaMA-2.5-350M, each roughly 10x its size.
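A quick back-of-envelope check on the pre-training figures, using only the numbers reported above (200B tokens, 27 hours, 16 chips); this is pure arithmetic, not an additional claim:

```python
# Back-of-envelope: implied pre-training throughput from the reported numbers.
tokens = 200e9          # 200B pre-training tokens
hours = 27
chips = 16              # TPU v6e chips

seconds = hours * 3600
aggregate_tps = tokens / seconds        # tokens/sec across all 16 chips
per_chip_tps = aggregate_tps / chips    # tokens/sec per chip

# aggregate_tps ≈ 2.06M tokens/sec, per_chip_tps ≈ 129k tokens/sec
```

Roughly 2M tokens/sec aggregate, or about 129k tokens/sec per chip, which is the kind of rate a 26M-parameter model makes plausible.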

Open weights and a finetune-your-own-tools playground

MIT licensed, with weights on Hugging Face. The repo ships a playground UI that runs the whole pipeline locally: synthesize tool-call data via Gemini, finetune Needle on your own tool schema, evaluate, and bundle the result. The Show HN post hit 174 points and reached #1 on the front page today.
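The four stages the playground wires together can be outlined as plain functions. This is a hypothetical skeleton of the flow only; every function name, schema field, and the toy "memorize the pairs" model below are invented for illustration and do not reflect the repo's actual API.

```python
# Hypothetical outline of the playground pipeline:
# synthesize -> finetune -> evaluate -> bundle.
# All names and data shapes are invented; the real repo's API will differ.

tool_schema = {"name": "set_timer", "parameters": {"duration_s": "int"}}

def synthesize(schema, n=4):
    # Stand-in for Gemini-generated training pairs: (utterance, target call).
    return [(f"set a timer for {i} seconds",
             {"tool": schema["name"], "args": {"duration_s": i}})
            for i in range(1, n + 1)]

def finetune(examples):
    # Toy "model": memorize the training pairs verbatim.
    return dict(examples)

def evaluate(model, examples):
    # Fraction of utterances mapped to the exact target call.
    correct = sum(model.get(x) == y for x, y in examples)
    return correct / len(examples)

def bundle(model, schema):
    # Package weights and schema for on-device deployment.
    return {"weights": model, "schema": schema}

data = synthesize(tool_schema)
model = finetune(data)
accuracy = evaluate(model, data)
artifact = bundle(model, tool_schema)
```

The shape matters more than the stubs: the entire loop runs locally against one user-supplied tool schema, which is what makes "finetune your own tools on a laptop" a realistic workflow for a 26M-parameter model.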

The point isn’t replacing Gemini 3. It’s that if a 26M model can route tools well enough on a smartwatch, the agent loop stops needing the cloud. That’s the bet Cactus has been making since YC S25, and Needle is the most concrete proof yet.
