Top AI Product


# Gated DeltaNet-2 decouples erase and write in linear attention, beating Mamba-3 and KDA at 1.3B

Gated DeltaNet-2, from the NVIDIA and MIT team behind the original, fixes a subtle flaw in how linear-attention models manage memory. Prior delta-rule models (Gated DeltaNet, KDA) used a single scalar gate to do two jobs at once: erasing old content and writing new content. Version 2 gives each job its own gate, and the gains show up exactly where you’d expect: long-context retrieval.

## The core problem

Linear attention squeezes an unbounded KV cache into a fixed-size recurrent state. The hard part isn’t just deciding what to forget — it’s editing that compressed memory without scrambling the associations already stored. One gate forced to both erase and write makes clean edits impossible. Two gates fix it.
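
To make the one-gate-versus-two idea concrete, here is a minimal sketch of a gated delta-rule update in pure Python. The coupled form follows the familiar Gated DeltaNet recurrence, where one `beta` scales both the erase and the write. The decoupled form is only an illustration of the idea: the names `beta_erase` and `beta_write` are hypothetical, and the actual Gated DeltaNet-2 parameterization may differ.

```python
# Illustrative sketch only: a gated delta-rule memory update in pure Python.
# beta_erase / beta_write are hypothetical names, not the paper's notation.

def delta_step(S, k, v, alpha, beta_erase, beta_write):
    """Return the updated memory after one token.

    S is a d_v x d_k matrix (list of rows), k a unit-norm key, v a value.
    Update: S_new = alpha * (S - beta_erase * (S k) k^T) + beta_write * v k^T
    Setting beta_erase == beta_write gives the coupled, single-gate form.
    """
    # What the memory currently returns for key k.
    old = [sum(s * kj for s, kj in zip(row, k)) for row in S]
    return [
        [alpha * (S[i][j] - beta_erase * old[i] * k[j]) + beta_write * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(S))
    ]

def read(S, q):
    """Query the memory: returns S q."""
    return [sum(s * qj for s, qj in zip(row, q)) for row in S]

k1, k2 = [1.0, 0.0], [0.0, 1.0]                    # two orthogonal keys
S = [[0.0, 0.0], [0.0, 0.0]]                       # empty fixed-size state
S = delta_step(S, k1, [1.0, 2.0], 1.0, 1.0, 1.0)   # store [1, 2] under k1
S = delta_step(S, k2, [5.0, 6.0], 1.0, 1.0, 1.0)   # store [5, 6] under k2

new_v = [3.0, 4.0]
coupled = delta_step(S, k1, new_v, 1.0, 0.5, 0.5)    # one gate: erase 0.5, write 0.5
decoupled = delta_step(S, k1, new_v, 1.0, 1.0, 0.5)  # two gates: erase 1.0, write 0.5

print(read(coupled, k1))    # [2.0, 3.0]
print(read(decoupled, k1))  # [1.5, 2.0]
print(read(decoupled, k2))  # [5.0, 6.0]
```

With a single gate at 0.5, the slot for `k1` ends up as a blend of the old and new values, because a soft write also means a soft erase. With separate gates, the old value can be wiped completely while the new one is written at half strength. The `k2` slot is unchanged in both cases because the two keys are orthogonal.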

## The numbers

Gated DeltaNet-2 beats KDA and Mamba-3, two of the strongest recent recurrent architectures, head to head at 1.3B parameters. The biggest gains are on RULER long-context retrieval: S-NIAH-3 rises from 63 with KDA to 90, and multi-key needle retrieval climbs from 28 to 38.

## Why it matters

The original Gated DeltaNet already got picked up by Qwen3.5. Linear-attention architectures are how you get cheap long context without quadratic attention cost — and retrieval quality has been their weak spot. If v2’s editing improvements hold at scale, the next generation of efficient long-context models has a new default building block.

