NVIDIA Labs has released SANA-WM, a 2.6B-parameter controllable world model that generates 720p, one-minute videos with 6-DoF camera control. The model is open source, with weights available on GitHub at NVlabs/Sana.
## The efficiency story
SANA-WM was trained on roughly 213K public video clips with metric-scale pose supervision. The total training run took 15 days on 64 H100 GPUs — an order of magnitude less compute than big-lab world models like LingBot-World or HY-WorldPlay, yet the release claims comparable visual quality benchmarks.
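The compute figures from the post work out to a concrete GPU-hour budget — a quick sanity check (the comparison baselines are not given, so only SANA-WM's side is computed here):

```python
# Back-of-envelope training compute from the figures in the post:
# 15 days on 64 H100 GPUs.
days = 15
gpus = 64

gpu_hours = days * 24 * gpus
print(f"{gpu_hours:,} H100 GPU-hours")  # 23,040 H100 GPU-hours
```

At cloud H100 rates, that is a budget within reach of well-funded startups and academic labs, which is the point the post is making.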
## Inference on consumer hardware
Each 60-second clip is generated on a single GPU. The distilled variant runs on a single RTX 5090 with NVFP4 quantization and produces a 60-second 720p clip in 34 seconds — faster than real time. This is the first time a serious open-weight world model has been deployable on prosumer hardware.
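The quantization step is what makes the 5090 fit. As an illustrative sketch only: 4-bit floating-point formats in the NVFP4 family store weights at E2M1 precision with a shared scale per small block of values. The snippet below simulates that idea in plain Python — it is not NVIDIA's implementation (real NVFP4 uses FP8 block scales and hardware kernels), just a toy model of block-scaled 4-bit rounding:

```python
# Toy sketch of block-scaled 4-bit float quantization (E2M1-style),
# the general idea behind formats like NVFP4. NOT NVIDIA's actual
# implementation: real NVFP4 uses FP8 per-block scales and GPU kernels.

# Magnitudes representable by an E2M1 4-bit float (plus a sign bit).
FP4_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(weights, block=16):
    """Fake-quantize a list of floats, one shared scale per block."""
    out = []
    for i in range(0, len(weights), block):
        chunk = weights[i:i + block]
        # Scale so the block's largest magnitude maps to 6.0 (FP4 max).
        amax = max(abs(w) for w in chunk) or 1.0
        scale = amax / 6.0
        for w in chunk:
            # Round the scaled magnitude to the nearest representable level.
            mag = min(FP4_LEVELS, key=lambda lv: abs(abs(w) / scale - lv))
            out.append((mag if w >= 0 else -mag) * scale)
    return out

weights = [0.12, -0.50, 0.33, 0.05, -0.24, 0.41]
print(quantize_block(weights))
```

Block scaling is what keeps the tiny 8-level magnitude grid usable: each group of 16 weights gets its own dynamic range, so a few large values don't crush the precision of the rest of the tensor.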
## Why it matters
For anyone building robotics, autonomous driving simulators, or game-engine training pipelines, world models have been a frontier-lab privilege. SANA-WM cracks open that gate — frontier-grade quality, RTX 5090 inference, open weights, all in one release. The morning after launch, it was sitting at the top of Hacker News with 140+ points.