NVIDIA’s Isaac GR00T N1.6 is the latest open foundation model for humanoid robots — a vision-language-action (VLA) model that turns camera streams, robot state, and plain-language instructions into one unified control policy.
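
To make "one unified control policy" concrete, here is a minimal sketch of the input/output contract such a model implies: camera frames, robot state, and an instruction go in, and a motor command comes out. Every name below (`Observation`, `Policy`, `HoldPositionPolicy`, `control_step`) is hypothetical and made up for illustration; none of it is GR00T's actual API, and the placeholder policy does no learning at all.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Observation:
    """One timestep of input to a VLA-style policy (hypothetical schema)."""
    camera_frames: Dict[str, List[List[List[int]]]]  # camera name -> H x W x 3 pixels
    robot_state: List[float]                          # e.g. current joint positions
    instruction: str                                  # plain-language task description


class Policy(Protocol):
    """Anything that maps an observation to one target value per joint."""
    def act(self, obs: Observation) -> List[float]:
        ...


class HoldPositionPolicy:
    """Placeholder policy: ignores vision and language and holds the current pose."""
    def act(self, obs: Observation) -> List[float]:
        return list(obs.robot_state)


def control_step(policy: Policy, obs: Observation) -> List[float]:
    """Run one policy step and check that the action matches the joint count."""
    action = policy.act(obs)
    if len(action) != len(obs.robot_state):
        raise ValueError("action must have one target per joint")
    return action


if __name__ == "__main__":
    # A tiny 4x4 black image stands in for a real camera frame.
    frame = [[[0, 0, 0] for _ in range(4)] for _ in range(4)]
    obs = Observation({"head": frame}, [0.0, 0.1, -0.2], "pick up the cup")
    print(control_step(HoldPositionPolicy(), obs))
```

A real VLA model would replace `HoldPositionPolicy` with a learned network; the point of the sketch is only that vision, state, and language share a single `act` call rather than separate pipelines.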
## What changed from N1.5
N1.6 isn’t a rewrite; it’s a sharpening. Architecture, data, and modeling improvements let it beat the previous N1.5 both on simulated manipulation benchmarks and, more importantly, on real bimanual hardware — YAM, Agibot Genie-1, and the Unitree G1. The headline capability is full-body control, not just arms: locomotion and manipulation under one policy.
## The reasoning layer
What makes it more than a motor controller is that N1.6 plugs into NVIDIA’s Cosmos Reason for contextual understanding, so the robot can interpret an instruction in context instead of pattern-matching a demo. NVIDIA pairs it with a sim-to-real workflow: policies are trained in simulation, then transferred onto physical robots. And it’s open, which matters: the humanoid field has lacked the kind of shared foundation model that language modeling gained years ago.
