Netflix VOID Scores 3.5x Over Runway in Blind Tests — Netflix’s First Open-Source AI Model

Remove a person from a video. Easy enough — plenty of tools can do that in 2026. Now remove a person who’s holding a guitar, and have the guitar fall to the ground because nobody’s holding it anymore.

That’s what Netflix VOID does. And it’s the reason the AI community spent the past 48 hours losing its collective mind over a video inpainting model.

VOID — Video Object and Interaction Deletion — landed on HuggingFace and GitHub on April 3, completely open source under Apache 2.0. It’s Netflix’s first publicly released AI model. That alone would be newsworthy. But the real buzz isn’t about the Netflix brand. It’s that VOID understands physics in a way no other video editing tool, open or proprietary, currently matches. In a blind test of 25 evaluators, VOID was preferred 64.8% of the time. Runway came in at 18.4%. ProPainter and other alternatives split the scraps.

One X user had to double-check it wasn’t an April Fools’ joke, given the timing. It’s not. The accompanying paper (arXiv: 2604.02296) runs over 20 pages of technical detail, and the model weights are live on HuggingFace for anyone to download.

Netflix’s Quiet Entry Into Open-Source AI

Netflix has 400-plus open-source projects on GitHub. Zuul, Eureka, Hystrix — infrastructure and data tools used by half the industry. But until this week, zero AI models. VOID changes that, and the choice of first release says a lot about where Netflix sees value.

This isn’t a chatbot. This isn’t a recommendation engine. It’s a production-grade video editing capability built for Netflix’s own VFX pipeline, then handed to the world with no strings attached. The Apache 2.0 license means full commercial use — fine-tune it, sell products built on it, deploy it wherever you want. For a company that’s historically guarded about its content technology, that’s a real shift.

The timing is telling. Adobe is embedding AI deeper into Premiere. Runway shipped Gen-4. Google’s Veo keeps improving. Netflix looked at this and decided the smart play wasn’t competing on video generation. It was owning one very specific, very hard capability and giving it away. Build the standard, let others build on top of it. The same playbook that made React, Kubernetes, and PyTorch industry defaults.

Multiple AI bloggers on X picked it up within hours of release. The Register ran a full feature. By April 4 it was the second-ranked story on llm-stats.com’s AI news aggregator. For a model that does one narrow thing — remove objects from video — that’s unusual traction.

How VOID Understands Physics (And Why That’s Hard)

Most video inpainting tools handle removal by filling the gap. Remove a person, fill the background. They’ve gotten decent at this. ProPainter does clean background fills. Runway’s inpainting handles simple scenes. The problem comes when the removed object was interacting with other objects.

A person carries a mug across the room. You remove the person. What happens to the mug? In every other tool, the mug vanishes with the person — the system doesn’t know the mug was a separate object being held. VOID removes the person and lets the mug drop, simulating the actual physics of what would happen if that person had never been there.

The research paper has an even wilder example. Two vehicles in a head-on collision. Remove one, and VOID generates the other vehicle driving smoothly down the road. No debris. No impact smoke. No residual damage on the pavement. It understands that the collision was an interaction between two objects, and removing one rewrites the other’s entire trajectory.

A swimming pool scene: someone jumps in, splashing water everywhere. Remove the person, and VOID renders a calm, undisturbed pool surface. No ripples, no splash residue.

The secret sauce is what the team calls “quadmask conditioning.” Instead of the typical binary mask (remove or keep), VOID uses a four-value mask that encodes four distinct regions. The primary object to remove. Overlap zones where the removed object touches others. Affected regions where physics will change — falling objects, displaced items, settling surfaces. And background that stays untouched. This four-way encoding is what lets the model reason about physical consequences rather than just filling pixels.
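To make that concrete, here is a minimal sketch of how a quadmask could be assembled from per-region binary masks. The four regions come straight from the paper’s description; the integer labels and the helper itself are illustrative assumptions, not VOID’s actual code.

```python
import numpy as np

# Illustrative label values -- the four regions are from the paper,
# the specific integers are an assumption for this sketch.
BACKGROUND = 0   # pixels that stay untouched
REMOVE     = 1   # the primary object to delete
OVERLAP    = 2   # zones where the removed object touches other objects
AFFECTED   = 3   # regions whose physics change (a falling mug, settling water)

def make_quadmask(remove_mask: np.ndarray,
                  overlap_mask: np.ndarray,
                  affected_mask: np.ndarray) -> np.ndarray:
    """Merge per-region boolean masks of shape (H, W) into one quadmask."""
    quadmask = np.full(remove_mask.shape, BACKGROUND, dtype=np.uint8)
    quadmask[affected_mask] = AFFECTED
    quadmask[remove_mask] = REMOVE
    quadmask[overlap_mask] = OVERLAP   # contact zones take precedence
    return quadmask
```

Collapse those four values down to two and you are back to fill-the-hole inpainting; the extra categories are where the physical reasoning lives.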

Under the hood, VOID is built on CogVideoX-Fun-V1.5-5b-InP, a 5-billion parameter video model from the Tsinghua and ZhipuAI ecosystem. Netflix fine-tuned it specifically for interaction-aware video inpainting. The mask generation pipeline is clever too — it uses SAM 2 for object segmentation and Google’s Gemini as a VLM to reason about which regions are physically affected by a removal. Meta’s segmentation plus Google’s reasoning, stitched together by Netflix. That’s the kind of pragmatic engineering that ships real products.
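The VLM step is easy to picture even without the repo’s glue code. Below is a hedged sketch of the kind of Gemini query that could flag physically affected regions, assuming SAM 2 has already produced per-frame object masks. The prompt wording, the helper, and the model choice are my assumptions; only the google-generativeai calls are the library’s real API.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
vlm = genai.GenerativeModel("gemini-1.5-pro")  # model choice is an assumption

def affected_regions(frame: Image.Image, object_name: str) -> str:
    """Ask the VLM what physically changes when `object_name` is removed."""
    prompt = (
        f"The {object_name} in this frame will be digitally removed. "
        "List every region that would physically change as a consequence: "
        "objects it is holding or supporting, surfaces it presses on, "
        "fluids it displaces."
    )
    return vlm.generate_content([prompt, frame]).text

# e.g. affected_regions(Image.open("frame_0042.png"), "person holding a guitar")
# might return something like: "the guitar (loses support and falls)"
```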

Training data is where the team got genuinely creative. You can’t easily get real-world paired videos showing “scene with object” versus “scene without object, with physical consequences.” So they built two synthetic data pipelines. HUMOTO uses motion capture data rendered in Blender — human characters interact with 63 different objects across 736 curated sequences totaling 7,875 seconds of motion at 30 fps. Remove the human in Blender, run the physics engine, and you get ground truth for what happens when objects lose their support. The second pipeline, Kubric, launches Google Scanned Objects models at target objects in simulated scenes: remove the projectile, and the target’s trajectory changes predictably.

Both pipelines generate paired counterfactual videos that teach the model the relationship between object removal and physical consequences. It’s a smart workaround for a data problem that has no natural solution.
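A toy version of the idea, making no claim to resemble the real HUMOTO or Kubric code: simulate the same scene twice, identical except for the removed support, and keep the pair as one training example.

```python
import numpy as np

GRAVITY, FPS, FRAMES = -9.8, 30, 60   # two seconds at 30 fps

def mug_height(start: float, supported: bool) -> np.ndarray:
    """Per-frame height of a mug: held in place, or free-falling to the floor."""
    z, v, dt, traj = start, 0.0, 1.0 / FPS, []
    for _ in range(FRAMES):
        if not supported:
            v += GRAVITY * dt
            z = max(0.0, z + v * dt)   # clamp at the floor
        traj.append(z)
    return np.array(traj)

# Identical scene, one difference: the person (the mug's support) is removed.
with_person    = mug_height(start=1.2, supported=True)
without_person = mug_height(start=1.2, supported=False)
pair = {"input": with_person, "target": without_person}   # one training example
```

The real pipelines do this in Blender and Kubric with full rendering, but the counterfactual structure of each pair is the same.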

Training ran on 8 A100 80GB GPUs with DeepSpeed ZeRO Stage 2. By 2026 standards, that’s a modest compute budget. The fact that Netflix achieved state-of-the-art results without a massive cluster makes the model even more interesting for researchers who want to reproduce and extend the work.
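For anyone reproducing the setup, a ZeRO Stage 2 configuration looks roughly like the sketch below. The stage and GPU count match what Netflix reports; every other value (batch sizes, precision, learning rate, the stand-in model) is a placeholder assumption, not a number from the paper.

```python
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,      # placeholder, not from the paper
    "gradient_accumulation_steps": 8,         # placeholder
    "bf16": {"enabled": True},                # typical on A100s; an assumption
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},  # placeholder
    "zero_optimization": {
        "stage": 2,                  # shard optimizer state and gradients
        "overlap_comm": True,        # overlap gradient reduction with backward
        "contiguous_gradients": True,
    },
}

model = torch.nn.Linear(8, 8)   # stand-in for the 5B CogVideoX backbone
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
# Launched across 8 GPUs with: deepspeed --num_gpus=8 train.py
```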

VOID vs Runway vs ProPainter: The Numbers

The blind human preference study tells the clearest story. 25 evaluators, multiple removal scenarios, no labels on which model produced which result. VOID was preferred 64.8% of the time. Runway came in at 18.4%. That’s a 3.5x margin — not a close race by any standard.

ProPainter, the previous go-to open-source option for video inpainting, does solid background fills but has zero concept of physical interaction. Remove a person leaning on a table, and ProPainter gives you an empty background where the person stood. VOID gives you a table that might shift slightly because the weight pressing on it is gone. Functionally, they’re solving different problems.

Runway’s advantage is accessibility. It runs in a browser, handles the compute on their end, and offers a drag-and-drop UI that any creator can use. VOID needs a GPU with 40GB-plus VRAM — an A100, an H100, or maybe an RTX A6000 on the prosumer side. You’re not running this on a gaming laptop. For professional VFX studios, that’s a non-issue. For solo creators on a budget, it’s a barrier until cloud GPU prices drop further or quantized versions appear.
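Some rough arithmetic (mine, not a published memory profile) shows why the requirement lands where it does: the weights are only a quarter of the story.

```python
# Why ~40 GB: back-of-envelope arithmetic, not a published memory profile.
params = 5e9                        # the 5B CogVideoX-Fun backbone
weights_gb = params * 2 / 1e9       # bf16 weights: ~10 GB
# Video diffusion also holds activations for tens of thousands of latent
# tokens per clip, plus the text encoder and VAE; that overhead can run to
# several times the weights, which is how a 10 GB model needs a 40 GB card.
print(f"weights alone: {weights_gb:.0f} GB")
```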

The workflow gap is even wider. Runway gives you simple controls: draw a mask, click remove. VOID requires Python scripts, quadmask generation, and understanding of the inference pipeline. Netflix released data generation code instead of pre-built training data due to licensing constraints on the underlying datasets, so fine-tuning means building your own training set from scratch. This is a research model first, a product second. But the results speak loud enough that production-ready wrappers are almost certainly being built by the community right now.
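In practice that workflow looks something like the sketch below. Every function name and the checkpoint path are hypothetical stand-ins for entry points in the Netflix repo, invented here purely to show the shape of the pipeline.

```python
import numpy as np

def generate_quadmask(frames, prompt):
    """Stand-in for the SAM 2 + Gemini mask pipeline described earlier."""
    return np.zeros((len(frames), 256, 256), dtype=np.uint8)  # dummy masks

def run_void(frames, quadmask, checkpoint="netflix/void-5b"):  # name invented
    """Stand-in for the CogVideoX-Fun-based inference call."""
    return frames  # the real model returns the edited clip

frames = [np.zeros((256, 256, 3), dtype=np.uint8)] * 48   # dummy 48-frame clip
quadmask = generate_quadmask(frames, "the person holding the guitar")
edited = run_void(frames, quadmask)
```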

The comparison that matters most isn’t VOID versus Runway on simple removals — Runway handles those fine. It’s what happens when the scene has physical complexity. Runway treats object removal as a visual problem: fill the pixels, match the lighting, blend the edges. VOID treats it as a physics problem that produces visual output. That distinction sounds subtle, but it’s the difference between removing a person from an empty hallway and removing a person from a crowded kitchen where they were stirring a pot.

What This Signals for VFX and Video Production

Professional VFX artists spend ungodly hours on object removal. Wire cleanup, rig removal, unwanted crew reflections, boom mics poking into frame — it’s tedious, expensive work involving rotoscoping, clean plates, and frame-by-frame manual adjustment. A single shot can eat days of artist time. VOID won’t replace that entire workflow overnight, but it attacks the hardest part: reasoning about what should physically change when an object disappears.

For Netflix specifically, the internal use case is obvious. Remove production equipment, safety rigs, and anything that breaks the illusion. Having a model that understands physical consequences means less manual cleanup and faster VFX turnaround on shots that would otherwise require senior artists and expensive hours.

The open-source release creates a flywheel. External researchers and studios will fine-tune VOID, find edge cases, improve the pipelines, and push the capability forward — feeding back into the ecosystem Netflix built. That’s strategically smart. Netflix doesn’t need to compete with OpenAI on chatbots or with Google on video generation. It needs tools that make its content pipeline faster and cheaper. VOID is the first public evidence of what that production-focused AI strategy looks like.

For the VFX industry at large, a free model under Apache 2.0 that blind-test evaluators preferred 3.5x as often as Runway’s commercial offering is a shot across the bow. The hardware requirements limit immediate adoption, but those requirements always shrink. Quantized versions and optimized inference pipelines follow every popular open model like clockwork. When VOID runs on a 24GB card, the accessibility equation changes entirely — and so do the economics of every VFX studio that currently pays per-seat licenses for tools that do this worse.

