The race to build AI agents that can actually operate a computer — clicking buttons, filling forms, navigating websites — has been dominated by closed-source giants. Anthropic’s Claude Computer Use and OpenAI’s Operator have set the pace, but they come with API costs, usage limits, and zero visibility into the model weights. That changed on March 17 at NVIDIA GTC, when H Company unveiled Holotron-12B: an open-source, 12-billion-parameter vision-language model built specifically for computer-use agents, developed in collaboration with NVIDIA. The numbers are hard to ignore — 80.5% on WebVoyager, roughly 75% more throughput than its predecessor, and it runs on a single H100 GPU.
The Architecture Behind the Speed
Holotron-12B is not just a bigger version of H Company’s previous Holo2-8B. It is a fundamentally different model built on a different foundation.
While the Holo2 series was based on Qwen3-VL, Holotron-12B is post-trained from NVIDIA’s Nemotron-Nano-2 VL architecture. The key technical differentiator here is the hybrid State-Space Model (SSM) and attention mechanism. Traditional transformer-based vision-language models hit a throughput ceiling as sequence lengths and concurrency increase — each attention layer scales quadratically with context length. The hybrid SSM approach sidesteps this by handling sequential information with near-linear complexity while reserving standard attention for tasks that genuinely benefit from it.
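The scaling argument can be made concrete with a back-of-envelope cost model. This is an illustrative sketch only, not Nemotron's actual kernels — real constants depend on head count, state size, and implementation details:

```python
# Rough per-layer cost model (illustrative only, not Nemotron's real kernels).
# Self-attention scales as O(L^2 * d) in sequence length L;
# an SSM scan scales as O(L * d * n) for a fixed state size n.

def attention_cost(seq_len: int, d_model: int) -> int:
    """Approximate multiply-adds for one self-attention layer."""
    return seq_len * seq_len * d_model

def ssm_cost(seq_len: int, d_model: int, state_size: int = 16) -> int:
    """Approximate multiply-adds for one SSM scan layer."""
    return seq_len * d_model * state_size

# The attention/SSM cost ratio grows linearly with context length,
# which is why long agent histories favor the hybrid design.
for seq_len in (1_000, 10_000, 100_000):
    ratio = attention_cost(seq_len, 4096) / ssm_cost(seq_len, 4096)
    print(f"L={seq_len:>7,}: attention costs {ratio:,.1f}x the SSM scan")
```

The ratio is simply `seq_len / state_size` here, which is the intuition behind "near-linear complexity": the SSM's cost per token stays flat as the interaction history grows.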
The practical result: in controlled benchmarks using vLLM v0.14.1 with SSM optimizations on a single NVIDIA H100 GPU, Holotron-12B achieved 8.9k tokens per second at a concurrency of 100. Holo2-8B, by comparison, plateaued around 5.1k tokens/s under the same conditions. That is a 74% throughput increase, and the gap widens at higher concurrency because the SSM architecture continues to scale where the older model flattens out.
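The headline gap checks out arithmetically, using the token rates from the benchmark above:

```python
# Reported single-H100 throughput at concurrency 100 (tokens/second).
holotron_tps = 8_900
holo2_tps = 5_100

speedup = holotron_tps / holo2_tps
print(f"Throughput gain: {speedup:.2f}x, i.e. about {(speedup - 1):.1%}")
```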
For production deployments — where a computer-use agent needs to handle dozens of simultaneous browser sessions, process multiple screenshots per step, and maintain long interaction histories — this throughput advantage translates directly into cost savings and lower latency.
Benchmark Performance: What 80.5% on WebVoyager Actually Means
WebVoyager is one of the standard benchmarks for evaluating web-based autonomous agents. It tests a model’s ability to complete real web tasks end-to-end: searching for flights, booking hotels, navigating e-commerce sites, filling out multi-step forms. It is notably harder than static screenshot understanding benchmarks because the model must plan, act, observe the result, and adapt across multiple steps.
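The multi-step loop WebVoyager exercises can be sketched as a plan-act-observe cycle. Everything below is a hypothetical stub — the `Action` type, the policy, and the episode runner are illustrations of the control flow, not any real harness:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "stop"
    target: str = ""   # element description or text to type

def propose_action(task: str, history: list) -> Action:
    """Stand-in for the VLM policy; here it simply stops after three steps."""
    if len(history) >= 3:
        return Action("stop")
    return Action("click", target=f"element-{len(history)}")

def run_episode(task: str, max_steps: int = 15) -> list:
    """Plan -> act -> observe, repeated until the model declares done."""
    history = []
    for _ in range(max_steps):
        action = propose_action(task, history)   # plan from task + history
        if action.kind == "stop":                # model believes task is done
            break
        history.append(action)                   # act, then record the result
    return history

trace = run_episode("Find the cheapest direct flight to Paris")
print([a.target for a in trace])
```

The hard part, of course, is the policy: each `propose_action` call in a real agent means feeding a fresh screenshot plus the full history back through the model, which is exactly where throughput and long-context efficiency bite.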
Holotron-12B scored 80.5% on WebVoyager, up from the 35.1% achieved by the base Nemotron model before H Company’s post-training. That is a 45-percentage-point improvement, and it puts Holotron-12B ahead of Holo2-8B on this benchmark while being architecturally optimized for the throughput demands of production agent workloads.
Beyond WebVoyager, H Company reports substantial improvements on localization and grounding benchmarks including OSWorld-G, GroundUI, and WebClick. These benchmarks measure how accurately a model can identify and interact with specific UI elements — the buttons, text fields, dropdown menus, and links that a computer-use agent needs to manipulate. Strong grounding performance is critical because even a small error in clicking the wrong element can derail an entire multi-step workflow.
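A common scoring rule in grounding benchmarks of this kind is whether the predicted click lands inside the target element's bounding box. The snippet below is a simplified version of that idea, not the benchmarks' actual evaluation code:

```python
def click_hit(click: tuple, bbox: tuple) -> bool:
    """True if (x, y) falls inside bbox = (x_min, y_min, x_max, y_max)."""
    x, y = click
    x0, y0, x1, y1 = bbox
    return x0 <= x <= x1 and y0 <= y <= y1

# Two illustrative predictions against the same "Submit" button bbox.
button = (100, 30, 180, 60)
predictions = [(120, 48), (300, 500)]  # first hits, second misses

accuracy = sum(click_hit(p, button) for p in predictions) / len(predictions)
print(f"grounding accuracy: {accuracy:.0%}")
```

Per-click accuracy compounds: a model that grounds each click correctly 95% of the time completes a 10-click workflow only about 60% of the time (0.95**10 ≈ 0.60), which is why these benchmarks matter so much for agents.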
For context on the competitive landscape: as of early 2026, OpenAI’s CUA-based agents score around 87% on standard web task benchmarks, while Anthropic’s Claude Computer Use sits around 56% on comparable web-specific evaluations (though Claude excels at full desktop workflows). Holotron-12B’s 80.5% puts it in genuinely competitive territory among open-source alternatives — and ahead of several proprietary offerings on web navigation specifically.
H Company: From DeepMind Pedigree to $220M Seed Round
Understanding who built Holotron-12B adds important context. H Company was founded in 2023 in Paris by a team with deep roots in AI research. Laurent Sifre, one of the co-founders, was a principal scientist at DeepMind who contributed to AlphaGo, AlphaFold, and AlphaStar. Karl Tuyls was a research director at DeepMind focused on game theory and multi-agent systems. CEO Charles Kantor came from research at Stanford.
The company raised a record-breaking $220 million seed round in May 2024 — at the time, the largest AI seed round in European history. Investors included Eric Schmidt, Amazon, Accel, Samsung, Bernard Arnault, and Xavier Niel. The round signaled serious institutional confidence in H Company’s bet on “action-oriented” AI agents rather than chatbots or content generation.
The company has not been without drama. In August 2024, three co-founders (Daan Wierstra, Karl Tuyls, and Julien Perolat) departed over operational disagreements. Despite this, H Company shipped its first product — Runner H, an agentic AI platform for enterprise automation — and followed it with the Holo model family: Holo1, Holo1.5, and the Holo2 series (4B, 8B, and 30B-A3B variants).
Holotron-12B represents a strategic pivot in architecture. By partnering with NVIDIA and building on the Nemotron backbone instead of continuing with Qwen-derived models, H Company gains access to NVIDIA’s optimized inference stack and the SSM efficiency gains that are becoming increasingly important as agent workloads scale.
Why This Matters for the Open-Source Agent Ecosystem
The computer-use agent space has a structural problem: the best-performing models are locked behind APIs. If you want to build an autonomous agent that navigates websites, fills out forms, or operates desktop applications, your choices have largely been to pay per-token to Anthropic or OpenAI, with no ability to fine-tune, inspect, or deploy on your own infrastructure.
Holotron-12B changes that calculus in a few important ways.
Single-GPU deployment. Running on one H100 means the infrastructure barrier is relatively low for companies or research labs that already have access to modern GPUs. You do not need a multi-node cluster. Cloud H100 instances are widely available from major providers, and the cost of a single GPU hour is a fraction of what sustained API usage costs for high-volume agent workloads.
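To see the shape of that cost argument, here is a rough comparison. The dollar figures are illustrative assumptions, not quoted prices — substitute your own provider's rates:

```python
# Illustrative prices only -- not quoted rates from any provider.
h100_per_hour = 3.00       # assumed cloud H100 rate, $/hour
api_per_m_tokens = 3.00    # assumed API price, $ per million output tokens

tokens_per_hour = 8_900 * 3_600   # sustained 8.9k tok/s for one hour
api_cost = tokens_per_hour / 1e6 * api_per_m_tokens

print(f"tokens/hour: {tokens_per_hour / 1e6:.1f}M")
print(f"API cost for that volume: ${api_cost:.2f} vs ${h100_per_hour:.2f} self-hosted")
```

At sustained agent-workload volumes, the gap is an order of magnitude or more under these assumptions; the crossover point naturally shifts with utilization, since an idle GPU still bills by the hour while an API bills by the token.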
Fine-tuning and customization. Because the weights are open and available on HuggingFace, developers can fine-tune Holotron-12B on their specific UI environments, proprietary applications, or domain-specific workflows. A healthcare company could train it on their EHR interfaces. A fintech startup could optimize it for their trading platforms. This kind of specialization is impossible with closed-source APIs.
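What a fine-tuning example for such a model might look like: a screenshot, an instruction, and a grounded action. This schema is hypothetical — the model card will define the real format — but it captures the shape of UI-grounding data:

```python
import json

# Hypothetical supervised example for UI-action fine-tuning.
# The schema below is illustrative, not Holotron-12B's official format.
example = {
    "image": "screenshots/ehr_patient_form.png",   # path to a UI screenshot
    "instruction": "Open the patient's allergy list",
    "action": {"type": "click", "x": 412, "y": 238},
}

# One JSON object per line (JSONL) is the usual container for SFT datasets.
line = json.dumps(example)
print(line)
```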
Throughput for production. Most open-source VLMs were not designed with concurrent agent workloads in mind. Holotron-12B’s SSM-attention hybrid architecture was explicitly engineered for this — the 8.9k tokens/s at concurrency 100 is not a theoretical number but a measured benchmark on standard serving infrastructure.
The r/LocalLLaMA community, which has become one of the most influential forums for evaluating open-source models, has taken notice. The model’s combination of strong benchmark scores, practical single-GPU deployment, and the credibility of the NVIDIA partnership has generated significant discussion among developers exploring self-hosted agent solutions.
How Holotron-12B Stacks Up Against the Competition
| Model | Type | WebVoyager | Throughput | GPU Requirement | Open Source |
|---|---|---|---|---|---|
| Holotron-12B | VLM | 80.5% | 8.9k tok/s | 1x H100 | Yes |
| Holo2-8B | VLM | < 80.5% | ~5.1k tok/s | 1x GPU | Yes (Apache 2.0) |
| Claude Computer Use | API | ~56% (web) | N/A (API) | N/A | No |
| OpenAI CUA/Operator | API | ~87% (web) | N/A (API) | N/A | No |
The comparison is not perfectly apples-to-apples — Claude Computer Use handles full desktop environments better than web-only benchmarks suggest, and OpenAI’s numbers include their full orchestration stack. But for developers specifically building web-based autonomous agents who want control over their model weights and infrastructure, Holotron-12B is now the strongest open-source option available.
What Comes Next
H Company has signaled that Holotron-12B is the beginning of a new model line, not a one-off release. The NVIDIA partnership gives them access to continued Nemotron architecture improvements and optimized inference tooling. The shift from Qwen-derived architectures to the Nemotron SSM-attention hybrid suggests H Company is betting heavily on throughput efficiency as the key differentiator for production agent models.
The broader trend is clear: computer-use agents are moving from research demos to production deployments, and the models powering them need to handle the throughput, latency, and reliability demands of real-world use. Holotron-12B is the first open-source model that credibly addresses all three.
Frequently Asked Questions
Is Holotron-12B free to use?
Holotron-12B is open-source and available on HuggingFace. H Company’s previous Holo2 models were released under the Apache 2.0 license, which permits commercial use. Check the model card on HuggingFace for the specific license terms of Holotron-12B, as they may differ from the Holo2 series.
What hardware do I need to run Holotron-12B?
The model runs on a single NVIDIA H100 GPU using vLLM with SSM optimizations (v0.14.1 or later). This is significantly more accessible than many large VLMs that require multi-GPU setups. For lower-end hardware, quantized versions may become available from the community, though official benchmarks are based on full-precision H100 deployment.
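A back-of-envelope memory check explains why one H100 suffices, assuming bf16 weights at 2 bytes per parameter (KV cache, SSM state, and activations come on top of this):

```python
# Rough weight-memory estimate; runtime state adds to this figure.
params = 12e9            # 12B parameters
bytes_per_param = 2      # bfloat16
weight_gb = params * bytes_per_param / 1e9

h100_hbm_gb = 80         # H100 SXM/PCIe memory capacity
print(f"weights ≈ {weight_gb:.0f} GB of {h100_hbm_gb} GB HBM")
```

That leaves a comfortable margin for batched concurrent sessions, which is what the concurrency-100 benchmark exercises.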
How does Holotron-12B compare to Claude Computer Use and OpenAI Operator?
Holotron-12B scores 80.5% on WebVoyager, placing it between Anthropic’s Claude Computer Use (~56% on web tasks) and OpenAI’s CUA (~87%). The key advantage of Holotron-12B is that it is open-source — you can self-host, fine-tune, and deploy it without API dependencies. Claude and Operator offer more polished end-to-end agent experiences but require paid API access and do not allow weight-level customization.
Can I fine-tune Holotron-12B for my specific application?
Yes. Because the model weights are openly available, you can fine-tune Holotron-12B on your own datasets — custom UI environments, proprietary applications, or domain-specific web workflows. This is one of its main advantages over closed-source alternatives.
What is the difference between Holotron-12B and Holo2-8B?
They are built on entirely different architectures. Holo2-8B uses the Qwen3-VL backbone, while Holotron-12B is based on NVIDIA’s Nemotron-Nano-2 VL with a hybrid SSM-attention mechanism. Holotron-12B delivers roughly 1.75x the throughput (8.9k vs 5.1k tokens/s), scores higher on WebVoyager, and is specifically optimized for high-concurrency production agent workloads.