Most computer-use agents (the kind that click buttons, fill forms, and drive apps for you) lean on large cloud models. Holo3.1, a family of vision-language models that H Company released in June 2026, moves that capability onto hardware you already own, running a local agent in about 140ms on a 12GB GPU.
## What Holo3.1 does
Built on Qwen, Holo3.1 reads a screen and acts on it across web, desktop, and now mobile. It adds native function-calling so it slots straight into agent frameworks, and it is tuned for UI grounding — knowing exactly where to click. On the AndroidWorld benchmark the 35B-A3B model jumps from 67% to 79.3%, while the 4B and 9B variants climb from 58% to 72%, so even the small ones got meaningfully better at real mobile tasks.
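To make "native function-calling" concrete, here is a minimal sketch of how an agent framework might expose UI actions as tools and dispatch the model's tool calls. It assumes an OpenAI-compatible chat API that returns each tool call with JSON-encoded `arguments`. The `click` and `type_text` tools, the `Action` class, and the `parse_tool_call` and `execute` helpers are hypothetical illustrations, not Holo3.1's documented action space or the Holo Models API.

```python
import json
from dataclasses import dataclass

# Hypothetical tool schema in the JSON-Schema style used by OpenAI-compatible
# function-calling APIs. Tool names and fields are illustrative only and are
# NOT Holo3.1's actual action space.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "click",
            "description": "Click at pixel coordinates on the current screenshot.",
            "parameters": {
                "type": "object",
                "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
                "required": ["x", "y"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "type_text",
            "description": "Type text into the focused element.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    },
]


@dataclass
class Action:
    name: str
    args: dict


def parse_tool_call(tool_call: dict) -> Action:
    """Turn one model tool call into an Action, checking required arguments."""
    fn = tool_call["function"]
    name = fn["name"]
    # OpenAI-style APIs return arguments as a JSON-encoded string.
    args = json.loads(fn["arguments"])
    schema = next(
        (t["function"]["parameters"] for t in TOOLS if t["function"]["name"] == name),
        None,
    )
    if schema is None:
        raise ValueError(f"unknown tool: {name}")
    missing = [k for k in schema["required"] if k not in args]
    if missing:
        raise ValueError(f"{name} missing args: {missing}")
    return Action(name, args)


def execute(action: Action, log: list) -> None:
    """Stand-in executor: records the action instead of driving a real UI."""
    if action.name == "click":
        log.append(f"click({action.args['x']}, {action.args['y']})")
    elif action.name == "type_text":
        log.append(f"type({action.args['text']!r})")


# Usage: a tool call as it might come back from the model.
call = {"function": {"name": "click", "arguments": '{"x": 412, "y": 88}'}}
log = []
execute(parse_tool_call(call), log)
print(log)
```

In a real agent, `execute` would call an automation layer (a browser driver, desktop input library, or Android bridge); keeping the schema and dispatcher separate is what lets the same model plug into different frameworks.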
## Sizes and access
The family spans 0.8B, 4B, 9B, and a 35B-A3B mixture-of-experts model, with quantized GGUF and NVFP4 checkpoints for cheap local deployment. Weights are open on Hugging Face, and there is a hosted Holo Models API for teams that would rather not run their own. The pitch is speed and cost: capable computer-use agents without a frontier-model bill.
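A back-of-the-envelope calculation shows why quantized checkpoints matter for local deployment. This sketch estimates weight storage only, as parameters times bits per weight divided by 8. It assumes NVFP4 stores 4-bit values plus an 8-bit scale per 16-weight block (about 4.5 bits per weight), and uses GGUF's `Q8_0` type (8-bit values plus a 16-bit scale per 32-weight block, 8.5 bits) as one reference point; the article does not say which GGUF quantization types are published. KV cache, activations, and runtime overhead are ignored, and parameter counts are the nominal sizes from the model names.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8 bytes.
# Assumptions: weights only (no KV cache, activations, or runtime overhead);
# block-scale overhead is folded into an effective bits-per-weight figure.
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "q8_0": 8 + 16 / 32,   # GGUF Q8_0: 8-bit values + fp16 scale per 32 weights
    "nvfp4": 4 + 8 / 16,   # NVFP4: 4-bit values + 8-bit scale per 16 weights
}


def weight_gb(params_billion: float, fmt: str) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9) * bits / 8 bytes, then / 1e9 for GB.
    return params_billion * BITS_PER_WEIGHT[fmt] / 8


for size in (0.8, 4, 9, 35):
    row = "  ".join(f"{fmt} {weight_gb(size, fmt):6.2f} GB" for fmt in BITS_PER_WEIGHT)
    print(f"{size:>4}B  {row}")
```

For the mixture-of-experts model, the A3B suffix follows Qwen's naming for roughly 3B parameters active per token. That cuts compute per step, but in a typical setup all 35B parameters still have to be resident in GPU or CPU memory, so the full count drives the estimate above.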
