So Microsoft quietly dropped something pretty interesting — a 7-billion-parameter model called [Fara-7B](https://github.com/microsoft/fara) that can browse the web and click around like a human. Not generate text about browsing. Actually browse. Click buttons, scroll pages, type into fields, the whole deal.
What makes Fara-7B stand out is how it works. Most “computer use” setups rely on accessibility trees or extra helper models to figure out what’s on screen. Fara skips all of that. It just looks at the webpage visually — the way you and I do — and predicts exact pixel coordinates for where to click. That’s a surprisingly elegant approach, and it means the model doesn’t need a complicated pipeline of tools bolted on around it.
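The loop is conceptually simple: screenshot in, action out. To make that concrete, here’s a minimal sketch of what consuming coordinate-based actions could look like. Note that the `click(x, y)` / `type_text(...)` action strings below are my illustration of the idea, not Fara-7B’s documented output schema — check the model card for the real format.

```python
import re

def parse_action(model_output: str) -> dict:
    """Turn a hypothetical action string like 'click(432, 217)' into a
    structured action a browser-automation layer could execute.

    This format is an assumption for illustration; Fara-7B's actual
    output schema may differ.
    """
    m = re.match(r"(\w+)\(([^)]*)\)", model_output.strip())
    if not m:
        return {"type": "unknown", "raw": model_output}
    name, args = m.group(1), m.group(2)
    if name == "click":
        x, y = (int(v) for v in args.split(","))
        return {"type": "click", "x": x, "y": y}
    if name == "type_text":
        return {"type": "type_text", "text": args.strip().strip("'\"")}
    return {"type": name, "raw": args}
```

A driver would then map each parsed action onto real browser events (e.g. via Playwright), which is where the “no accessibility tree needed” part pays off: the model only ever sees pixels and emits coordinates.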
The performance numbers are hard to ignore too. On the [WebVoyager benchmark](https://www.microsoft.com/en-us/research/blog/fara-7b-an-efficient-agentic-model-for-computer-use/), Fara-7B hits a 73.5% success rate. For context, GPT-4o with Set-of-Marks prompting lands at 65.1%. A 7B model beating a frontier model on a real-world web task benchmark — that got people’s attention fast. It’s also far more efficient, completing tasks in about 16 steps on average compared to 41 for similarly sized models like UI-TARS-1.5-7B.
The model is built on top of Qwen2.5-VL-7B and trained on 145K synthetic trajectories generated through Microsoft’s [Magentic-One](https://github.com/microsoft/magentic-ui) multi-agent framework. The training data covers a wide range of websites and task types, which probably explains why it generalizes pretty well across different web environments.
What really gets me excited is the local deployment angle. Because it’s only 7B parameters, you can actually run this thing on a decent GPU without sending anything to the cloud. Privacy-conscious folks will appreciate that — your browsing data stays on your machine. You can self-host it with vLLM if you’ve got around 24GB of VRAM, or try it through [Azure AI Foundry](https://labs.ai.azure.com/projects/fara-7b/) and [Hugging Face](https://huggingface.co/microsoft/Fara-7B).
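If you want to try the self-hosting route, the usual vLLM workflow applies: serve the model, then hit its OpenAI-compatible endpoint. The model ID comes from the Hugging Face link above; the flags here are a plausible starting point rather than official guidance, so check the repo’s instructions before relying on them.

```shell
# Serve Fara-7B locally with vLLM's OpenAI-compatible server
# (flags are a starting point, not official guidance)
vllm serve microsoft/Fara-7B --max-model-len 32768

# In another terminal, query the endpoint like any OpenAI-style API:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "microsoft/Fara-7B",
       "messages": [{"role": "user", "content": "..."}]}'
```

Nothing leaves your machine: the server binds to localhost by default, which is exactly the privacy win described above.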
The community response has been massive. It’s been trending on GitHub with over 3.6k stars, [IT Pro ran a piece](https://www.itpro.com/technology/artificial-intelligence/microsoft-fara-7b-agentic-small-language-model) calling it “more powerful than GPT-4o,” and it’s been popping up across trendshift.io and Microsoft Research channels. The MIT license doesn’t hurt either.
Fair warning though — this is still an experimental release. Microsoft recommends running it in a sandboxed environment and keeping an eye on what it does. It’s not something you’d want to let loose on your banking website unsupervised just yet. But as a proof of concept for where local AI agents are heading, Fara-7B is genuinely impressive.