83K GitHub Stars and $17M in Funding: How browser-use Became the Default Framework for AI Browser Agents

Two ETH Zurich grad students built a demo in five weeks. Twelve months later, browser-use sits at 83,500 GitHub stars, has taken $17 million in seed funding led by Felicis, and counts over 20 Y Combinator W25 startups as users. In a space crowded with browser automation tools, this open-source Python framework has pulled away from the pack by doing one thing well: giving any LLM the ability to control a real web browser.

The Problem That Made browser-use Explode

AI agents are only as useful as the actions they can take. And in 2025, the biggest gap in agent infrastructure was the web browser. LLMs could generate text, write code, and call APIs — but they couldn’t fill out a form, click through a multi-step checkout, or navigate a legacy portal that never got an API.

Traditional browser automation tools like Selenium and Playwright solve this for developers who write explicit scripts. But AI agents don’t work that way. They need to perceive a page, decide what to do, and act — without hardcoded selectors or pre-written flows.

That’s the gap Magnus Muller and Gregor Zunic spotted. Both were pursuing master’s degrees in data science at ETH Zurich when they started building browser-use through the university’s Student Project House accelerator. The initial version connected an LLM to Playwright’s browser engine and let the model figure out the navigation steps on its own. The GitHub repo went from zero to 25,000 stars in three months.

How browser-use Actually Works

At its core, browser-use runs a perceive-act loop. Each cycle has two phases:

Perception: The framework takes the current page’s DOM, strips out scripts and styles, and tags every interactive element with a numeric identifier. For vision-capable models, it can also capture screenshots. This “DOM distillation” step is critical — it reduces the token count sent to the LLM by removing everything that isn’t actionable.
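To make the distillation idea concrete, here is a minimal sketch using only Python's standard library. It is not browser-use's implementation: the tag list, the label heuristic, and the output format are all assumptions chosen for illustration. It drops script and style content and numbers each interactive element in document order.

```python
from html.parser import HTMLParser

# Assumption: this tag set is illustrative, not the framework's actual list.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
SKIPPED_TAGS = {"script", "style"}


class Distiller(HTMLParser):
    """Collects interactive elements and numbers them in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []      # one dict per interactive element
        self._skip_depth = 0    # > 0 while inside <script> or <style>
        self._open = None       # element currently collecting inner text

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in INTERACTIVE_TAGS:
            return
        attrs = dict(attrs)
        element = {
            "index": len(self.elements),
            "tag": tag,
            "label": attrs.get("placeholder") or attrs.get("name") or "",
        }
        self.elements.append(element)
        # <input> is a void element, so it never collects inner text.
        self._open = None if tag == "input" else element

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif self._open is not None and tag == self._open["tag"]:
            self._open = None

    def handle_data(self, data):
        if self._skip_depth or self._open is None:
            return
        text = data.strip()
        if text:
            self._open["label"] = (self._open["label"] + " " + text).strip()


def distill(html):
    """Return one line per interactive element, e.g. '[0] <a> Home'."""
    parser = Distiller()
    parser.feed(html)
    parser.close()
    return "\n".join(
        f"[{e['index']}] <{e['tag']}> {e['label']}".rstrip()
        for e in parser.elements
    )
```

A page with a link, a search box, and a button collapses to three short lines, which is the kind of compact, indexed listing a model can refer back to by number.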

Action: The LLM receives the simplified DOM (and optionally the screenshot) and returns structured commands — click element 7, type “quarterly revenue” into element 12, scroll down, open a new tab. browser-use executes these through Playwright and the loop repeats until the task is done.
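The two phases can be sketched as a short control loop. This is a simplified illustration, not the framework's code: `Action`, `run_agent`, and the three callables are hypothetical names, and the model and browser are replaced by scripted stand-ins so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str            # e.g. "click", "type", "done"
    index: int = -1      # numeric element identifier from the perception step
    text: str = ""       # payload for "type"


def run_agent(task, perceive, decide, execute, max_steps=10):
    """Alternate perception and action until the model reports 'done'.

    perceive() returns the simplified page state as text.
    decide(task, state, history) stands in for the LLM call and returns an Action.
    execute(action) stands in for the browser driver.
    """
    history = []
    for _ in range(max_steps):
        state = perceive()
        action = decide(task, state, history)
        history.append(action)
        if action.name == "done":
            return history
        execute(action)
    raise RuntimeError(f"task not finished after {max_steps} steps")


# Usage with scripted stand-ins instead of a real model and browser.
script = iter([
    Action("click", 7),
    Action("type", 12, "quarterly revenue"),
    Action("done"),
])
executed = []
history = run_agent(
    task="find the revenue report",
    perceive=lambda: "[7] <a> Reports\n[12] <input> Search",
    decide=lambda task, state, history: next(script),
    execute=executed.append,
)
print([a.name for a in executed])  # ['click', 'type']
```

The step cap matters in practice: without it, a model that never emits a terminal action would keep the loop, and the API bill, running indefinitely.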

This architecture is model-agnostic. You can plug in GPT-4o, Claude, Gemini, DeepSeek, or a locally hosted model through LiteLLM. The framework doesn’t care which LLM is driving; it only needs structured action output.

Key technical features include:

Multi-tab management: Agents can open, switch between, and close multiple browser tabs within a single task
Session persistence: Save and load browser cookies so agents don’t need to re-authenticate on every run
Custom action injection: Define your own actions beyond the built-in click/type/scroll primitives
Self-correction: When an action fails, the agent can re-observe the page and try an alternative approach
LangChain integration: For teams already building with LangChain, browser-use slots in as a tool in existing agent workflows
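Custom action injection is easiest to picture as a name-to-function registry that the agent loop consults when the model asks for an action. The sketch below is an assumption about the general pattern, not browser-use's actual API: `ActionRegistry`, `scroll`, and `save_csv` are hypothetical names.

```python
class ActionRegistry:
    """Maps action names to Python callables (hypothetical, not the library's API)."""

    def __init__(self):
        self._actions = {}

    def action(self, description):
        """Decorator that registers a function under its own name."""
        def register(func):
            self._actions[func.__name__] = (description, func)
            return func
        return register

    def describe(self):
        """Text listing of available actions, suitable for a model prompt."""
        return "\n".join(
            f"{name}: {desc}"
            for name, (desc, _) in sorted(self._actions.items())
        )

    def dispatch(self, name, **params):
        """Run the named action; unknown names raise KeyError."""
        _, func = self._actions[name]
        return func(**params)


registry = ActionRegistry()


@registry.action("Scroll the page down by a number of pixels")
def scroll(pixels: int = 600):
    return f"scrolled {pixels}px"


@registry.action("Save extracted rows to a CSV file path")
def save_csv(path: str, rows: list):
    # Custom action: a real version would write the file; this only reports.
    return f"would write {len(rows)} rows to {path}"
```

The descriptions serve double duty: they document the action for developers and tell the model what each action is for when the listing is included in the prompt.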

The benchmark results are strong: browser-use scores 89.1% on WebVoyager, a suite of 586 diverse web tasks, which is the highest reported success rate for an open-source browser agent framework at the time of writing.

Who’s Using It and For What

The use cases break into a few clear buckets:

Data extraction from sites without APIs. Many businesses still run on portals, dashboards, and legacy systems that were never designed for programmatic access. browser-use lets an agent log in, navigate to the right page, and pull structured data — without anyone writing CSS selectors.

Competitor monitoring and market research. Pricing pages, product catalogs, job listings — anything that changes frequently and lives behind a web interface. Teams use browser-use to build agents that check these sources on a schedule and flag changes.
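The "flag changes" half of such a monitor does not need an LLM at all. Once the agent has extracted a snapshot, a plain comparison against the previous run is enough. The following is a minimal sketch under that assumption; `diff_snapshots` and the sample pricing data are hypothetical.

```python
def diff_snapshots(previous, current):
    """Compare two {item: value} snapshots and describe what changed."""
    changes = []
    for key in sorted(previous.keys() | current.keys()):
        if key not in previous:
            changes.append(f"added {key}: {current[key]}")
        elif key not in current:
            changes.append(f"removed {key} (was {previous[key]})")
        elif previous[key] != current[key]:
            changes.append(f"changed {key}: {previous[key]} -> {current[key]}")
    return changes


# Made-up pricing data standing in for two scheduled agent runs.
last_week = {"Pro": "$49", "Team": "$99"}
today = {"Pro": "$59", "Enterprise": "custom"}
for line in diff_snapshots(last_week, today):
    print(line)
```

Keeping the comparison deterministic means the model is only paid for the part that needs judgment, which is reading the page.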

Form filling and workflow automation. Submitting expense reports, updating CRM records, filing government forms. The kind of repetitive click-through work that eats hours every week.

QA and testing. Instead of maintaining brittle test scripts tied to specific element IDs, some teams are experimenting with LLM-driven testing where the agent navigates based on intent (“add item to cart and check out”) rather than hardcoded paths.

The Y Combinator connection accelerated adoption significantly. More than 20 companies in YC’s Winter 2025 batch used browser-use for their own products. When that many startups from a single influential accelerator build on the same tool, the network effects are hard to ignore.

browser-use vs. the Competition

The AI browser agent space has gotten crowded. Here’s how browser-use stacks up against the main alternatives:

browser-use vs. Stagehand (Browserbase)

Stagehand takes a different philosophy. Where browser-use gives the LLM full autonomy to figure out navigation steps, Stagehand provides a TypeScript SDK with three core primitives — act(), extract(), and observe() — that give developers fine-grained control over agent behavior. Stagehand is tighter and more predictable; browser-use is more flexible and model-agnostic. If you want your agent to handle unexpected UI variations, browser-use is the better fit. If you want deterministic, reproducible automation, Stagehand has the edge.

browser-use vs. Steel Browser

Steel is more infrastructure than framework. It provides managed browser sandboxes — anti-detect, proxy rotation, session management — that you can run any automation framework on top of. browser-use and Steel are actually complementary: you can run browser-use agents inside Steel’s cloud browsers for production-grade reliability.

browser-use vs. Playwright/Selenium (traditional)

This isn’t really an apples-to-apples comparison. Playwright and Selenium require you to write explicit automation scripts. browser-use wraps Playwright and adds the LLM reasoning layer on top. You’d use traditional tools when you need deterministic, high-frequency automation at scale. You’d use browser-use when the task requires judgment, adaptation, or working with unfamiliar interfaces.

browser-use vs. Lightpanda

Lightpanda is a headless browser built from scratch in Zig, designed specifically for AI agent workloads. It’s faster than Chromium-based solutions but has a smaller ecosystem. browser-use runs on standard Chromium via Playwright, which means broader compatibility but heavier resource usage.

The Trade-offs You Should Know About

browser-use isn’t a silver bullet. Production deployments surface real limitations:

Token costs add up fast. Every perceive-act cycle is an LLM API call, and screenshot-based perception adds image tokens on top of the DOM text. For high-frequency tasks, this gets expensive compared to traditional scripted automation, which makes no model calls at all.
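A back-of-the-envelope estimate shows how the per-step cost multiplies. Every number in this sketch is a placeholder assumption (step count, token counts, and prices), not a measurement or a real provider's price list; substitute your own figures.

```python
def estimate_task_cost(steps, input_tokens_per_step, output_tokens_per_step,
                       usd_per_million_input, usd_per_million_output):
    """Cost of one task when every perceive-act cycle is one LLM call."""
    input_cost = steps * input_tokens_per_step * usd_per_million_input / 1_000_000
    output_cost = steps * output_tokens_per_step * usd_per_million_output / 1_000_000
    return input_cost + output_cost


# All figures below are placeholder assumptions, not measured values or real prices.
per_task = estimate_task_cost(
    steps=20,
    input_tokens_per_step=5_000,
    output_tokens_per_step=200,
    usd_per_million_input=2.0,
    usd_per_million_output=8.0,
)
per_day = per_task * 1_000  # assumed volume: 1,000 task runs per day
print(f"${per_task:.3f} per task, ${per_day:.2f} per day")
# $0.232 per task, $232.00 per day
```

Input tokens dominate in this example because the page state is resent on every step, which is why trimming the DOM before it reaches the model pays off so directly.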

Non-deterministic by nature. Because an LLM is making navigation decisions, the same task can take different paths on different runs. This is a feature for exploratory tasks but a bug for anything requiring consistent, reproducible output.

Bot detection remains a challenge. browser-use doesn’t have built-in bypasses for sophisticated anti-bot systems like Cloudflare Turnstile. The cloud version offers anti-detect features and proxies, but the open-source library alone won’t get you past aggressive WAFs.

Hallucination loops. When faced with CAPTCHAs or unfamiliar UI patterns, the LLM can sometimes repeat ineffective actions instead of recognizing it’s stuck. The self-correction mechanism helps, but it’s not perfect.
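One common guard against this failure mode is to watch for the same action repeating on an unchanged page and stop or escalate when it does. The sketch below illustrates that general idea; it is not a description of browser-use's self-correction mechanism, and `StuckDetector` and the fingerprint strings are hypothetical.

```python
from collections import deque


class StuckDetector:
    """Flags when the same action repeats on an unchanged page."""

    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        self._recent = deque(maxlen=max_repeats)

    def record(self, action, page_fingerprint):
        """Record one step; return True once the last max_repeats steps are identical."""
        self._recent.append((action, page_fingerprint))
        return (
            len(self._recent) == self.max_repeats
            and len(set(self._recent)) == 1
        )


detector = StuckDetector(max_repeats=3)
steps = [
    ("click 4", "page-a"),
    ("click 9", "page-b"),
    ("click 9", "page-b"),
    ("click 9", "page-b"),
]
flags = [detector.record(action, page) for action, page in steps]
print(flags)  # [False, False, False, True]
```

In a real agent the fingerprint could be a hash of the distilled page state, and a positive flag could trigger a different prompt, a human handoff, or a clean abort.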

Open Source vs. Cloud: The Business Model

browser-use itself is MIT-licensed and free. You can self-host everything, bring your own LLM API key, and pay nothing beyond compute and API costs.

Browser Use Cloud is the commercial offering. It adds managed cloud browsers, anti-detect capabilities, CAPTCHA solving, residential proxies across 195+ countries, and a no-code interface called ChatBrowserUse. The company positions the cloud product as the production-ready layer for teams that don’t want to manage browser infrastructure themselves.

This open-core model — free framework, paid infrastructure — is the same playbook that worked for companies like Grafana and GitLab. The $17 million seed round, led by Felicis with participation from Paul Graham, A Capital, Nexus Venture Partners, SV Angel, and Pioneer Fund, suggests investors are betting on this approach.

What Comes Next

The trajectory is clear: as AI agents become more capable, they need more ways to interact with the digital world. The web browser is the most universal interface humans use, and browser-use has positioned itself as the bridge between LLMs and that interface.

With 83,500 stars and nearly 10,000 forks, browser-use has hit the kind of critical mass where the community itself becomes a moat. Contributors keep improving the framework, tutorials proliferate, and new LLM providers add compatibility. The question isn’t whether AI browser agents will become standard infrastructure — it’s whether browser-use can maintain its lead as bigger players enter the space.

For now, if you’re building anything that needs an AI agent to interact with the web, browser-use is the default starting point. Not because it’s perfect, but because it’s open, flexible, battle-tested at scale, and backed by a community that’s growing faster than almost anything else on GitHub.

FAQ

Is browser-use free to use?

Yes. The core framework is MIT-licensed and completely free. You only pay for LLM API calls (to OpenAI, Anthropic, Google, etc.) and any infrastructure you use. Browser Use Cloud is a separate paid product that adds managed browsers, anti-detect features, and proxies.

What LLMs does browser-use support?

browser-use is model-agnostic. It works with GPT-4o, Claude, Gemini, DeepSeek, and any model supported by LiteLLM, including locally hosted open-source models via Ollama. You’re not locked into any single provider.

How does browser-use compare to Selenium or Playwright?

Selenium and Playwright are traditional browser automation tools that require explicit scripts with hardcoded selectors. browser-use adds an LLM reasoning layer on top of Playwright, letting agents navigate based on natural language instructions rather than pre-written code. Use traditional tools for deterministic, high-volume tasks; use browser-use when tasks require judgment or adaptation.

Can browser-use handle login-protected websites?

Yes. Agents can type credentials into form fields as part of their task flow. For recurring tasks, browser-use supports saving and loading browser session cookies so the agent doesn’t need to log in every time.
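Cookie persistence itself is simple to sketch. The example below assumes Playwright-style cookie dictionaries, where `expires` is a Unix timestamp in seconds and -1 marks a session cookie; the helper names and file path are hypothetical, and this is not the framework's own save/load code.

```python
import json
import time
from pathlib import Path


def save_cookies(path, cookies):
    """Write a list of cookie dicts to disk as JSON."""
    Path(path).write_text(json.dumps(cookies, indent=2))


def load_cookies(path, now=None):
    """Read cookies back, dropping any whose 'expires' timestamp has passed.

    Assumes Playwright-style cookie dicts where 'expires' is a Unix timestamp
    in seconds and -1 marks a session cookie. Session cookies are kept here.
    """
    file = Path(path)
    if not file.exists():
        return []
    now = time.time() if now is None else now
    cookies = json.loads(file.read_text())
    return [
        c for c in cookies
        if c.get("expires", -1) == -1 or c["expires"] > now
    ]
```

Filtering expired entries on load avoids handing the browser a stale session, in which case the agent would need to fall back to typing credentials anyway. Treat the cookie file like a password: it grants the same access as the login it replaces.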

Who built browser-use?

browser-use was created by Magnus Muller and Gregor Zunic, who started the project while pursuing master’s degrees in data science at ETH Zurich. The company went through Y Combinator’s Winter 2025 batch and raised a $17 million seed round led by Felicis, with participation from Paul Graham and other notable investors.
