There’s been a flood of browser automation tools lately, but most of them feel like they were built for demos, not real work. They spin up headless browsers, require you to hand over your credentials, and break the moment a page layout changes. [PageAgent](https://alibaba.github.io/page-agent/) takes a fundamentally different approach, and honestly, it’s the first one that actually made me rethink how I interact with complex web apps.
The core idea is simple: PageAgent is a JavaScript agent that lives *inside* the web page itself. No Python scripts, no headless Chrome instances, no mandatory browser extension. You drop it into a page (an optional browser extension is available too), and it can understand and interact with the DOM directly. Because it runs in your actual browser session, it uses whatever login state you already have. No sharing passwords, no cookie juggling, no OAuth dance with a third-party service. That alone puts it ahead of most alternatives I’ve tried.
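To make "understands the DOM directly" a bit more concrete, here is a minimal sketch of the kind of step an in-page agent could take before asking a model what to do: collect the interactive elements on the page and give each one an index and a readable label. This is my own illustration, not PageAgent's code. The names `summarizeInteractiveElements`, `QueryRoot`, and `ElementSummary`, as well as the selector list, are assumptions made up for the example.

```typescript
// Hypothetical sketch: NOT PageAgent's actual API or implementation.
// Minimal structural types, so the function accepts the real `document`
// in a browser and a plain stub object anywhere else.
interface ElementLike {
  tagName: string;
  textContent: string | null;
  getAttribute(name: string): string | null;
}

interface QueryRoot {
  querySelectorAll(selectors: string): ArrayLike<ElementLike>;
}

interface ElementSummary {
  index: number; // position the model can refer back to
  tag: string;   // lowercased tag name, e.g. "button"
  label: string; // best available human-readable label
}

// Assumed set of "things a user can act on"; a real agent would likely need more.
const INTERACTIVE_SELECTOR = "a[href], button, input, select, textarea";

function summarizeInteractiveElements(root: QueryRoot): ElementSummary[] {
  return Array.from(root.querySelectorAll(INTERACTIVE_SELECTOR)).map(
    (el, index) => ({
      index,
      tag: el.tagName.toLowerCase(),
      // Prefer an explicit accessibility label, then a placeholder, then visible text.
      label: (
        el.getAttribute("aria-label") ??
        el.getAttribute("placeholder") ??
        el.textContent ??
        ""
      ).trim(),
    })
  );
}
```

In a browser you would call `summarizeInteractiveElements(document)` and pass the resulting list to the model as context. Because the script runs in the page, it sees exactly what your logged-in session sees, which is the point of the in-page design.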
What surprised me most is the LLM flexibility. You can hook it up to OpenAI, Claude, DeepSeek, Qwen, Gemini, or even run it fully offline through Ollama. There’s no backend server involved: page content goes straight from your browser to whichever model you configure, and with a local model it never leaves your machine. For anyone working with sensitive internal tools or enterprise dashboards, that’s a huge deal.
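The "no backend, bring your own model" idea can be sketched as a small function that turns a provider config into a request the page sends directly. This assumes the provider exposes an OpenAI-compatible chat completions endpoint, which Ollama does locally under `/v1`; not every provider does. The post's source does not say how PageAgent itself talks to models, so `ModelConfig`, `buildChatRequest`, and the model name `llama3` are hypothetical placeholders.

```typescript
// Hypothetical configuration shape for illustration; NOT PageAgent's API.
interface ModelConfig {
  baseURL: string; // root of an OpenAI-compatible API
  model: string;   // model identifier understood by that endpoint
  apiKey?: string; // omitted for a local server that needs no key
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(config: ModelConfig, userMessage: string): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Only attach credentials when the provider actually needs them.
  if (config.apiKey) headers["Authorization"] = `Bearer ${config.apiKey}`;
  return {
    // Strip trailing slashes so both ".../v1" and ".../v1/" work.
    url: `${config.baseURL.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model: config.model,
        messages: [{ role: "user", content: userMessage }],
      }),
    },
  };
}

// Ollama's OpenAI-compatible endpoint on its default local port.
// "llama3" is a placeholder for whatever model you have pulled.
const local: ModelConfig = { baseURL: "http://localhost:11434/v1", model: "llama3" };
const request = buildChatRequest(local, "Archive all invoices older than 90 days");
// In a browser: fetch(request.url, request.init)
```

Switching providers then means changing one config object, and nothing sits between the page and the model endpoint.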
The built-in thinking panel is a nice touch too. You can watch the agent reason through each step, and if it’s about to do something dumb, you just stop it and correct course. It feels collaborative rather than “fire and forget,” which builds a lot more trust than the typical black-box agent experience.
PageAgent recently popped up as a [Show HN post](https://news.ycombinator.com/item?id=47264138) and scored 70 points on [bestofshowhn.com](https://bestofshowhn.com) for March 2026, sparking some solid discussion about the security and practicality of in-browser agents. The [GitHub repo](https://github.com/alibaba/page-agent) is MIT-licensed, which means you can actually fork it and adapt it for your own stack without worrying about licensing headaches.
If you’ve been looking for a way to turn those painful 20-click admin workflows into a single natural language command, PageAgent is worth a serious look. It’s not trying to replace your browser — it’s trying to make the one you already use a lot smarter.
