Top AI Product



Alibaba Page-Agent (in-page JavaScript GUI agent) skips screenshots — it reads the DOM instead

Most web agents being built right now point a multimodal model at screenshots: browser-use, UI-TARS, OpenAI’s computer use. Alibaba’s new open-source Page-Agent went the opposite way: no screenshots, no vision model, no browser extension, no headless browser. It’s plain JavaScript that lives inside your page, reads the live DOM as text, and clicks and types as the actual logged-in user. It picked up 700+ GitHub stars in a single day.

What it actually is

The trick is called DOM dehydration. A real page has thousands of nodes, and dumping raw HTML into an LLM is slow and expensive. Instead Page-Agent scans the DOM, pulls out every interactive element — buttons, links, inputs — tags each with an index, role, and label, and flattens it into a clean text map. A small, cheap text model can then act on it precisely. No pixels involved.
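To make the idea concrete, here is a minimal sketch of that flattening step. It is not Page-Agent's actual code: the NodeLike shape, the tag-to-role table, and the dehydrate and renderMap names are all hypothetical, and the tree is a plain object rather than a real DOM so the example runs anywhere.

```typescript
// Hypothetical stand-in for a DOM node, so the sketch runs without a browser.
interface NodeLike {
  tag: string;
  attrs?: Record<string, string>;
  text?: string;
  children?: NodeLike[];
}

interface MapEntry {
  index: number;
  role: string;
  label: string;
}

// Assumed mapping from tag name to a coarse role; a real tool would cover far more cases.
const INTERACTIVE_ROLES = new Map<string, string>([
  ["button", "button"],
  ["a", "link"],
  ["input", "textbox"],
  ["textarea", "textbox"],
  ["select", "combobox"],
]);

// Gather the visible text of a node and its descendants, collapsing whitespace.
function collectText(node: NodeLike): string {
  const own = node.text ?? "";
  const nested = (node.children ?? []).map(collectText).join(" ");
  return `${own} ${nested}`.replace(/\s+/g, " ").trim();
}

// Walk the tree and keep only interactive elements, numbered in document order.
function dehydrate(root: NodeLike): MapEntry[] {
  const entries: MapEntry[] = [];
  const visit = (node: NodeLike): void => {
    const role = INTERACTIVE_ROLES.get(node.tag);
    if (role !== undefined) {
      const label =
        node.attrs?.["aria-label"] ?? node.attrs?.["placeholder"] ?? collectText(node);
      entries.push({ index: entries.length, role, label });
      return; // treat an interactive element as a leaf
    }
    (node.children ?? []).forEach(visit);
  };
  visit(root);
  return entries;
}

// Render the entries as the compact text map a text-only model would read.
function renderMap(entries: MapEntry[]): string {
  return entries.map((e) => `[${e.index}] ${e.role} "${e.label}"`).join("\n");
}

const page: NodeLike = {
  tag: "form",
  children: [
    { tag: "h2", text: "New invoice" },
    { tag: "input", attrs: { placeholder: "Customer name" } },
    { tag: "div", children: [{ tag: "button", children: [{ tag: "span", text: "Save" }] }] },
    { tag: "a", attrs: { href: "/invoices" }, text: "Back to list" },
  ],
};

const pageMap = renderMap(dehydrate(page));
console.log(pageMap);
```

The heading and the wrapper div never reach the model; only three indexed lines do, and the model can answer with an index instead of a pixel coordinate.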

Why the SDK matters

It’s an MIT-licensed, TypeScript-first library. Point model and baseURL at any OpenAI-compatible backend (DashScope/Qwen, GPT, Claude, or a local Ollama server) and a few lines of code drop an AI copilot into your SaaS. The pitch: turn a 20-click ERP/CRM/admin workflow into one sentence, with no backend rewrite. That’s the part enterprise teams will care about.
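What "OpenAI-compatible" buys you is that switching backends only means changing two strings. The sketch below is an illustration of that convention, not Page-Agent's own API: BackendConfig, buildChatRequest, and the prompt wording are hypothetical, and the model name is a placeholder. It only builds the request, assuming the usual /chat/completions path under the base URL and Ollama's default local address.

```typescript
// Hypothetical config shape mirroring the model/baseURL pair described above.
interface BackendConfig {
  model: string;
  baseURL: string;
  apiKey?: string; // a local server typically needs none
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; messages: ChatMessage[] };
}

// Build (but do not send) a chat-completions request for any OpenAI-compatible server.
function buildChatRequest(
  config: BackendConfig,
  pageMap: string,
  instruction: string,
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (config.apiKey) {
    headers["Authorization"] = `Bearer ${config.apiKey}`;
  }
  return {
    // Strip trailing slashes so the joined path has exactly one separator.
    url: `${config.baseURL.replace(/\/+$/, "")}/chat/completions`,
    headers,
    body: {
      model: config.model,
      messages: [
        // Assumed prompt wording, for illustration only.
        { role: "system", content: `Interactive elements on the page:\n${pageMap}` },
        { role: "user", content: instruction },
      ],
    },
  };
}

// Example: a local Ollama server on its default port; "qwen2.5" is a placeholder model name.
const localOllama: BackendConfig = {
  model: "qwen2.5",
  baseURL: "http://localhost:11434/v1/",
};

const request = buildChatRequest(localOllama, '[0] button "Save"', "Save the invoice");
console.log(request.url);
```

Sending it is then a single fetch(request.url, { method: "POST", headers: request.headers, body: JSON.stringify(request.body) }) call, and moving to a hosted model means editing only the config object.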

