The Reflex team just published numbers that should make every computer-use believer uncomfortable. Author Palash Awasthi gave Claude Sonnet the same back-office task (find the customer named Smith with the most orders, then process his latest pending shipment) through two paths and timed both.
The 45x Gap, In One Run
Browser visual agent: 53 steps, 14 to 22 minutes, around 550K tokens. HTTP API agent built from auto-generated endpoints: 8 calls, under 20 seconds, 12K tokens. Same task, same model, two routes, and the visual route burns roughly 45x more tokens. Claude Haiku doesn’t finish the visual run at all. It crashes outright.
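For readers who want to check the headline figure, here is a quick sketch of the arithmetic using only the numbers quoted above. The dictionary keys and variable names are ours, not Reflex's. Because the post gives "around 550K" tokens and "under 20 seconds", the token ratio is approximate and the time ratios are lower bounds.

```python
# Figures as quoted in the post; key names are illustrative, not from Reflex.
visual = {"steps": 53, "tokens": 550_000, "sec_low": 14 * 60, "sec_high": 22 * 60}
api = {"steps": 8, "tokens": 12_000, "sec_max": 20}  # "under 20 seconds"

# Token cost is where the "45x" headline comes from.
token_ratio = visual["tokens"] / api["tokens"]

# Steps taken by the visual agent versus HTTP calls made by the API agent.
step_ratio = visual["steps"] / api["steps"]

# The API run took *under* 20 s, so these are lower bounds on the slowdown.
time_ratio_low = visual["sec_low"] / api["sec_max"]
time_ratio_high = visual["sec_high"] / api["sec_max"]

print(f"tokens: ~{token_ratio:.1f}x")
print(f"steps:  ~{step_ratio:.1f}x")
print(f"time:   at least {time_ratio_low:.0f}x to {time_ratio_high:.0f}x")
```

On these inputs the token ratio works out to about 45.8, which matches the 45x headline. The latency gap is at least as wide (42x or more), while the step count differs by under 7x, so each visual step is far more expensive than each API call.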
Why Hacker News Cared
This isn’t a product. It’s an open-source benchmark and a blog post. It still landed 347 upvotes and 202 comments, because it puts hard numbers on what skeptics have muttered for months: computer use looks magical in demos and falls apart on cost, latency, and reliability when you actually run it. Anthropic, OpenAI, and a wave of YC startups are still selling the visual-agent dream. Reflex just made the cleanest case yet that for any task with an API behind it, the structured route wins.