LearnWeak is a training framework for a frustrating reality: small, open computer-use agents — the ones that click and type through apps — stay noticeably weaker than big closed models, and just throwing more synthetic training data at a domain barely moves the needle. Its fix is to stop training broadly and start training on the agent’s actual weak spots.
## A stronger teacher that targets gaps
The framework is annotation-free. A stronger reference agent watches the smaller “student” attempt tasks in a target domain, identifies where it fails, then synthesizes targeted tasks and constructs the supervision automatically, with no human labelling. The clever part is an error-aware specialization objective that separates planning errors from execution errors, so each update targets the behavior that actually failed. Fixing “the agent chose the wrong step” and fixing “the agent clicked the wrong button” are different problems, and treating them uniformly wastes the signal.
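To make the planning/execution split concrete, here is a minimal Python sketch. It is not LearnWeak's actual objective or code: the names `Step`, `ErrorType`, `diagnose`, and `build_supervision` are hypothetical, and it assumes the reference agent supplies a step-aligned trajectory to compare against, which is a simplification made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class ErrorType(Enum):
    PLANNING = "planning"    # the student chose the wrong step
    EXECUTION = "execution"  # right step, wrong concrete action (e.g. wrong button)


@dataclass(frozen=True)
class Step:
    intent: str  # what the agent is trying to do, e.g. "open settings"
    action: str  # the concrete UI action, e.g. "click(#gear)"


def diagnose(student: List[Step], reference: List[Step]) -> List[Tuple[int, ErrorType]]:
    """Compare two trajectories position by position (assumed to be aligned)."""
    errors = []
    for i, (s, r) in enumerate(zip(student, reference)):
        if s.intent != r.intent:
            errors.append((i, ErrorType.PLANNING))
        elif s.action != r.action:
            errors.append((i, ErrorType.EXECUTION))
    return errors


def build_supervision(student: List[Step], reference: List[Step]) -> List[Dict]:
    """Turn each diagnosed error into a training target for the part that failed."""
    examples = []
    for i, kind in diagnose(student, reference):
        ref = reference[i]
        if kind is ErrorType.PLANNING:
            # Supervise only the choice of step.
            examples.append({"type": kind.value, "step": i, "target": ref.intent})
        else:
            # The step was right; supervise only the concrete action for it.
            examples.append({"type": kind.value, "step": i,
                             "context": ref.intent, "target": ref.action})
    return examples


# Hypothetical trajectories for a "turn on wifi" task.
reference = [
    Step("open settings", "click(#gear)"),
    Step("open network tab", "click(#network)"),
    Step("toggle wifi", "click(#wifi-switch)"),
]
student = [
    Step("open settings", "click(#menu)"),        # right step, wrong button
    Step("open display tab", "click(#display)"),  # wrong step
    Step("toggle wifi", "click(#wifi-switch)"),   # correct
]

supervision = build_supervision(student, reference)
```

In this toy run, step 0 is tagged as an execution error and step 1 as a planning error, so each produces a different kind of training example rather than one undifferentiated "this trajectory failed" signal. A real system would need a more robust way to align trajectories and to generate new targeted tasks from the diagnosed failures.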
## Why it matters
The economics of computer-use agents favor small models you can run cheaply and locally — but only if they’re reliable in your specific domain. Broad fine-tuning gives diffuse, marginal gains. Diagnosing a student agent’s precise failure modes and drilling them, the way a coach targets a weakness, is a more sample-efficient path to specialized agents that actually work where you deploy them.
