Top AI Product

LongTraceRL Mines Tiered Distractors From Search-Agent Traces for Long-Context RL

LongTraceRL trains long-context reasoning on contexts built the way an agent actually encounters them: it reuses the documents that real search agents retrieve and read, and it grades the model with entity-level rubric rewards instead of a single yes/no on the final answer.

## Tiered distractors from agent traces

Long-context RL has been bottlenecked by sparse rewards and easy distractors, and LongTraceRL targets both. Multi-hop questions are generated by random walks over a knowledge graph, and the distractors are mined from real search-agent trajectories. Documents the agent read but never cited become high-confusability distractors; documents that appeared in search results but were never opened become low-confusability ones. That gradient, from looks-relevant-but-isn’t down to obvious noise, is what pushes the model to learn to discriminate instead of memorising the shape of an easy negative.
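The two tiers described above come down to set differences over what the agent retrieved, opened, and cited. Here is a minimal sketch of that idea, not the project's actual code: the `Trace` record, its field names, and the `tier_distractors` helper are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical record of one search-agent run (field names are assumptions)."""
    retrieved: set[str]  # doc ids that appeared in search results
    opened: set[str]     # doc ids the agent actually read
    cited: set[str]      # doc ids the agent cited in its answer


def tier_distractors(trace: Trace) -> dict[str, set[str]]:
    """Split non-cited documents into high- and low-confusability tiers."""
    # Read but never cited: looked relevant enough to open, yet was not used.
    high = trace.opened - trace.cited
    # Shown in search results but never opened: closer to obvious noise.
    low = trace.retrieved - trace.opened - trace.cited
    return {"high": high, "low": low}


trace = Trace(
    retrieved={"a", "b", "c", "d"},
    opened={"a", "b"},
    cited={"a"},
)
tiers = tier_distractors(trace)
```

In this toy trace, document `b` lands in the high-confusability tier and `c` and `d` in the low one; the cited document `a` is treated as evidence and is never used as a distractor.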

## Entity-level rubric, positive-only

The reward signal is the other half. Instead of outcome-only credit, LongTraceRL scores responses against the gold entities that should appear along each reasoning chain, which amounts to fine-grained, entity-level process supervision. The rubric is applied only to responses with correct final answers, so it distinguishes the quality of reasoning among answers that already got the result right and blocks the reward-hacking shortcut of “say the answer, skip the work.”
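One way to picture a positive-only rubric is a reward that gates the entity score on answer correctness. The sketch below is an assumption-laden illustration, not LongTraceRL's implementation: the function name, the case-insensitive substring matching, the zero reward for wrong answers, and the `rubric_weight` value are all made up for the example.

```python
def rubric_reward(
    response: str,
    final_answer: str,
    gold_answer: str,
    gold_entities: list[str],
    rubric_weight: float = 0.5,  # assumed weighting, not taken from the source
) -> float:
    """Outcome reward plus an entity-level bonus, granted only to correct answers."""
    # Gate: a wrong final answer earns nothing, however many entities it mentions.
    if final_answer.strip().lower() != gold_answer.strip().lower():
        return 0.0
    if not gold_entities:
        return 1.0
    # Fraction of gold chain entities mentioned in the response (naive substring match).
    text = response.lower()
    hits = sum(1 for entity in gold_entities if entity.lower() in text)
    return 1.0 + rubric_weight * hits / len(gold_entities)


entities = ["Marie Curie", "Warsaw"]
full = rubric_reward("Marie Curie was born in Warsaw, so: Poland", "Poland", "Poland", entities)
bare = rubric_reward("Poland", "Poland", "Poland", entities)
wrong = rubric_reward("Marie Curie was born in Warsaw, so: France", "France", "Poland", entities)
```

Under these assumptions the fully reasoned correct response scores higher than the bare correct answer, and a wrong answer scores zero even when it names every gold entity, so listing entities alone cannot earn credit.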

## Why it matters

Long-context reasoning is one of the places where models still embarrass themselves at scale. Building training data and rewards from agent trajectories, which are a by-product of every agent run, closes the loop between deployment and improvement. The code is at THU-KEG/LongTraceRL.

