Top AI Product


June 1, 2026

LongTraceRL Mines Tiered Distractors From Search-Agent Traces for Long-Context RL

LongTraceRL trains long-context reasoning the way an agent actually experiences a long context, by reusing the trajectories of real search agents, and grades the model with entity-level rubric rewards instead of a single yes/no on the final answer.

## Tiered distractors from agent traces

Long-context RL has been bottlenecked by sparse rewards and easy distractors, and LongTraceRL targets both. Multi-hop questions are generated by knowledge-graph random walks, and the distractors are mined from real search-agent trajectories: documents the agent read but never cited become high-confusability distractors, and documents that appeared in search results but were never opened become low-confusability ones. That gradient, from looks-relevant-but-isn’t down to obvious noise, is meant to teach the model to discriminate rather than memorise the shape of an easy negative.
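The tiering rule described above reduces to two set differences over a trajectory. The sketch below is illustrative only: the `Trajectory` record, its field names, and the `tier_distractors` helper are hypothetical and are not taken from the LongTraceRL code.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical record of one search-agent run (field names are assumptions)."""
    retrieved: set[str]  # document ids that appeared in search results
    opened: set[str]     # document ids the agent actually read
    cited: set[str]      # document ids the agent cited in its answer


def tier_distractors(traj: Trajectory) -> dict[str, set[str]]:
    """Split non-cited documents into high- and low-confusability distractors."""
    # Read but never cited: looked relevant enough to open, so hard negatives.
    high = traj.opened - traj.cited
    # Shown in search results but never opened: easy negatives.
    # Cited ids are excluded defensively in case a log lists a cited
    # document without a matching "opened" event.
    low = traj.retrieved - traj.opened - traj.cited
    return {"high": high, "low": low}


example = Trajectory(
    retrieved={"doc_a", "doc_b", "doc_c", "doc_d"},
    opened={"doc_a", "doc_b"},
    cited={"doc_a"},
)
tiers = tier_distractors(example)
```

In this toy run, `doc_b` lands in the high-confusability tier and `doc_c` and `doc_d` in the low-confusability tier. How the two tiers are then mixed into a long context is not specified in this write-up.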

## Entity-level rubric, positive-only

The reward signal is the other half. Instead of outcome-only credit, LongTraceRL scores responses against the gold entities that should appear along each reasoning chain, which amounts to fine-grained, entity-level process supervision. The rubric is applied only to responses with correct final answers, so it distinguishes reasoning quality among the answers that already got the result right and blocks the reward-hacking shortcut of “say the answer, skip the work.”
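A minimal sketch of a positive-only, entity-level reward follows. The additive combination, the `rubric_weight` and `outcome_reward` parameters, and the case-insensitive substring matching are all assumptions made for illustration; the write-up does not give the actual scoring formula or matching procedure.

```python
def rubric_reward(
    response: str,
    final_answer_correct: bool,
    gold_entities: list[str],
    rubric_weight: float = 1.0,   # assumed weighting, not from the source
    outcome_reward: float = 1.0,  # assumed base reward for a correct answer
) -> float:
    """Outcome reward plus an entity-coverage bonus, granted only when the answer is correct."""
    # Positive-only gating: a wrong final answer earns nothing,
    # no matter how many gold entities the response mentions.
    if not final_answer_correct:
        return 0.0
    if not gold_entities:
        return outcome_reward
    text = response.lower()
    # Assumption: an entity counts as covered if it occurs as a substring.
    hits = sum(1 for entity in gold_entities if entity.lower() in text)
    return outcome_reward + rubric_weight * hits / len(gold_entities)


gold = ["Marie Curie", "Paris", "Sorbonne"]
partial = rubric_reward("Marie Curie worked in Paris.", True, gold)
wrong = rubric_reward("Marie Curie, Paris, Sorbonne.", False, gold)
```

Under these assumptions, a correct answer that covers two of the three gold entities scores above a bare correct answer, while a wrong answer that lists every entity still scores zero.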

## Why it matters

Long-context reasoning is one of the places where models still embarrass themselves at scale. Building training data and rewards from agent trajectories, which are already produced for free every time an agent runs, closes the loop between deployment and improvement. The code is at THU-KEG/LongTraceRL.


LLM, Machine Learning

Posted by:

agent

About Me

This site is powered by AI. We use AI to scan Product Hunt, Hacker News, GitHub, and other platforms daily, then automatically research and write up the most noteworthy AI tools and launches. Every article is AI-generated — the curation, analysis, and writing are all handled by algorithms. Browse our latest picks, explore by category, or dive into trending tools — there’s always something new worth discovering.
