MemTrace is a framework for a debugging problem that grows as agents become longer-lived: when an agent’s memory produces a wrong answer, where did it fail? Was the information stored wrong, retrieved wrong, or lost along the way? MemTrace turns a memory pipeline into an executable “memory evolution graph” so you can trace the information flow operation by operation.
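To make the idea of a memory evolution graph concrete, here is a minimal sketch. It assumes that each memory operation (write, summarize, retrieve, answer) is recorded as a node pointing to the nodes it consumed, so a wrong answer can be traced back through everything that fed it. The names `OpNode`, `MemoryGraph`, `record`, and `lineage` are hypothetical illustrations, not MemTrace’s API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OpNode:
    """One memory operation and the content it produced (hypothetical schema)."""
    node_id: str
    op: str                      # e.g. "write", "summarize", "retrieve", "answer"
    content: str                 # text this operation produced
    inputs: list[str] = field(default_factory=list)  # ids of the nodes it consumed


class MemoryGraph:
    """Records memory operations as a DAG so information flow can be traced back."""

    def __init__(self) -> None:
        self.nodes: dict[str, OpNode] = {}

    def record(self, node_id: str, op: str, content: str,
               inputs: list[str] | None = None) -> OpNode:
        inputs = inputs or []
        missing = [i for i in inputs if i not in self.nodes]
        if missing:
            raise KeyError(f"unknown input nodes: {missing}")
        node = OpNode(node_id, op, content, list(inputs))
        self.nodes[node_id] = node
        return node

    def lineage(self, node_id: str) -> list[OpNode]:
        """Return node_id and all of its ancestors, parents before children."""
        seen: set[str] = set()
        order: list[OpNode] = []

        def visit(nid: str) -> None:
            if nid in seen:
                return
            seen.add(nid)
            for parent in self.nodes[nid].inputs:
                visit(parent)
            order.append(self.nodes[nid])  # post-order: inputs are appended first

        visit(node_id)
        return order


# Usage: a toy pipeline where a summarization step silently drops a fact.
g = MemoryGraph()
g.record("t1", "write", "User moved to Berlin in March.")
g.record("t2", "write", "User likes hiking.")
g.record("s1", "summarize", "User likes hiking.", inputs=["t1", "t2"])
g.record("r1", "retrieve", "User likes hiking.", inputs=["s1"])
g.record("a1", "answer", "I don't know where you live.", inputs=["r1"])
print([n.node_id for n in g.lineage("a1")])  # ['t1', 't2', 's1', 'r1', 'a1']
```

Recording provenance at write time, rather than reconstructing it afterward, is what makes the trace cheap: every answer carries a pointer back to the operations that shaped it.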
## Attribution, not just observation
The framework is used to construct MemTraceBench from four representative memory systems (Long-Context, RAG, Mem0, and EverMemOS) to study how memory actually fails. Its automatic attribution method iteratively traces operation subgraphs to pinpoint the root cause of each failed case. The finding is that memory failures aren’t random: they’re systematic, stemming from operation-level issues such as information loss and retrieval misalignment.
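A drastically simplified version of that attribution idea can be sketched as: walk a failed case’s operations in order and report the first step whose output no longer contains the needed fact. This sketch assumes a linear chain instead of a subgraph and uses a naive substring check for whether a fact survives; it is not MemTrace’s actual algorithm. The two labels mirror the failure types named above: a storage-side operation that drops a fact it received counts as information loss, and a retrieval step that fails to return a fact still present in the store counts as retrieval misalignment. `Step` and `attribute_failure` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    """One operation on the path from raw input to the answer (hypothetical)."""
    name: str
    kind: str     # "store" for write/summarize/update, "retrieve" for lookup
    output: str   # what this operation passed to the next step


def contains(text: str, fact: str) -> bool:
    # Naive survival check; a real system would need semantic matching.
    return fact.lower() in text.lower()


def attribute_failure(source: str, steps: list[Step], fact: str) -> tuple[str, str] | None:
    """Return (step name, failure label) for the first step that drops `fact`.

    Returns None if the fact was never in the source, or if it survives every
    step (so the fault lies outside the memory pipeline, e.g. in generation).
    """
    if not contains(source, fact):
        return None
    for step in steps:
        if not contains(step.output, fact):
            # Every earlier step kept the fact, so a retrieve step missing it
            # means the store had the fact but retrieval did not surface it.
            label = "retrieval misalignment" if step.kind == "retrieve" else "information loss"
            return step.name, label
    return None


source = "User moved to Berlin in March. User likes hiking."
steps = [
    Step("extract", "store", "User moved to Berlin in March. User likes hiking."),
    Step("summarize", "store", "User likes hiking and lives in Berlin."),
    Step("retrieve", "retrieve", "User likes hiking."),
]
print(attribute_failure(source, steps, "Berlin"))  # ('retrieve', 'retrieval misalignment')
print(attribute_failure(source, steps, "March"))   # ('summarize', 'information loss')
```

Stopping at the first step that drops the fact is what turns observation into attribution: later steps cannot be blamed for information they never received.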
## Why it matters
Most memory work focuses on storing and retrieving better; MemTrace focuses on diagnosing what broke, which is the missing half. It also closes the loop: feeding those fine-grained attribution signals into prompt optimization automatically corrects faults and improves end-task performance by up to 7.62%. As agents run for days and accumulate memory, “the agent remembered wrong” becomes a real failure class, and you can’t fix what you can’t trace.
