Long agent sessions have one dirty secret: the longer they run, the more expensive every turn gets, because you keep stuffing the whole history back into the prompt. Microsoft Research’s answer is Memora, a long-term memory system for AI agents that dropped June 29 and lands as an ICML 2026 paper. The headline number: up to 98% fewer context tokens than full-context processing.
What it actually is
Memora is a memory framework you plug into an agent, not a chatbot or an app. The trick is decoupling what’s stored from how it’s retrieved. Each entry pairs a short 6–8 word “abstraction” with the rich full value. Only the abstraction gets embedded and searched; the heavy content is kept alongside it but never matched against directly. Retrieval works more like reasoning than lookup: it refines queries and follows “cue anchors” to pull in memories that are related, not just similar.
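To make the store-versus-retrieve split concrete, here is a toy sketch in Python. It is not Memora's code or API: the names `MemoryEntry`, `ToyMemory`, and `cues` are made up for illustration, the bag-of-words `embed` function stands in for a real embedding model, and cue anchors are modeled as plain links between entry ids, which is only a guess at the idea. Query refinement is left out entirely.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real system uses a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class MemoryEntry:
    abstraction: str  # short summary; the only part that gets indexed
    value: str        # full content; stored and returned, never searched
    cues: list[str] = field(default_factory=list)  # hypothetical links to related entry ids


class ToyMemory:
    def __init__(self) -> None:
        self.entries: dict[str, MemoryEntry] = {}
        self.index: dict[str, Counter] = {}

    def add(self, entry_id: str, entry: MemoryEntry) -> None:
        self.entries[entry_id] = entry
        # Only the abstraction is embedded; the value never enters the index.
        self.index[entry_id] = embed(entry.abstraction)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.index, key=lambda i: cosine(q, self.index[i]), reverse=True)
        hits = [i for i in ranked[:k] if cosine(q, self.index[i]) > 0]
        # Follow cue links one hop to pull in related (not merely similar) entries.
        result = list(hits)
        for i in hits:
            for cue in self.entries[i].cues:
                if cue in self.entries and cue not in result:
                    result.append(cue)
        return result


memory = ToyMemory()
memory.add("trip", MemoryEntry(
    abstraction="user planning trip to Lisbon in May",
    value="Flights booked for May 12; hotel near Alfama; wants food recommendations.",
    cues=["diet"],
))
memory.add("diet", MemoryEntry(
    abstraction="user follows strict vegetarian diet",
    value="User avoids meat and fish and asked for meat-free options only.",
))
memory.add("job", MemoryEntry(
    abstraction="user works as backend engineer",
    value="Mostly Go and Postgres; on call every third week.",
))

print(memory.retrieve("Lisbon trip restaurants"))  # ['trip', 'diet']
print(memory.retrieve("flights"))                  # []
```

The second query comes back empty even though "Flights" appears in a stored value, because only abstractions are searchable. The first query shares no words with the diet entry, yet still pulls it in through the cue link on the trip entry.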
Why it matters
Agent memory is the most crowded lane in AI right now. Memora’s edge isn’t a new idea; it’s the receipts: 86.3% on LoCoMo and 87.4% on LongMemEval, beating Mem0, Zep, LangMem, RAG, and even full-context inference, all while reading a fraction of the tokens. The code is already public.