Microsoft just put a number on the "agent swarm vs. single super-model" debate, and the swarm won. MDASH (short for multi-model agentic scanning harness) scored 88.45% on the public CyberGym benchmark, roughly five points ahead of Anthropic's Mythos (83.1%) and nearly seven ahead of OpenAI's GPT-5.5 (81.8%).
What MDASH actually is
Not a model. A cybersecurity agent system: more than 100 specialized AI agents wired into a pipeline. One group scans code; a second debates whether each finding is genuinely exploitable or just noise; a third writes the proof-of-concept exploit. Each stage has its own prompts, tools, and stop criteria. The whole thing is model-agnostic: Microsoft mixes frontier and distilled models per job.
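The scan/debate/PoC staging can be sketched in a few lines. This is a minimal illustration of that pipeline shape, not Microsoft's actual MDASH code; every name here (Finding, scan_stage, debate_stage, poc_stage) is hypothetical, and real stages would call models instead of returning stubs.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    location: str               # file or function the scanner flagged
    description: str            # what the scanning agent reported
    votes_exploitable: int = 0  # debate-stage tally: "real bug"
    votes_noise: int = 0        # debate-stage tally: "false positive"

def scan_stage(code_units):
    """Stage 1: specialized scanners emit candidate findings (stubbed here)."""
    return [Finding(location=u, description=f"possible memory-safety issue in {u}")
            for u in code_units]

def debate_stage(findings, n_debaters=3):
    """Stage 2: several agents vote on exploitability; majority vote keeps a finding."""
    kept = []
    for f in findings:
        # Placeholder for per-agent model calls: here every debater votes "exploitable".
        f.votes_exploitable = n_debaters
        if f.votes_exploitable > f.votes_noise:
            kept.append(f)
    return kept

def poc_stage(findings):
    """Stage 3: draft a proof-of-concept exploit per confirmed finding (stubbed)."""
    return {f.location: f"PoC stub targeting {f.location}" for f in findings}

# Run the three stages end to end on two example targets.
pipeline_out = poc_stage(debate_stage(scan_stage(["tcpip.sys", "http.sys"])))
```

The point of the structure is that each stage has its own stop criterion and can be backed by a different model, which is what "model-agnostic" buys you.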
Why this week matters
Microsoft timed the reveal with May 2026 Patch Tuesday: 16 of the Windows flaws patched this month were found by MDASH, including four critical RCEs in the TCP/IP stack, the IKEEXT IPsec service, HTTP.sys, and Netlogon. On internal MSRC regression tests it achieved 100% recall on tcpip.sys cases spanning five years. A leaderboard number is one thing; shipping real RCE fixes into Patch Tuesday is the part Anthropic and OpenAI can't easily counter.
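For readers unfamiliar with the metric: "100% recall" means the harness rediscovered every known-true vulnerability in the regression set. A minimal sketch of that computation, using made-up CVE identifiers rather than Microsoft's actual regression data:

```python
def recall(found, known_true):
    """Fraction of known-true vulnerabilities that the scanner rediscovered."""
    known = set(known_true)
    return len(known & set(found)) / len(known)

# Illustrative only: if the harness rediscovers every historical case,
# recall is 1.0; missing one of two cases drops it to 0.5.
full = recall({"CVE-A", "CVE-B"}, {"CVE-A", "CVE-B"})    # 1.0
half = recall({"CVE-A"}, {"CVE-A", "CVE-B"})             # 0.5
```

Note that recall says nothing about false positives; that is exactly what the debate stage exists to keep down.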