SQL injection was supposed to be a solved problem. It’s been in the OWASP Top 10 since the list existed. Every CS student learns about it. Every framework has built-in protections. And yet, in late February 2026, an autonomous AI agent built by a startup called CodeWall.ai exploited exactly this vulnerability to gain full read-write access to McKinsey’s internal AI platform, Lilli — exposing 46.5 million chat records and 728,000 confidential client files.
The incident, disclosed publicly on March 9, hit 388 points on Hacker News with 158 comments, and has since been picked up by The Register, CyberNews, Inc Magazine, and The Decoder. It’s not just a story about one company’s security failure. It’s a case study in how AI agents are about to change the offensive security landscape — and how unprepared most enterprises are.
What Is Lilli, and Why Does It Matter?
Lilli is McKinsey’s internal AI assistant, launched in 2023 and adopted by over 70% of the firm’s 43,000+ employees. Consultants use it to search through McKinsey’s proprietary knowledge base, draft analyses, and pull insights from past engagements. Think of it as an internal ChatGPT, but trained on decades of consulting work covering strategy, M&A, and client-specific data.
That makes Lilli one of the highest-value AI targets in the enterprise world. The data flowing through it isn’t generic — it’s the kind of material that competitors, regulators, or hostile actors would pay serious money to access.
How the Attack Worked: 22 Open Doors and a Classic Flaw
CodeWall’s AI agent didn’t use any exotic zero-day exploit. The attack chain was almost embarrassingly straightforward:
Step 1: Finding exposed documentation. The agent discovered publicly accessible API documentation for Lilli’s backend. Among the docs were 22 API endpoints that required no authentication whatsoever.
Step 2: Identifying the injection point. One of these unauthenticated endpoints handled user search queries. While the API properly parameterized the values in requests (the standard defense against SQL injection), it concatenated JSON field names directly into SQL queries. This is an uncommon but well-documented vector that most automated scanning tools miss.
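The pattern described above is easy to miss in review precisely because the values look safely parameterized. Here is a minimal sketch of what such a vulnerability can look like, using SQLite and invented table/field names (this is an illustration of the technique, not Lilli's actual code):

```python
import sqlite3

# Vulnerable sketch: the VALUE of each filter is parameterized, but the JSON
# field NAME is concatenated straight into the SQL string.
def search_vulnerable(conn, filters: dict):
    clauses, params = [], []
    for field, value in filters.items():
        # BUG: `field` comes directly from attacker-controlled JSON keys.
        clauses.append(f"{field} = ?")   # key is concatenated, not escaped
        params.append(value)             # value IS parameterized (looks safe)
    sql = "SELECT title FROM docs WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()

# Fix: validate field names against an explicit allowlist before building SQL.
ALLOWED_FIELDS = {"title", "author"}

def search_safe(conn, filters: dict):
    clauses, params = [], []
    for field, value in filters.items():
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {field!r}")
        clauses.append(f"{field} = ?")
        params.append(value)
    sql = "SELECT title FROM docs WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()
```

A JSON body like `{"1=1 OR title": "x"}` turns the vulnerable version's WHERE clause into `1=1 OR title = ?`, matching every row. Because placeholders (`?`) only work for values, not identifiers, an allowlist of known column names is the standard defense for this variant.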
Step 3: Blind SQL injection over 15 iterations. The agent ran over 15 rounds of blind SQL injection, extracting increasingly detailed information from error messages until production data started flowing back. Within two hours of the first probe, the agent had full read-write access to the production database.
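Blind SQL injection is iterative by nature: the attacker never sees query results directly, only whether each probe behaved differently, and reconstructs data one guess at a time. A toy boolean-blind loop conveys the mechanics (the "oracle" stands in for a vulnerable endpoint; every name here is hypothetical, not CodeWall's tooling):

```python
import sqlite3
import string

# In-memory stand-in for the target database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('admin')")

def oracle(injected_clause: str) -> bool:
    # Simulates a vulnerable endpoint: the attacker controls part of the
    # WHERE clause and observes only a true/false difference in behavior.
    sql = f"SELECT 1 FROM users WHERE {injected_clause}"
    return conn.execute(sql).fetchone() is not None

def extract_value(column: str, table: str, max_len: int = 20) -> str:
    recovered = ""
    for pos in range(1, max_len + 1):
        for ch in string.printable:
            if ch == "'":
                continue  # skip quote to keep the toy payload well-formed
            # One boolean probe per guess: is the character at `pos` == ch?
            clause = (f"substr((SELECT {column} FROM {table} LIMIT 1),"
                      f"{pos},1) = '{ch}'")
            if oracle(clause):
                recovered += ch
                break
        else:
            break  # no character matched: end of the value
    return recovered

print(extract_value("name", "users"))  # recovers 'admin' one probe at a time
```

Each recovered character costs many round-trips, which is why the reported two-hour timeline for a full read-write compromise drew skepticism (see below): a human doing this manually typically needs far longer, though automation at machine speed narrows the gap.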
The haul: 46.5 million chat messages (strategy discussions, M&A analyses, client engagements — all in plaintext), 728,000 files containing confidential client data, 57,000 user accounts, and 95 system prompts that controlled Lilli’s behavior.
The most alarming part? Those 95 system prompts were writable. An attacker could have silently altered how Lilli responded to every consultant in the firm — poisoning AI outputs at scale without touching a single line of application code.
The Autonomous Agent Angle: Why This Is Different
What separates this from a typical penetration test is that CodeWall’s agent reportedly selected McKinsey as a target on its own. The company describes its product as an autonomous offensive security agent that “thinks like an attacker” — mapping attack surfaces, chaining exploits, and delivering proof-of-concept vulnerabilities without human guidance at each step.
According to CodeWall CEO Paul Price, the agent identified Lilli’s exposed documentation, recognized the potential attack surface, and executed the full chain from reconnaissance to data access in roughly 120 minutes. That’s faster than most human red teams could even scope the engagement.
This speed matters. Traditional penetration testing is expensive, slow, and happens on a schedule (quarterly, annually). An AI agent can probe continuously, at machine speed, finding vulnerabilities that sit undetected between scheduled audits.
Not Everyone Is Convinced
The story has drawn significant scrutiny, particularly from security analyst Edward Kiledjian, who published a detailed breakdown of what holds up and what doesn’t in CodeWall’s claims.
The credible parts: The technical attack chain is plausible — JSON key injection is uncommon enough to evade most security scanners. McKinsey’s rapid acknowledgment and patching within a day of disclosure on March 1 add credibility. The prompt-layer risk (writable system prompts in a shared database) is a genuine architectural concern that many organizations haven’t modeled.
The questionable parts: CodeWall provides no proof-of-concept payloads, hashes, or screenshots to verify the scope of access. The blog post conflates what was theoretically reachable in the database with what was actually retrieved. The two-hour timeline for blind SQL injection seems compressed, given the typical back-and-forth required. And CodeWall’s claim that modified prompts “leave no log trail” contradicts the reality that most mature organizations maintain database audit logging.
Disclosure concerns: The nine-day window between discovery (late February) and public disclosure (March 9) gave McKinsey limited time for forensic review. McKinsey’s own statement notes that an external forensic investigation “found no evidence that client data or confidential client information was accessed by the researcher or any other unauthorized third party.”
Whether CodeWall’s specific numbers hold up to scrutiny or not, the core finding — unauthenticated endpoints plus SQL injection equals full database access — is a serious enough vulnerability on its own.
CodeWall vs. the AI Security Field
CodeWall isn’t operating in a vacuum. The AI red-teaming space has gotten crowded:
- Mindgard, founded in 2022 at Lancaster University, offers autonomous red teaming focused specifically on AI model security — testing for adversarial attacks, prompt injection, and model extraction.
- Lakera focuses on the defensive side, providing guardrails against prompt injection, data leakage, and jailbreaks for Fortune 500 companies.
- Gray Swan combines frontier research with a large red-teaming network to test AI system resilience.
Where CodeWall differentiates is in its fully autonomous, offensive approach — the agent doesn’t just test AI-specific vulnerabilities like prompt injection, but probes the entire infrastructure stack (APIs, databases, authentication) the way a real attacker would. The McKinsey hack demonstrates this: the vulnerability wasn’t in the AI model itself, but in the web infrastructure around it.
That said, the autonomous targeting claim raises its own questions. If an AI agent can independently decide to attack a specific company’s infrastructure, the line between authorized security testing and unauthorized access gets uncomfortably thin. The security community on Hacker News has debated this point extensively.
What This Means for Enterprise AI Deployments
The McKinsey-Lilli incident highlights a pattern that’s becoming disturbingly common: companies rush to deploy internal AI tools, bolting them onto existing infrastructure without applying the same security rigor they’d give to a customer-facing product.
Key takeaways for any organization running internal AI platforms:
- Treat internal tools like external ones. Lilli wasn’t customer-facing, but it had publicly exposed API docs and unauthenticated endpoints. Internal doesn’t mean safe.
- AI system prompts are critical assets. Storing them in a writable database alongside user data is an architectural flaw. Prompts should be versioned, read-only in production, and monitored for changes.
- Old vulnerabilities don’t die. SQL injection has been known for over 20 years. The JSON key variant that hit McKinsey is uncommon but not new. Security teams need to test for edge cases, not just the obvious patterns.
- Scheduled pentests aren’t enough. If an AI agent can find and exploit a vulnerability in two hours, quarterly penetration tests leave massive windows of exposure.
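The "prompts are critical assets" takeaway above can be made concrete. One defensive pattern is to version prompt text with the code and pin a hash at deploy time, so a silent edit in a writable store is caught before the prompt is ever served. This is an illustrative sketch under assumed names, not Lilli's actual architecture:

```python
import hashlib

# Prompts ship as version-controlled constants, reviewed like code.
PROMPTS = {
    "search_assistant": "You answer questions using the internal knowledge base.",
}

# Pinned at build/deploy time (e.g., in CI) from the reviewed prompt text.
PINNED_HASHES = {
    name: hashlib.sha256(text.encode()).hexdigest()
    for name, text in PROMPTS.items()
}

def get_prompt(name: str, store: dict) -> str:
    """Return a prompt only if it still matches its pinned hash."""
    text = store[name]
    if hashlib.sha256(text.encode()).hexdigest() != PINNED_HASHES[name]:
        raise RuntimeError(f"prompt {name!r} was modified at runtime")
    return text
```

With this in place, an attacker who gains write access to the prompt store (as CodeWall claims was possible with Lilli's 95 writable prompts) can corrupt the stored text, but the application refuses to serve it — failing loudly instead of poisoning outputs silently.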
FAQ
What is CodeWall.ai?
CodeWall.ai is an autonomous offensive security startup that uses AI agents to continuously probe and attack customers’ infrastructure, identifying vulnerabilities before malicious actors can exploit them. The company made headlines in March 2026 after its agent breached McKinsey’s internal AI platform Lilli.
Is CodeWall.ai free to use?
CodeWall appears to operate as an enterprise security service rather than a free tool. Specific pricing details have not been publicly disclosed. Companies interested in their autonomous red-teaming capabilities would need to contact them directly.
How does CodeWall compare to traditional penetration testing?
Traditional pentesting is typically done by human teams on a scheduled basis (quarterly or annually) and can take weeks. CodeWall’s AI agent operates autonomously and continuously, completing assessments at machine speed. The trade-off is that autonomous agents may lack the contextual judgment of experienced human testers, and questions remain about the depth and accuracy of fully automated findings.
Did McKinsey’s client data actually get stolen?
According to McKinsey’s statement, an external forensic investigation found no evidence that client data or confidential information was accessed by CodeWall’s researchers or any unauthorized third party. CodeWall performed this as a red-team exercise and disclosed the vulnerability through responsible disclosure channels on March 1, 2026. McKinsey patched all identified vulnerabilities within a day.
What other tools compete with CodeWall in AI security?
The AI security space includes Mindgard (autonomous AI red teaming), Lakera (AI guardrails and prompt injection defense), Gray Swan (AI security research platform), and open-source tools like Promptfoo. CodeWall differentiates by testing full infrastructure stacks rather than focusing solely on AI-specific attack vectors like prompt injection.