agent
-
Anthropic Just Launched Code Review in Claude Code — And 54% of PRs Now Get Real Feedback
AI is writing more code than ever. Anthropic’s own engineers saw their code output jump roughly 200% year over year. But here’s the catch: someone still has to review all that code. Human reviewers can’t scale at the same rate, and the result is a growing gap between code produced and code properly checked. That’s… Continue reading
-
OpenAI Just Acquired Promptfoo — The $86M AI Security Startup Used by 25% of Fortune 500
OpenAI dropped a big announcement on March 9, 2026: it’s acquiring Promptfoo, the open-source AI red-teaming platform that’s become the go-to security testing tool for enterprise AI deployments. The deal marks OpenAI’s clearest signal yet that AI agent safety isn’t just a research priority — it’s a product one. What Happened: OpenAI confirmed plans to… Continue reading
-
mcp2cli: The Tool That Cuts MCP Token Costs by 99% Just Hit Hacker News
If you’ve been building with MCP (Model Context Protocol) servers, you already know the pain: every tool schema gets injected into your LLM’s context on every single turn, whether the model uses those tools or not. With 30 tools, that’s roughly 3,600 tokens burned per turn doing absolutely nothing. Scale that to 120 tools over… Continue reading
-
Phi-4-reasoning-vision-15B: Microsoft’s 15B Model Just Embarrassed GPT-4o on Vision Tasks
If you’ve been paying attention to AI Twitter or the [Hacker News](https://news.ycombinator.com/) front page this past week, you’ve probably seen people losing their minds over Microsoft’s latest release. [Phi-4-reasoning-vision-15B](https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B) dropped on March 4th, and it’s one of those models that makes you rethink everything you assumed about model size and capability. Here’s the deal: this… Continue reading
-
Grammarly AI Expert Review: Getting Feedback From Dead Scholars They Never Agreed To Give
So here’s a wild one. Grammarly — which now operates under the Superhuman brand after a rebrand in late 2025 — rolled out a feature called “Expert Review” that lets you pick a real-world scholar or writer to “review” your manuscript. Sounds cool in theory, right? Except they forgot one tiny detail: actually asking those… Continue reading
-
SWE-CI Exposes What AI Coding Agents Still Can’t Do
There’s been a lot of chest-thumping lately about AI coding agents solving real-world GitHub issues. SWE-bench scores keep climbing, and every new model launch comes with claims about “state-of-the-art” issue resolution rates. But here’s the thing — fixing a single bug in isolation is very different from maintaining a codebase over months. [SWE-CI](https://arxiv.org/abs/2603.03823) is a… Continue reading
-
Agent Safehouse: Finally, a Dead-Simple Way to Stop AI Agents From Roaming Your Mac
If you’ve been letting Claude Code, Codex, or Aider run loose on your machine, you’ve probably had that moment — the one where you realize your coding agent has full access to your SSH keys, your `.env` files, and every repo on your system. It’s a weird feeling, like handing your house keys to a… Continue reading
-
P0 (bepurple.ai) — Finally, an AI Coding Agent That Actually Ships Whole Features
There’s a growing gap between what AI coding tools promise and what they actually deliver. Most of them are great at autocompleting a function or generating a single file, but ask them to build a real, multi-file feature across a codebase and things fall apart fast. That’s exactly the problem [P0 by Purple AI](https://www.bepurple.ai/) is… Continue reading
-
Your Anonymous Posts Aren’t Anonymous Anymore — Inside the Large-Scale LLM Deanonymization Research
So here’s something that should make you uncomfortable: a group of researchers just proved that LLMs can figure out who you are from your “anonymous” online posts, and they can do it at scale for about four bucks per person. The paper, [“Large-scale online deanonymization with LLMs”](https://arxiv.org/abs/2602.16800), comes from [MATS Research](https://www.matsprogram.org/research/large-scale-online-deanonymization-with-llms) — authored by Simon… Continue reading
-
Claude Marketplace: Anthropic’s Bold Zero-Commission Play for Enterprise AI
Anthropic just did something interesting — and honestly, kind of unexpected. They launched [Claude Marketplace](https://claude.com/platform/marketplace), an enterprise-focused storefront where companies can buy third-party software built on Claude. Think of it as Anthropic’s answer to AWS Marketplace or Azure Marketplace, but with one big twist: they’re not taking a cut. Yeah, you read that right. Zero… Continue reading
