AI Models & APIs
-
DeepSeek V4 Pro hits GPT-5 parity on 5 of 7 benchmarks — at a fraction of the cost
DeepSeek V4 Pro just got benchmarked by NIST’s CAISI against GPT-5. The verdict: roughly the same intelligence, about 8 months behind frontier closed models, but cheaper than GPT-5.4 mini on 5 of 7 tested benchmarks, with per-benchmark cost ranging from 53% less expensive to 41% more expensive. That’s the whole story. China’s open-weight champ is again forcing closed-source… Continue reading
-
GPT-5.5 takes back the coding crown from Claude Opus 4.7
OpenAI shipped GPT-5.5 on April 23, 2026, with a Pro variant a day later. Biggest model bump in six weeks, and the benchmarks aren’t subtle: 84.9% on GDPval, 78.7% on OSWorld-Verified, 98.0% on Tau2-bench Telecom (no prompt tuning), 82.7% on Terminal-Bench 2.0, 51.7% on FrontierMath 1-3. Translation: best-in-class coding, computer use, and agent workflows —… Continue reading
-
OpenAI Trusted Access for Cyber opens GPT-5.5 to offensive security work — for verified defenders only
OpenAI is splitting its safety stack. Trusted Access for Cyber is a verified-user tier that unlocks GPT-5.5’s offensive security capabilities — vulnerability research, exploit chain reasoning, red-team payload work — for vetted defenders. Codex is the first surface to ship it. First time a frontier lab has formalized a cyber-permissive track. Vetted users get a… Continue reading
-
OpenAI on AWS Bedrock ends Microsoft’s seven-year Azure lock
April 28. One day after Microsoft’s exclusivity expired, GPT-5.5 went live on Amazon Bedrock. Seven years of Azure-only is over. What’s actually shipping: GPT-5.5, GPT-5.5 Pro, and GPT-5.4 are callable through Bedrock APIs in limited preview. Same IAM, PrivateLink, CloudTrail, and guardrails AWS shops already use for Claude and Llama. Pricing matches OpenAI’s published rates:… Continue reading
-
Kimi K2.6 beats GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro
Moonshot AI open-sourced Kimi K2.6 weights on Hugging Face on April 20. It scored 58.6 on SWE-Bench Pro — ahead of GPT-5.4 (xhigh) at 57.7, Claude Opus 4.6 (max effort) at 53.4, and Gemini 3.1 Pro (thinking high) at 54.2. First open-source coder to clear the closed flagships on that bench. What the model actually… Continue reading
-
Google rolls out File Generation in Gemini, chasing ChatGPT Canvas and Claude Artifacts
Google shipped File Generation in Gemini on April 29. Tell the chatbox “turn this into a PDF” or “export to Excel,” and you get a real downloadable file. No copy-paste, no manual formatting, no opening another app. What it generates: Workspace files (Docs, Sheets, Slides) plus PDF, DOCX, XLSX, CSV, LaTeX, TXT, RTF, and Markdown.… Continue reading
-
Mistral Medium 3.5 scores 77.6% on SWE-Bench — one point shy of Gemini 3.1 Pro
Mistral just shipped Medium 3.5, a 128B dense model that hits 77.6% on SWE-Bench Verified. For context: Gemini 3.1 Pro Preview leads the board at 78.8%. An open-weight model from a European lab is now within rounding distance of Google’s flagship on real coding work. It’s a single set of weights doing instruction-following, reasoning, and… Continue reading
