AI Models & APIs
-
NVIDIA debuts Nemotron 3 open models — Nano delivers 4x the throughput of Nemotron 2 for multi-agent systems
NVIDIA debuted the Nemotron 3 family of open models (Nano, Super, and Ultra), positioned as the most efficient open models for building agentic AI applications. The headline: Nemotron 3 Nano delivers 4x higher throughput than Nemotron 2 Nano, and the most tokens per second for multi-agent systems at scale.

## The architecture

Nano’s…
-
xAI ships Grok Build, a terminal coding agent that spawns up to 8 concurrent sub-agents on Grok 4.3
xAI entered the coding-agent race with Grok Build, a terminal-based agent CLI taking direct aim at Claude Code and OpenAI Codex. It is currently in early beta, available to SuperGrok Heavy subscribers at $300/month.

## What it does

It runs from the terminal, driven by natural-language prompts. It generates implementation plans, edits files, executes shell commands, manages dependencies,…
-
RTPurbo turns a full-attention LLM sparse in a few hundred training steps — 9.36x prefill speedup at 1M context
“Full Attention Strikes Back” introduces RTPurbo, a method that converts a standard full-attention LLM into a sparse-attention one in only a few hundred training steps, at near-lossless accuracy and with large efficiency gains.

## The numbers

Up to 9.36x prefill speedup at 1M-token context, and about 2.01x decode speedup. The trick: keep the full KV cache only…
-
DelTA reweights RL training so formatting tokens stop drowning out the signal that matters
DelTA is a new method for reinforcement learning from verifiable rewards (RLVR), the training technique behind most of today’s reasoning models. The insight is sharp: the policy-gradient update in RLVR implicitly acts as a linear discriminator over token-gradient vectors, deciding which token probabilities go up or down.

## The problem it fixes

That discriminator…
-
Gated DeltaNet-2 decouples erase and write in linear attention — beats Mamba-3 and KDA at 1.3B
Gated DeltaNet-2, from the NVIDIA and MIT team behind the original, fixes a subtle flaw in how linear-attention models manage memory. Prior delta-rule models (Gated DeltaNet, KDA) used a single scalar gate to do two jobs at once: erasing old content and writing new content. Gated DeltaNet-2 decouples them, and the gains show up exactly…
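
The excerpt does not give the Gated DeltaNet-2 equations, so the following is only a toy sketch of the general idea: a delta-rule memory update in which the erase strength and the write strength are separate arguments rather than one shared scalar. The function names, the omission of any decay gate, and the use of a unit-norm key are assumptions made for illustration, not the paper's formulation.

```python
from typing import List

Matrix = List[List[float]]  # memory state S, shape d_v x d_k
Vector = List[float]


def read(S: Matrix, k: Vector) -> Vector:
    """Return S @ k: what the memory currently stores under key k."""
    return [sum(s * kj for s, kj in zip(row, k)) for row in S]


def delta_update(S: Matrix, k: Vector, v: Vector,
                 erase: float, write: float) -> Matrix:
    """One delta-rule step with separate erase and write strengths.

    S_new = S - erase * (S k) k^T + write * v k^T

    Passing erase == write recovers the coupled, single-gate form
    described in the excerpt. (Illustrative assumption, not the paper's
    exact parameterization.)
    """
    old = read(S, k)  # content currently associated with k
    return [
        [S[i][j] - erase * old[i] * k[j] + write * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


# Store v1 under key k, then overwrite with v2 in two different ways.
k = [1.0, 0.0]
S0 = [[0.0, 0.0], [0.0, 0.0]]
S1 = delta_update(S0, k, [3.0, 4.0], erase=1.0, write=1.0)

# Coupled gate: erasing and writing share one strength (0.5).
coupled = delta_update(S1, k, [1.0, 1.0], erase=0.5, write=0.5)

# Decoupled gates: fully erase the old value, but write only softly.
decoupled = delta_update(S1, k, [1.0, 1.0], erase=1.0, write=0.5)

print(read(coupled, k))    # half of the old value survives
print(read(decoupled, k))  # old value is gone; only the soft write remains
```

With one shared gate, a soft write necessarily leaves part of the old value in memory; with two gates, the model can clear a slot completely while still writing cautiously.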
-
Qwen3.7-Max ran 35 hours and called 1,000+ tools to write a kernel 10x faster than the vendor code
Alibaba released Qwen3.7-Max on May 19, unveiling it at the 2026 Alibaba Cloud Summit. It’s a reasoning model engineered for long, multi-stage agentic projects rather than short chat, and the headline demo backs that up: it ran 35 hours uninterrupted, called over 1,000 different tools, and wrote an optimized compute kernel that ran 10x…
-
MiniMax M2.7 Highspeed hits 100 tokens/sec — matches Opus 4.6 on coding benchmarks at a fraction of the cost
MiniMax released the Highspeed variant of its M2.7 coding model on May 18: a latency-tuned version delivering roughly 100 tokens per second versus 60 for standard M2.7, with identical output behavior. It matches or approaches Claude Opus 4.6 and GPT-5 on the hardest coding and agentic benchmarks while running 3x faster and costing a…
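
As a rough illustration of what the two quoted rates mean in wall-clock terms, here is a minimal sketch that assumes steady decode throughput and ignores time-to-first-token. The 2,000-token response length is a hypothetical example, not a figure from MiniMax.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a constant decode rate."""
    return num_tokens / tokens_per_second


STANDARD_TPS = 60.0      # standard M2.7, as quoted in the excerpt
HIGHSPEED_TPS = 100.0    # M2.7 Highspeed, as quoted in the excerpt
RESPONSE_TOKENS = 2_000  # hypothetical response length

standard = generation_seconds(RESPONSE_TOKENS, STANDARD_TPS)
highspeed = generation_seconds(RESPONSE_TOKENS, HIGHSPEED_TPS)
speedup = HIGHSPEED_TPS / STANDARD_TPS

print(f"standard:  {standard:.1f} s")
print(f"highspeed: {highspeed:.1f} s")
print(f"speedup:   {speedup:.2f}x")
```

The roughly 1.67x ratio applies only between the two M2.7 variants; the excerpt does not say which baseline the 3x figure is measured against.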
-
Gemini Omni: Google ships a multimodal video model that takes image, audio, video, and text as input
Google announced Gemini Omni at I/O 2026, a new model series that combines Gemini’s reasoning capabilities with native video generation. The first release, Gemini Omni Flash, accepts image, audio, video, and text input and outputs easily editable video grounded in real-world knowledge.

## What’s actually new

Most video generation models today…
-
Google ships Gemini 3.5 Flash at I/O 2026: 4x faster than 3.1 Pro and tuned agentic-first
Google opened Google I/O 2026 yesterday with Gemini 3.5 Flash, a frontier model that combines reasoning with agentic task execution. The headline: 4x the output tokens per second of other frontier models, while beating Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks. Gemini 3.5 Pro is in internal testing now, with public availability…
-
Cursor Composer 2.5 matches Opus 4.7 on SWE-Bench at 1/10th the cost — Kimi K2.5 base with 85% Cursor RL
Cursor shipped Composer 2.5 on May 18: an in-house coding agent built on the open-source Kimi K2.5 checkpoint from Moonshot AI, then heavily post-trained by Cursor (roughly 85% of the total compute budget went into Cursor’s own reinforcement learning and post-training pipeline). The headline: 79.8% on SWE-Bench Multilingual, matching Claude Opus 4.7 and GPT-5.5 at…
