AI Models & APIs
-
xAI Grok 4.3 cuts pricing 40% and tops legal reasoning at 79.3%
xAI shipped Grok 4.3 on May 6, and the move that matters is the price cut: $1.25 per 1M input tokens and $2.50 per 1M output tokens, roughly 40% below Grok 4.20, for a model that just scored 53 on the Artificial Analysis Intelligence Index, the highest in its price tier. The numbers behind the…
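To make the quoted rates concrete, here is a minimal cost sketch. It assumes flat per-token pricing at exactly the two rates above and ignores any caching discounts or other pricing tiers; the workload size in the example is hypothetical.

```python
# Minimal cost sketch at the per-token rates quoted for Grok 4.3.
# Assumes flat pricing; caching discounts or other tiers are not modeled.

INPUT_USD_PER_M = 1.25   # $ per 1M input tokens (as quoted)
OUTPUT_USD_PER_M = 2.50  # $ per 1M output tokens (as quoted)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a workload at the quoted rates."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens, 2M output tokens.
    print(f"${request_cost(10_000_000, 2_000_000):.2f}")  # prints $17.50
```

At these rates an output token costs twice as much as an input token, so output-heavy workloads cost proportionally more than the input price alone suggests.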
-
Khosla-backed Genesis AI ships GENE-26.5 — one model that cracks eggs, pipettes, and plays piano
Khosla-backed Genesis AI just released GENE-26.5, a foundation model for robotic manipulation, paired with its own human-scale dexterous hand and a tactile data glove. The launch demo runs a 20-step meal prep that includes one-handed egg cracking; another shows the same model playing piano. All of it runs on a single autonomous brain, with no task-specific fine-tunes.…
-
DFlash beats EAGLE-3 by 2.5x using block diffusion as the speculative draft model
Z-Lab (Chen, Liang, Liu) shipped DFlash this week. 3.6k GitHub stars, +671 in a single day. It’s an inference speedup layer for any LLM, and the trick is genuinely new.
What’s actually different
Speculative decoding has been around for a while: a small draft model guesses N tokens, the big model verifies them in one…
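The draft-then-verify loop of speculative decoding can be sketched in a few lines. This is a generic greedy version, not DFlash's code or API: `draft_block` and `target_next` are hypothetical callables, and the toy drafter stands in for the block-diffusion draft model named in the headline. The draft proposes a whole block of k tokens in one call, and the target keeps the longest matching prefix plus one token of its own.

```python
from typing import Callable, List

# Hypothetical callables for illustration (not DFlash's API):
#   draft_block(prefix, k) -> k proposed tokens, produced in one call
#   target_next(prefix)    -> the target model's greedy next token
DraftFn = Callable[[List[int], int], List[int]]
TargetFn = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_block: DraftFn,
                     target_next: TargetFn, k: int) -> List[int]:
    """One greedy speculative-decoding step; returns the tokens to append."""
    proposal = draft_block(prefix, k)
    accepted: List[int] = []
    for tok in proposal:
        # A real system scores all k positions in one batched target pass;
        # this loop queries them one at a time for readability.
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token, drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Every drafted token matched, so the target's next token comes for free.
    accepted.append(target_next(prefix + accepted))
    return accepted


def toy_target(prefix: List[int]) -> int:
    """Toy 'target model': always predicts the next integer."""
    return prefix[-1] + 1


def toy_draft(prefix: List[int], k: int) -> List[int]:
    """Toy drafter: correct for two tokens, then wrong."""
    guess = [prefix[-1] + 1 + i for i in range(k)]
    if k > 2:
        guess[2] = -1
    return guess


if __name__ == "__main__":
    print(speculative_step([0], toy_draft, toy_target, k=4))  # [1, 2, 3]
```

Because every emitted token is either confirmed or produced by the target, the output matches plain greedy decoding with the target alone; the speedup comes from verifying the whole block in one batched pass, which this loop only simulates.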
-
ds4: Redis creator antirez runs DeepSeek-V4 Flash on a single MacBook with a local Metal inference engine
Salvatore Sanfilippo (the Redis guy) dropped ds4.c, a native inference engine for DeepSeek V4 Flash written as one C file with zero external dependencies. The whole thing is a Metal graph executor wired to DS4’s MoE topology: custom loader, prompt rendering, KV state, server glue. No GGUF wrapper, no llama.cpp fork on the…
-
Claude Opus 4.7 goes Wall Street first — 64.4% on Vals Finance, 1M context in Claude Code
Anthropic didn’t drop Claude Opus 4.7 with a blog post. The new flagship model went to a closed-door bank briefing in New York on May 5, alongside a Moody’s data pipe and full Microsoft 365 integration. The message: this model is built for whoever pays the most. The numbers back it up. 64.4% on Vals…
-
Kronos hits 23k stars: a TimesFM-style foundation model trained on K-lines from 45 exchanges
Quants got their own foundation model. Kronos takes the playbook Google used for TimesFM and ports it to financial OHLCV data: feed it candlesticks, get back multi-period forecasts. Accepted at AAAI 2026, 23k GitHub stars, still climbing today.
How it works
Two stages. A tokenizer first quantizes continuous multi-dimensional K-line data (open, high, low, close,…
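The excerpt stops before the tokenizer details, so what follows is only a simplified stand-in, not Kronos's learned tokenizer. It shows the basic move of turning continuous candles into a discrete vocabulary a sequence model can predict: each OHLCV channel is z-normalized over the window, clipped, and uniformly binned into integer ids. `N_BINS`, `CLIP`, and the per-window normalization are assumptions chosen for illustration.

```python
from typing import List, Sequence

# Simplified stand-in for a K-line tokenizer (NOT Kronos's learned tokenizer).
# Each channel (e.g. open, high, low, close, volume) is z-normalized over the
# window, clipped to [-CLIP, CLIP], and uniformly binned into N_BINS ids.
N_BINS = 16  # assumption: vocabulary size per channel
CLIP = 3.0   # assumption: clip at +/- 3 standard deviations


def tokenize_klines(rows: Sequence[Sequence[float]]) -> List[List[int]]:
    """Map a window of OHLCV rows to one integer token per channel per step."""
    n_steps, n_channels = len(rows), len(rows[0])
    tokens = [[0] * n_channels for _ in range(n_steps)]
    for c in range(n_channels):
        col = [row[c] for row in rows]
        mean = sum(col) / n_steps
        std = (sum((x - mean) ** 2 for x in col) / n_steps) ** 0.5
        if std == 0.0:
            std = 1.0  # exactly flat channel: every value normalizes to 0
        for t, x in enumerate(col):
            z = max(-CLIP, min(CLIP, (x - mean) / std))
            # Map [-CLIP, CLIP] onto ids 0..N_BINS-1 (z == CLIP goes to the top bin).
            idx = int((z + CLIP) / (2 * CLIP) * N_BINS)
            tokens[t][c] = min(idx, N_BINS - 1)
    return tokens


if __name__ == "__main__":
    # Hypothetical 4-candle window: [open, high, low, close, volume]
    window = [
        [10.0, 11.0, 9.5, 10.5, 1000.0],
        [10.5, 12.0, 10.0, 11.5, 1500.0],
        [11.5, 12.5, 11.0, 12.0, 1200.0],
        [12.0, 12.2, 10.8, 11.0, 1800.0],
    ]
    for step, toks in enumerate(tokenize_klines(window)):
        print(step, toks)
```

Uniform binning keeps the sketch short; a learned quantizer can spend its vocabulary where price moves actually cluster, which a fixed grid cannot.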
-
OpenAI swaps GPT-5.5 Instant in as ChatGPT’s default — hallucinations drop 52% on legal and medical prompts
OpenAI replaced GPT-5.3 Instant with GPT-5.5 Instant as ChatGPT’s default model on May 5. Hundreds of millions of daily users got a new brain overnight, with no opt-in and no banner.
What actually changed
This is a model swap, not a product launch. AIME 2025 math reasoning jumped from 65.4 to 81.2. MMMU-Pro multimodal went from 69.2 to…
-
Google ships Gemma 4 multi-token prediction drafters: 2.7-3.5x faster inference, free
What it is
Tiny helper models that ride alongside Gemma 4 and guess 4-8 tokens ahead per forward pass. The main model just verifies. Right guesses are accepted, so several tokens land in one pass; at the first wrong guess, decoding falls back to the main model’s own token. No quality loss, because the big model still signs off on every token. Same speculative-decoding…
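How much a 4-8 token lookahead buys depends on how many drafted tokens survive verification. A common back-of-envelope estimate from the speculative-decoding literature (Leviathan et al., 2023) assumes each drafted token is accepted independently with probability alpha; with k drafted tokens, the expected number of tokens emitted per main-model pass is (1 - alpha^(k+1)) / (1 - alpha). The alpha values below are illustrative assumptions, not Google's measurements, and the estimate ignores the drafter's own cost, so wall-clock speedup comes in lower.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per main-model pass with k drafted tokens.

    Assumes each drafted token is accepted independently with probability
    alpha; counts the accepted prefix plus the one token the main model
    always contributes (its correction or its bonus token).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus one bonus token
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Illustrative acceptance rates, not measured values for Gemma 4.
    for k in (4, 8):
        for alpha in (0.6, 0.7, 0.8):
            print(f"k={k} alpha={alpha}: "
                  f"{expected_tokens_per_pass(alpha, k):.2f} tokens/pass")
```

With alpha = 0.8 and k = 4 this gives about 3.36 tokens per pass. The 2.7-3.5x figure in the headline is a reported speedup, not something derived from this formula.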
-
Gemini 3.1 Ultra goes live with Google’s largest context window yet
Google just slotted Gemini 3.1 Ultra above 3.1 Pro and 3.1 Flash. It’s the new top of the family, built for the tasks where Pro starts to crack: multi-step agents, hairy long-document reasoning, and deep research runs that chew through hundreds of sources before answering.
What makes Ultra different
It’s not a fresh architecture. It’s…
-
Grok Imagine 1 tops DesignArena across all three video arenas, beating Sora 2 Pro and Veo 3.1
xAI’s Grok Imagine took the top spot on every DesignArena video board: Video Arena (Elo 1337), Video Editing Arena (1291), and Image-to-Video Arena (1298, confirmed at 1329 in the latest run). It beat Runway Gen-4.5, Sora 2 Pro, and Google Veo 3.1 on the same leaderboard, run by Arcada Labs. During the 30-day pre-launch…
