AI Models & APIs
-
Thinking Machines Interaction Models: Mira Murati attacks the turn-based LLM
Thinking Machines Lab dropped their first real architecture statement on May 11, and it isn’t another scaling paper. Mira Murati’s team argues that humans got pushed out of AI collaboration not because models don’t need us, but because the interface never left us a seat. The 200ms idea: Today’s LLMs work in turns. You type,…
-
Claude Platform on AWS goes GA: Anthropic runs it, you just pay through AWS
On May 11, Anthropic and AWS made it official: the full native Claude Platform now ships through your existing AWS account in 18 regions on day one. This isn’t Bedrock with a new skin. It’s a different deal entirely. Not Bedrock, and that’s the whole point: Bedrock runs Claude inside the AWS data…
-
Interfaze hits 83.6% on MMLU-Pro with a hybrid DNN+LLM stack
The Interfaze paper just hit the HN front page with 86 points and was accepted at IEEE CAI 2026. The contrarian bet: monolithic transformers are the wrong shape for high-accuracy work. Strip them apart and route tasks to specialized models first. What Interfaze actually is: Three layers stitched together. Specialized DNN/CNN modules handle perception: OCR…
-
Tencent Hunyuan Hy3 Preview goes open-source: 295B MoE, 21B active, 256K context
Tencent rebuilt its Hunyuan training infra in February. Three months later they shipped Hy3 Preview (295B total parameters, 21B active, 256K context) and dropped the weights on Hugging Face, ModelScope and GitCode on April 23. What it actually is: A frontier-scale Mixture-of-Experts model. Not a chat product, not an agent, just raw weights.…
-
Google ships Gemini API File Search Multimodal RAG — page-level citations and Embedding 2 baked in
Google extended its Gemini API File Search tool with the three things every production RAG team has been begging for: native multimodal indexing, custom metadata filtering at query time, and page-level citations. PDFs, scientific imagery, and plain text now live in one searchable index, powered by Gemini Embedding 2. The post hit the Hacker News…
-
Gemini 3.1 Flash-Lite hits GA: $0.25/M input tokens, 2.5x faster TTFT
Google pushed Gemini 3.1 Flash-Lite to General Availability on May 7. It’s the cheapest, fastest model in the Gemini 3 family, and the most interesting one for anyone running real production traffic. What it is: A lightweight LLM API, not a consumer product. Pricing is $0.25 per million input tokens and $1.50 per million…
-
GPT-5.5-Cyber: OpenAI forks a security model with looser guardrails for vetted red teams
OpenAI shipped GPT-5.5-Cyber on May 7, 2026: a fork of GPT-5.5 with the cybersecurity guardrails dialed back. Vetted defenders can have it write proof-of-concept exploits, run attack simulations, and validate vulnerabilities, all work that gets a polite refusal in standard ChatGPT. How the split works: Two tracks, one model family. Standard GPT-5.5 stays the…
-
Subquadratic SubQ claims 1,000x less compute at 12M tokens — researchers want receipts
A Miami startup nobody had heard of two weeks ago shipped what it calls the first fully sub-quadratic commercial LLM. SubQ runs a native 12-million-token context window. The numbers Subquadratic put on the page: 50x faster and 50x cheaper than frontier models at 1M tokens, and roughly 1,000x less compute at the full 12M window.…
-
Anthropic × Akamai $1.8B compute deal bets on edge inference for Claude
Anthropic just locked in $1.8 billion of inference capacity with Akamai over multiple years. One day earlier, they rented the entirety of Colossus 1 from xAI. Two massive compute deals in 48 hours. What the deal actually is: This isn’t training compute. It’s pure inference, running Claude for paying users. Akamai brings something the…
-
OpenAI GPT-Realtime-2 + Translate + Whisper: three voice models, one API, several startups erased
OpenAI shipped three Realtime API models on May 7. Read the spec sheet and you can hear a half-dozen voice startups quietly rewriting their decks. What actually launched: GPT-Realtime-2 is the first voice model with GPT-5-class reasoning baked in. 128K context (up from 32K), five-level reasoning effort, tone control, parallel tool calls, clean recovery from…
