AI Models & APIs
-
Alibaba’s Qwen3.7-Plus adds vision and agentic tool use, scoring 79 on ScreenSpot Pro
Qwen3.7-Plus is Alibaba’s entry into the multimodal-agent race. Where Qwen3.7-Max stayed a pure-text flagship, Plus takes images and video as input — understanding them, not generating — and wires that perception into an agent loop that plans, calls tools, writes and tests its own code, then iterates on real execution feedback. ## What it’s built… Continue reading
-
Microsoft MAI-Image-2.5 debuts at No. 3 on Arena with built-in image editing
## What it is MAI-Image-2.5 is Microsoft’s new in-house image generation and editing model, shown at Build 2026 and available in Foundry. It debuted at No. 3 on Arena.ai’s image leaderboard — a +75-point jump over MAI-Image-2 — with its biggest gains in text rendering (+107) and cartoon, anime, and fantasy (+90). It’s already running… Continue reading
-
MisoTTS is an 8B open-weights voice model built to out-emote humans
## What it is MisoTTS is an 8-billion-parameter text-to-speech model from Miso Labs, released with open weights and a claim of being the most emotive voice model around. It generates expressive speech from text plus audio context, using residual vector quantization to widen its sonic range, and it clones a voice from a short sample… Continue reading
-
Apertis gives you one API key for 470+ AI models across 30 providers
## What it is Apertis is a unified gateway that puts 470+ models from 30+ providers — OpenAI, Anthropic, Google, Meta, and more — behind a single OpenAI-compatible API key. Point your existing OpenAI or Anthropic client at it and you can call any model without rewiring code. It’s built for coding tools too: Claude… Continue reading
-
NVIDIA Cosmos 3 is an open world model that generates text, video, sound, and actions
## What it is NVIDIA just released Cosmos 3, an open foundation model for Physical AI that natively understands and generates across text, images, video, ambient sound, and actions — all in one model. It’s built on a two-tower Mixture-of-Transformers: an autoregressive transformer handles physical reasoning while a diffusion transformer handles multimodal generation. The point… Continue reading
-
A Netflix engineer’s Headroom cuts LLM token bills up to 95% — and it’s open source
A Netflix senior engineer just open-sourced the tool you wish you’d written. Headroom (LLM context compression) jumped 1,000+ GitHub stars in a single day, and the pitch is brutally simple: most of the tokens you’re paying for are junk. What it actually does Headroom sits as a transparent proxy between your app and any of… Continue reading
-
—
title: “Microsoft MAI-Thinking-1 outscores Claude Sonnet 4.6 in blind evals — trained without OpenAI’s data” date: 2026-06-03 tags: [model, microsoft, reasoning, api] Microsoft shipped its first fully self-built reasoning model at Build 2026, and the signal is hard to miss. MAI-Thinking-1 hits 97% on AIME 2025 and 94.5% on AIME 2026. In blind human evaluations… Continue reading
-
Microsoft MAI-Code-1 Flash is live in GitHub Copilot — 60% fewer tokens than comparable coding models
Microsoft shipped its first self-trained coding model at Build 2026. MAI-Code-1 Flash is a 5B-parameter model trained entirely inside GitHub Copilot’s own production tool harness — real developer workflows, not synthetic benchmarks. That’s the unusual part: the training environment is the deployment environment. The numbers hold up. 85.8% on Microsoft’s adversarial coding benchmark, ~51% on… Continue reading
-
WorldKV lets world models remember what they have seen — training-free KV retrieval and compression for spatial consistency
WorldKV, from KAIST AI and Naver AI Lab, tackles a core problem in world models: when you revisit a place you’ve already seen, the model should show you the same thing. Sustaining that persistent consistency has been hard — full attention preserves it but blows the real-time budget; sliding-window inference is fast but forgets. ##… Continue reading
-
Cohere Command A+ is its first fully Apache 2.0 model — 218B MoE with native citations, runs on 2 H100s
Cohere released Command A+ — a 218B-parameter sparse MoE model (25B active) under full Apache 2.0, the company’s first fully open-weight model. It’s tuned for complex reasoning, multimodal document processing, and agentic workflows, and it runs on a single NVIDIA B200 or just two H100s. ## Native citations The standout feature: when Command A+ retrieves… Continue reading
