Alibaba announced Qwen 3.7-Max at the Apsara Cloud Summit, with API rollout already started. It’s positioned as Alibaba’s most capable agent model so far — built for long-running, multi-step workflows rather than single-prompt question answering — and ships with a 1M-token context window and a native extended-thinking mode.
## The 35-hour demo
The standout number is the internal demonstration: Qwen 3.7-Max ran continuously for about 35 hours, chained 1,158 tool calls, and reported a roughly 10x geometric mean speedup over a reference Triton kernel on Alibaba’s Zhenwu M890 AI accelerator. No independent reproduction has been published — but the scale of the demo signals what the model is aimed at: agents that don’t reset after a few minutes.
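For readers unfamiliar with the metric, here is a minimal sketch of how a geometric mean speedup is computed. The announcement does not say how many kernels were measured or what the per-kernel results were, so the function name and the sample values below are purely illustrative.

```python
import math

def geometric_mean_speedup(speedups):
    """Geometric mean of per-kernel speedup ratios (each must be > 0)."""
    if not speedups:
        raise ValueError("need at least one speedup ratio")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedup ratios must be positive")
    # Averaging in log space is equivalent to taking the n-th root of the product.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# Hypothetical per-kernel ratios (reference time / optimized time), NOT from the demo.
sample = [4.0, 25.0]
geo = geometric_mean_speedup(sample)   # approximately 10
arith = sum(sample) / len(sample)      # 14.5, pulled upward by the larger ratio
print(f"geometric mean: {geo:.2f}x, arithmetic mean: {arith:.2f}x")
```

The geometric mean is the usual choice for averaging ratios because one outlier kernel cannot dominate it the way it dominates an arithmetic mean.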
## Benchmarks and pricing
Reported scores: 60.6 on SWE-Pro, 69.7 on Terminal-Bench 2.0, 92.4 on GPQA Diamond. Pricing on Alibaba Cloud DashScope is $2.50 per million input tokens and $7.50 per million output tokens. Both Qwen 3.7-Max and the companion Qwen 3.7-Plus are closed-weights, API-only.
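To put the listed rates in concrete terms, the sketch below estimates the cost of a run from its token counts. It assumes flat per-token billing at the prices quoted above; any tiering, caching discounts, or separate billing for thinking tokens is not covered by the announcement and is not modeled here. The function name and the token counts are hypothetical.

```python
# Listed DashScope prices from the announcement, in USD per million tokens.
INPUT_USD_PER_MTOK = 2.50
OUTPUT_USD_PER_MTOK = 7.50

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD, assuming flat per-token pricing (an assumption)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000

# Hypothetical single request: 200k tokens in, 50k tokens out.
print(estimate_cost_usd(200_000, 50_000))  # 0.875
```

In a long agentic run the accumulated context is typically re-sent on each tool call, so input tokens are likely to dominate the bill even though output tokens cost three times as much per token.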
## Why it matters
Most “agent model” launches really mean “another chat model with tool calls bolted on.” A model trained and pitched for 35-hour autonomous runs with more than a thousand tool calls is something else. If the demo holds up under independent testing, it raises the bar for how long a single agent run can keep doing useful work.
