What Is It
Arcee AI, a 26-person US startup, has shipped Trinity-Large-Thinking, a 399B-parameter open-source reasoning model released under Apache 2.0. It was trained in 33 days on 2,048 NVIDIA Blackwell GPUs for just $20 million.
It runs on a Mixture-of-Experts architecture: 399B total parameters, only 13B active per token. That means 2-3x faster inference than dense models at comparable capability, with a 262K context window built for long-horizon agent workflows.
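The MoE math behind that claim is easy to sketch. The numbers below are from the article; the 2-FLOPs-per-active-parameter rule of thumb is a common assumption, not anything Arcee has published, and it explains why per-token compute drops far more than the quoted 2-3x wall-clock speedup (decoding is usually memory-bound, so the compute saving doesn't translate one-to-one).

```python
# Illustrative back-of-envelope sketch (not Arcee's code): why a 399B-parameter
# MoE that activates only 13B parameters per token is much cheaper to decode
# than a dense model of the same total size.

TOTAL_PARAMS = 399e9   # all experts, resident in memory
ACTIVE_PARAMS = 13e9   # parameters actually routed per token

def flops_per_token(active_params: float) -> float:
    # Common rule of thumb: ~2 FLOPs per active parameter per decoded token.
    return 2 * active_params

moe = flops_per_token(ACTIVE_PARAMS)
dense = flops_per_token(TOTAL_PARAMS)
print(f"MoE decode FLOPs/token:   {moe:.2e}")
print(f"Dense decode FLOPs/token: {dense:.2e}")
print(f"Compute ratio: {dense / moe:.0f}x fewer FLOPs per token")
```

The compute ratio works out to roughly 31x, while the article quotes only 2-3x faster inference: the gap is the usual story of bandwidth, routing overhead, and batching eating most of the theoretical saving.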
Why It’s Blowing Up
The price gap is absurd. $0.90 per million output tokens. Claude Opus 4.6 charges $25. That’s 96% cheaper.
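The 96% figure checks out from the two quoted prices (a quick illustrative calculation, using only the per-million-token rates in the article):

```python
# Sanity-check the quoted price gap using the article's listed prices.
trinity = 0.90   # $ per 1M output tokens (Trinity-Large-Thinking)
opus = 25.00     # $ per 1M output tokens (Claude Opus 4.6)

savings = 1 - trinity / opus
print(f"Trinity is {savings:.1%} cheaper per output token")  # prints "96.4%"
```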
Arcee's own benchmarks: 91.9% on PinchBench (just behind Opus), 98.2% on LiveCodeBench, and parity with Kimi K2 on AIME25. It does trail Opus by 12+ points on SWE-bench, so it's not a full replacement, but the economics are hard to argue with.
CEO Mark McQuade calls it “the most capable open-weight model ever released by a non-Chinese company.” With DeepSeek and Qwen dominating open-source, Western enterprises finally have an Apache 2.0 option they can self-host without geopolitical baggage. TechCrunch ran a full feature on April 7. Independent benchmarks are still pending, so take the numbers with caution.
You Might Also Like
- 708 GitHub Stars in 48 Hours: Claude's Token-Efficient Universal CLAUDE.md and the Fight Over Claude's Most Expensive Habit
- Pi Mono: 29k Stars and a 200-Token System Prompt That Rivals Claude Code
- Qwen 3.6 Plus vs Claude Opus 4.6: 3x the Speed, 1/17th the Price, and the Benchmarks Are Uncomfortably Close
- Claude Code Security Just Dropped, and It Already Found 500 Zero-Days Nobody Knew About
- Claude Code Remote Control Just Turned My Phone Into a Coding Terminal, and I'm Weirdly Into It
