Subquadratic, a Miami startup nobody had heard of two weeks ago, has shipped SubQ, which it calls the first fully sub-quadratic commercial LLM. SubQ runs a native 12-million-token context window. The numbers the company put on the page: 50x faster and 50x cheaper than frontier models at 1M tokens, and roughly 1,000x less compute at the full 12M window. The company also claims SubQ beats GPT-5.5 on long-context retrieval. VentureBeat reports that independent researchers want third-party benchmarks before they buy any of it.
What’s actually new
Standard transformers scale O(n²) with context — double the tokens, quadruple the compute. SubQ’s architecture, Subquadratic Selective Attention (SSA), scales linearly in compute and memory with context length. That’s the entire pitch. If it holds up under independent testing, the attention bottleneck that’s been the hard ceiling since 2017 just got rewritten.
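The scaling difference is easy to sketch with back-of-envelope arithmetic. This is an illustrative model only: the `window` constant and both cost functions are my assumptions for a generic linear-attention stand-in, not the actual SSA mechanism or Subquadratic's published numbers.

```python
def quadratic_cost(n_tokens: int) -> int:
    """Standard self-attention: every token attends to every other token."""
    return n_tokens * n_tokens

def linear_cost(n_tokens: int, window: int = 1024) -> int:
    """Generic linear-attention stand-in: each token touches O(window) others."""
    return n_tokens * window

for n in (1_000_000, 12_000_000):
    ratio = quadratic_cost(n) / linear_cost(n)
    print(f"{n:>12,} tokens -> quadratic attention costs {ratio:,.0f}x more")
```

Note how the gap widens with context length: doubling the tokens quadruples the quadratic cost but only doubles the linear one, which is why the claimed savings grow from 50x at 1M tokens toward 1,000x at 12M.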
Two products at launch
SubQ API exposes the 12M context window to developers and enterprise teams. SubQ Code is a CLI coding agent built around the obvious use case — load an entire codebase into a single context window and skip the RAG plumbing. A 50M-token version is on the Q4 roadmap.
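Whether "skip the RAG plumbing" is realistic comes down to whether a repo actually fits in 12M tokens. A minimal sketch of that budget check, assuming the common 4-characters-per-token rule of thumb (SubQ's real tokenizer is unknown) and a hypothetical extension list:

```python
import os

CHARS_PER_TOKEN = 4          # rough heuristic, not SubQ's tokenizer
WINDOW = 12_000_000          # SubQ's advertised context window

def estimate_tokens(root: str, exts=(".py", ".js", ".ts", ".go", ".rs")) -> int:
    """Walk a repo and approximate its token count from file sizes."""
    total_chars = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(exts):
                try:
                    total_chars += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # skip unreadable files
    return total_chars // CHARS_PER_TOKEN

# tokens = estimate_tokens("path/to/repo")
# print(f"~{tokens:,} tokens; fits in window: {tokens <= WINDOW}")
```

By this heuristic, 12M tokens is on the order of 48MB of source text, which covers most single codebases but not monorepo-scale ones; hence the 50M-token version on the roadmap.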
The team: CEO Justin Dangel; CTO Alex Whedon, formerly Head of Generative AI at Meta. A $29M seed round, with Justin Mateen and Javier Villamizar on the cap table.