Most OCR pipelines still slice a document page by page before a model ever sees it. Unlimited-OCR, an open 3B model Baidu put out June 22, skips that step — it parses a whole multi-page PDF or image stack in a single inference pass.
## One-shot long-horizon parsing
The pitch is in the name: instead of pre-cutting documents into pages, Unlimited-OCR ingests the full stack at once, supported by a 32,768-token context window. To make that affordable, it replaces the decoder’s attention layers with Reference Sliding Window Attention (R-SWA), which holds the KV cache at a constant size during decoding instead of letting it grow with document length. It’s built on DeepSeek-OCR, fine-tuned with Hugging Face transformers, and ships in two modes: a cropped 640px “gundam” and a full 1024px “base.”
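
To see why a constant-size KV cache matters at this context length, here is a rough sketch comparing an unbounded cache with a plain sliding-window cache. It is not Unlimited-OCR's implementation: the "reference" part of R-SWA is not modeled, and the layer count, KV head count, head dimension, and window size are all hypothetical values picked for illustration.

```python
from collections import deque


def kv_cache_bytes(cached_tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size; keys and values are both stored (factor of 2)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * cached_tokens


def cached_tokens_per_step(total_tokens, window=None):
    """Number of cached positions after each decoded token.

    window=None keeps every position (full attention);
    an integer window evicts the oldest position once the cache is full.
    """
    cache = deque(maxlen=window)
    sizes = []
    for position in range(total_tokens):
        cache.append(position)
        sizes.append(len(cache))
    return sizes


# Hypothetical decoder shape and window size (assumptions, not published specs).
LAYERS, KV_HEADS, HEAD_DIM = 24, 8, 128
CONTEXT = 32_768  # context length quoted in the article
WINDOW = 1_024    # assumed sliding-window size

full = cached_tokens_per_step(CONTEXT)
windowed = cached_tokens_per_step(CONTEXT, window=WINDOW)

full_mib = kv_cache_bytes(full[-1], LAYERS, KV_HEADS, HEAD_DIM) / 2**20
windowed_mib = kv_cache_bytes(windowed[-1], LAYERS, KV_HEADS, HEAD_DIM) / 2**20

print(f"full attention: {full[-1]} cached tokens, {full_mib:.0f} MiB")
print(f"sliding window: {windowed[-1]} cached tokens, {windowed_mib:.0f} MiB")
```

With full attention the cache grows by one entry per decoded token, so memory scales with document length. With a window it stops growing once the window is full, which is the property the article attributes to R-SWA.
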
## Why it matters
At 3B parameters under an MIT license, with weights on GitHub and ModelScope, it’s small and open enough to self-host for real document workloads. The one-pass approach matters most for long, messy documents — contracts, filings, scanned books — where page-by-page slicing loses cross-page structure that a single long-horizon pass keeps intact.
