LlamaIndex shipped LiteParse v2.0, a complete Rust rewrite of its open-source document parser that claims up to 100x faster parsing. It runs entirely on your machine — no cloud, no LLM, no API key — and is aimed at the unglamorous step every RAG and agent pipeline hits first: turning a PDF, DOCX, or scanned image into text with structure.
## Rust everywhere, including the browser
The library now ships as native Rust, Node, Python, and a custom WASM package, so it runs in the browser and on edge runtimes too. The speedup depends on document size: small documents see 5–100x, and larger documents around 3x. The headline example is a 457-page, 100MB document parsed in 0.777 seconds.
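
To put the 457-page example in per-page terms, here is a quick back-of-the-envelope calculation. It uses only the three figures quoted above and treats 100MB as exactly 100; it is one data point, not a benchmark, so the derived rates should not be extrapolated to other documents.

```python
# Figures quoted for the single example document above.
pages = 457
size_mb = 100      # assumed to be exactly 100 for this estimate
seconds = 0.777

pages_per_second = pages / seconds
mb_per_second = size_mb / seconds
ms_per_page = seconds / pages * 1000

print(f"{pages_per_second:.0f} pages/s, "
      f"{mb_per_second:.0f} MB/s, "
      f"{ms_per_page:.2f} ms/page")
# prints: 588 pages/s, 129 MB/s, 1.70 ms/page
```

At roughly 1.7 milliseconds per page for this document, parsing time would be small next to the embedding or LLM calls that usually follow it in a pipeline.
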
## What it actually parses
LiteParse extracts text along with spatial layout information and bounding boxes, the structure most RAG pipelines silently throw away. It covers 50+ document types, including DOCX, XLSX, PPTX, and images, and has built-in OCR. Bounding boxes matter because retrieval quality often hinges on whether a chunk preserves table structure or paragraph order, and most LLM-based parsers lose that spatial information.
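
A minimal sketch of why bounding boxes help, assuming a parser hands back text spans with coordinates. The `TextBox` type, its fields, the `group_into_rows` helper, and the sample values are all hypothetical and made up for illustration; this is not LiteParse's output schema or API. Given spans in arbitrary order, grouping by vertical position and sorting by horizontal position recovers table rows that a plain text dump would scramble.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical text span with a bounding box (top-left origin)."""
    text: str
    x: float
    y: float
    width: float
    height: float


def group_into_rows(boxes, y_tolerance=3.0):
    """Group boxes into rows, ordered top to bottom, then left to right.

    A box joins the current row if its top edge is within y_tolerance
    of the first box placed in that row; otherwise it starts a new row.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        if rows and abs(box.y - rows[-1][0].y) <= y_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [sorted(row, key=lambda b: b.x) for row in rows]


# Illustrative spans in scrambled order, with slight vertical jitter.
boxes = [
    TextBox("1.5M", 200, 140.4, 30, 10),
    TextBox("Quarter", 50, 100.0, 45, 10),
    TextBox("Q1", 50, 120.2, 15, 10),
    TextBox("Revenue", 200, 100.6, 50, 10),
    TextBox("Q2", 50, 140.0, 15, 10),
    TextBox("1.2M", 200, 119.8, 30, 10),
]

table = [[b.text for b in row] for row in group_into_rows(boxes)]
print(table)
# prints: [['Quarter', 'Revenue'], ['Q1', '1.2M'], ['Q2', '1.5M']]
```

Without the coordinates, the same six strings carry no indication of which value belongs to which quarter. Real documents need more care than a fixed tolerance (multi-column layouts, rotated text, merged cells), so treat this as an illustration of the idea rather than a chunking strategy.
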
## Why it matters
Parsing has been the part of agent and RAG stacks people quietly hate: slow, brittle, locked to cloud services. A fast, local, multi-runtime parser maintained by LlamaIndex makes that step a commodity instead of a bottleneck — and lets it run inside browsers and at the edge where cloud parsing isn’t viable.
