Most video AI gets graded on shallow recall — what’s on screen at minute three. VideoKR, a new arXiv release, targets the harder thing: video questions that need outside knowledge and multi-step reasoning, not a textual shortcut. It’s billed as the first large-scale training corpus built specifically for that.
## What’s in it
The dataset contains 315K video reasoning examples built on 145K newly collected, CC-licensed, expert-domain videos. The examples come from a human-in-the-loop, skill-oriented pipeline designed to elicit progressively deeper reasoning while controlling for difficulty and diversity and keeping the chain-of-thought rationales reliable. The CC licensing matters: it means the corpus is actually usable, not a legal landmine.
## Why it matters
The team also shipped VideoKR-Eval, an expert-annotated benchmark where you can’t fake the answer from the transcript — you have to actually understand the video. Models post-trained on VideoKR beat prior approaches on knowledge-heavy video reasoning while staying competitive on general tasks. The point the authors push: data design, not just bigger models, is what moves video reasoning forward.
