LiteCoder-Terminal is a scaling effort aimed at one of the harder agent settings: the actual command line. This release grows the training data from a preview of fewer than 1,000 trajectories to 11,255 agent trajectories collected across multiple scaffolds, and broadens the task mix beyond what previous terminal-agent datasets covered.
## Multi-scaffold, three new task categories
The earlier preview trained only inside Terminus; this version trains on trajectories from multiple scaffolds. That matters because a terminal agent that only works in one harness usually breaks the moment you switch tools. The task taxonomy also adds three new categories (coding, scientific and numerical computing, and games), moving the dataset from “run shell commands” toward the full range of things people actually do at a terminal.
## Numbers that move
The fine-tuned LiteCoder-Terminal-30b-a3b-sft model reaches 31.5% Pass@1 on Terminal Bench Pro. The smaller LiteCoder-Terminal-4b-sft model climbs from a 3.5% baseline to 15.5%, a gain of 12 percentage points and more than a 4x relative jump, from supervised data alone and before any RL on top.
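To make the arithmetic behind “more than 4x” concrete, here is a minimal sketch. It assumes Pass@1 means one attempt per task, scored as the percentage of tasks whose single attempt succeeds; Terminal Bench Pro's exact scoring protocol may differ. The helper names `pass_at_1` and `relative_gain` are hypothetical and chosen only for illustration.

```python
def pass_at_1(results: list[bool]) -> float:
    """Percentage of tasks solved, assuming exactly one attempt per task (hypothetical helper)."""
    if not results:
        raise ValueError("need at least one task result")
    return 100.0 * sum(results) / len(results)


def relative_gain(baseline: float, tuned: float) -> float:
    """Ratio of the fine-tuned score to the baseline score (hypothetical helper)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return tuned / baseline


# Toy example: 2 of 4 tasks solved on the first try gives 50.0% Pass@1.
toy_score = pass_at_1([True, False, False, True])

# Reported figures for LiteCoder-Terminal-4b-sft (percent Pass@1).
baseline_4b = 3.5
sft_4b = 15.5

absolute_points = sft_4b - baseline_4b         # 12.0 percentage points
ratio = relative_gain(baseline_4b, sft_4b)     # 15.5 / 3.5, roughly 4.43

print(f"toy Pass@1: {toy_score:.1f}%")
print(f"absolute gain: {absolute_points:.1f} points, relative gain: {ratio:.2f}x")
```

The relative ratio flatters small baselines, which is why the absolute gain in percentage points is worth reporting alongside it.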
## Why it matters
Terminal agents are weirdly important and weirdly underserved. Coding agents that can’t reliably drive a shell stall on the first install error; computer-use agents that ignore the terminal miss most of the productive surface area on a developer’s machine. Scaling synthesized terminal trajectories, across scaffolds and across task families, is the unglamorous data work that closes that gap.