A team from Zhejiang University and Alibaba turned LoRA into something it usually isn’t — a measurement instrument — and used it to pin down how much a model can actually memorize through low-rank fine-tuning. The headline is a Parametric Memory Law: a power-law relationship between loss reduction, effective parameters, and sequence length.
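The article does not give the law's exact functional form. Purely as an illustrative assumption, suppose it has the multiplicative shape ΔL ≈ C · N^a · T^b, where N is the effective parameter count and T the sequence length. Taking logs turns that into a linear model, log ΔL = log C + a·log N + b·log T, so the coefficients can be estimated by ordinary least squares. The sketch below does this in plain Python. `fit_power_law`, `solve_linear`, and the exponents used to generate the synthetic data are all hypothetical, not taken from the paper.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_power_law(samples):
    """Fit delta_loss ~ C * params**a * seq_len**b in log space.

    ASSUMPTION: this functional form is illustrative, not the paper's stated law.
    samples: list of (params, seq_len, delta_loss), all strictly positive.
    Returns (C, a, b).
    """
    rows = [(1.0, math.log(n), math.log(t)) for n, t, _ in samples]
    ys = [math.log(d) for _, _, d in samples]
    # Normal equations (X^T X) w = X^T y with w = [log C, a, b].
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    log_c, a, b = solve_linear(xtx, xty)
    return math.exp(log_c), a, b

# Synthetic, noise-free data from made-up exponents (NOT values from the paper).
TRUE_C, TRUE_A, TRUE_B = 0.02, 0.5, -0.25
samples = [
    (n, t, TRUE_C * n**TRUE_A * t**TRUE_B)
    for n in (1e5, 1e6, 1e7)
    for t in (128, 512, 2048)
]
c_hat, a_hat, b_hat = fit_power_law(samples)
print(f"C={c_hat:.4f}  a={a_hat:.4f}  b={b_hat:.4f}")  # prints C=0.0200  a=0.5000  b=-0.2500
```

With real measurements the fit would carry noise, and a log-log residual plot is the usual check that a power law is the right shape at all.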
## A power law and a phase transition
Treating LoRA as a controlled probe of memory capacity, the paper quantifies exact parametric memory rather than relying on downstream task scores as a proxy. Zooming in to the token level reveals a deterministic phase transition at p = 0.5 under greedy decoding. Once a token's predicted probability exceeds 0.5, every competing token must share the remaining mass of less than 0.5, so the token is the unique argmax and greedy decoding emits it verbatim (given the correct preceding context). Below that threshold, recall is unreliable: the token may still be the argmax, or a competitor may overtake it.
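The threshold argument is easy to check numerically. This sketch models greedy decoding as a plain argmax over one position's probability vector and confirms that a target above 0.5 always wins, while below 0.5 the outcome depends on the competitors. The helper names (`greedy_pick`, `guaranteed_recall`, `random_distribution_with_target`) are hypothetical illustrations, not code from the paper.

```python
import random

THRESHOLD = 0.5  # the p = 0.5 boundary described in the article

def greedy_pick(probs):
    """Index greedy decoding emits: the argmax (the first index wins a tie)."""
    return max(range(len(probs)), key=lambda i: probs[i])

def guaranteed_recall(probs, target):
    """True if the target's probability alone forces it to be the greedy pick.

    If probs[target] > 0.5, all other tokens share less than 0.5 of the mass,
    so none of them can match it and the argmax is unique.
    """
    return probs[target] > THRESHOLD

def random_distribution_with_target(vocab_size, target, p_target, rng):
    """Hypothetical helper: put p_target on `target`, split the rest randomly."""
    weights = [rng.random() for _ in range(vocab_size - 1)]
    total = sum(weights)
    rest = [(1.0 - p_target) * w / total for w in weights]
    return rest[:target] + [p_target] + rest[target:]

rng = random.Random(0)
# Above the threshold, greedy decoding always emits the target.
for _ in range(1000):
    p = rng.uniform(0.51, 1.0)
    probs = random_distribution_with_target(8, 3, p, rng)
    assert guaranteed_recall(probs, 3) and greedy_pick(probs) == 3

# Below the threshold, the outcome depends on the competitors.
print(greedy_pick([0.45, 0.40, 0.15]))  # prints 0: target 0 is still the argmax
print(greedy_pick([0.40, 0.45, 0.15]))  # prints 1: a competitor overtakes it
```

Applied position by position, the same argument extends to whole sequences: if every target token clears 0.5 when the correct prefix is fed in, greedy decoding reproduces the sequence verbatim, because each correct emission recreates exactly that prefix for the next step.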
## MemFT: budget gradients where they matter
The practical follow-on is MemFT, a fine-tuning method that reallocates the gradient budget away from tokens the model has already mastered and toward tokens still below the deterministic-recall threshold. Under tight LoRA capacity, that targeted allocation lifts effective memory capacity without growing the adapter.
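The article does not spell out MemFT's weighting rule. The sketch below assumes the simplest version of the idea it describes: per-token loss weights that drop to a `floor` (zero by default) for tokens whose target probability already exceeds 0.5, then get renormalized so the total weight, standing in for the gradient budget, stays fixed. `threshold_masked_weights` and `weighted_nll` are hypothetical names, and the real method may weight tokens more smoothly.

```python
import math

THRESHOLD = 0.5  # deterministic-recall threshold under greedy decoding

def threshold_masked_weights(target_probs, threshold=THRESHOLD, floor=0.0):
    """Per-token loss weights favouring tokens not yet recalled deterministically.

    ASSUMPTION: a hard mask plus renormalization, not MemFT's published rule.
    Tokens at or below the threshold get raw weight 1.0; tokens above it keep
    only `floor`. Weights are renormalized to sum to 1, so the total budget is
    redistributed rather than grown.
    """
    raw = [1.0 if p <= threshold else floor for p in target_probs]
    total = sum(raw)
    if total == 0.0:
        # Every token already clears the threshold: nothing left to push.
        return [0.0] * len(target_probs)
    return [w / total for w in raw]

def weighted_nll(target_probs, weights):
    """Weighted negative log-likelihood of the target tokens."""
    return sum(-w * math.log(p) for p, w in zip(target_probs, weights))

probs = [0.9, 0.3, 0.6, 0.1]  # toy probabilities assigned to each target token
uniform = [1.0 / len(probs)] * len(probs)
focused = threshold_masked_weights(probs)
print(focused)  # prints [0.0, 0.5, 0.0, 0.5]
# The focused loss spends its whole budget on the two unmastered tokens.
print(weighted_nll(probs, uniform), weighted_nll(probs, focused))
```

In an actual training loop, the same idea would compute the weights from detached per-token probabilities and multiply the per-token cross-entropy before backpropagating, so the weights themselves receive no gradient.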
## Why it matters
LoRA’s behavior has been understood mostly through “what works.” A quantitative law of how much it can remember — plus a crisp threshold for verbatim recall — turns intuition into something engineers can plan against. For anyone tuning small LoRAs to hold specific knowledge, knowing exactly when more capacity matters is the difference between cheaply done and quietly broken.
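Planning LoRA capacity starts with knowing how many trainable parameters an adapter adds. For a weight matrix of shape d_out × d_in, LoRA trains B (d_out × r) and A (r × d_in), so each adapted matrix costs r · (d_in + d_out) parameters. Whether this raw count matches what the paper calls “effective parameters” is an assumption, and the layer shapes below describe a hypothetical model.

```python
def lora_param_count(shapes, rank):
    """Trainable parameters LoRA adds: rank * (d_in + d_out) per adapted matrix.

    shapes: list of (d_out, d_in), one entry per weight matrix that gets an adapter.
    """
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)

# Hypothetical model: 32 layers, adapters on two square 4096 x 4096 projections each.
shapes = [(4096, 4096), (4096, 4096)] * 32
for rank in (1, 4, 16):
    print(rank, lora_param_count(shapes, rank))
```

Because the count is linear in r, doubling the rank doubles the adapter budget, which is the quantity a capacity law would let you trade off against the amount of content you need memorized.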
