HeadRollback: Post-Task Attention Head Rollback for Replay-Free Continual LoRA Fine-Tuning
Abstract
Continual fine-tuning of large language models with LoRA suffers from catastrophic forgetting, where performance on earlier tasks degrades as the model adapts to new ones. Existing mitigation strategies either modify training dynamics or require replay buffers, adding complexity to the fine-tuning pipeline. We propose HeadRollback, a simple post-task intervention that identifies attention heads unintentionally disrupted during training---heads that changed substantially but are not critical for the new task---and rolls back their LoRA B-matrix rows to the previous checkpoint. Our method selects heads worth preserving by combining three signals: disruption magnitude, historical importance, and new-task importance. On a 5-task text classification benchmark with Qwen3-0.6B-Base, HeadRollback improves Overall Performance by +1.98 percentage points (pp) and Backward Transfer by +1.70pp over Vanilla LoRA, with consistent improvements across task orders (winning on 7 of 9 orders). HeadRollback is replay-free, requires minimal state (one scalar per head plus a single checkpoint), and operates entirely at task boundaries without modifying training.
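The head selection and rollback step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact criteria: the function names, the quantile-based cutoffs, and the thresholds are assumptions introduced here for clarity.

```python
import numpy as np

def select_rollback_heads(disruption, hist_importance, new_importance,
                          disruption_q=0.5, new_importance_q=0.5):
    """Pick heads that changed substantially but are not critical for the
    new task. All inputs are per-head score arrays (higher = more).
    The quantile cutoffs are illustrative, not the paper's rule."""
    d_cut = np.quantile(disruption, disruption_q)
    n_cut = np.quantile(new_importance, new_importance_q)
    # Disrupted, historically important, yet unimportant for the new task.
    mask = ((disruption >= d_cut)
            & (new_importance < n_cut)
            & (hist_importance > np.median(hist_importance)))
    return np.nonzero(mask)[0]

def rollback_heads(B_new, B_prev, head_rows, heads):
    """Restore the previous checkpoint's LoRA B-matrix rows for the
    selected heads, leaving all other rows at their post-task values."""
    B = B_new.copy()
    for h in heads:
        rows = head_rows[h]  # row indices of head h within the B matrix
        B[rows] = B_prev[rows]
    return B
```

Because the intervention touches only B-matrix rows of the selected heads at a task boundary, the training loop itself is untouched, and the only persistent state is the per-head scores and one prior checkpoint.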