Local-Time AdamW for Stability-Gap Reduction in Continual Learning
Abstract
Continual learning systems suffer from the stability gap---a transient drop in performance on previously learned tasks immediately after switching to a new task. While catastrophic forgetting has been extensively studied, the role of optimizer state in the stability gap remains underexplored. We identify that AdamW's bias correction, designed for cold-start training, becomes counterproductive at task boundaries, where moment estimates are warm but misaligned with the new task's gradients. We propose Local-Time AdamW (LT-AdamW), which resets only the bias-correction timestep at task boundaries while preserving the moment buffers. This produces natural update dampening when moment estimates are stale, scaling the effective learning rate by approximately $\sqrt{1-\beta_2^{\tau}}/(1-\beta_1^{\tau})$ at local timestep $\tau$ in early post-switch steps. On Split CIFAR-100, LT-AdamW reduces the stability gap by 31% and improves minimum post-switch accuracy by 24%. On Rotated MNIST, it reduces the stability gap by 17%. Empirical analysis confirms that the observed update dampening matches the theoretical prediction, and a control experiment verifies that the benefit is specifically attributable to the bias-correction mechanism. LT-AdamW requires only a one-line code change and serves as a drop-in replacement for standard AdamW in continual learning pipelines.
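To make the mechanism concrete, the sketch below shows one way the timestep reset could be applied to a PyTorch AdamW optimizer at a task boundary; the helper name reset_bias_correction_step and the loop names in the usage comment are illustrative assumptions, not identifiers from the paper.

```python
import torch


def reset_bias_correction_step(optimizer: torch.optim.AdamW) -> None:
    """Illustrative LT-AdamW-style reset: zero the bias-correction step
    counter at a task boundary while keeping the moment buffers
    (exp_avg, exp_avg_sq) intact, so the next updates are dampened as
    if training had just started."""
    for group in optimizer.param_groups:
        for p in group["params"]:
            state = optimizer.state.get(p)
            if not state:
                # Parameter has not been updated yet; nothing to reset.
                continue
            step = state["step"]
            if torch.is_tensor(step):
                # Newer PyTorch versions store the step as a tensor.
                step.zero_()
            else:
                # Older versions store a plain Python int.
                state["step"] = 0
            # exp_avg and exp_avg_sq are deliberately left untouched.


# Usage sketch in a continual-learning loop (hypothetical names):
# for task in tasks:
#     reset_bias_correction_step(optimizer)  # local time restarts, moments persist
#     train_one_task(model, optimizer, task)
```

With default $\beta_1 = 0.9$ and $\beta_2 = 0.999$, this reset scales the first post-switch update by roughly $\sqrt{1-\beta_2}/(1-\beta_1) \approx 0.32$ relative to standard AdamW, and the scaling relaxes back toward 1 as the local timestep grows.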