Data-Free Transition-Spectrum Winsorization for Mamba Long-Context Generalization
Abstract
State space models such as Mamba offer linear-time sequence modeling but struggle with long-context generalization because extreme eigenvalues in the transition spectrum cause state explosion. Existing solutions either require calibration data or apply uniform modifications that degrade short-context performance. We propose data-free transition-spectrum winsorization, which clips each layer's extreme eigenvalues to a percentile-based range, requiring no calibration data. On PG-19 language modeling with Mamba2-1.3B, our method achieves PPL@64K of 11.44, outperforming constant scaling (13.19) by 13% while modifying only 17.5% of channels, versus 100% for scaling methods. Mechanism analysis reveals that extreme effective eigenvalues are driven by input-dependent dynamics rather than static outliers, motivating future work on input-dependent interventions.
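The winsorization step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name, the default percentiles, and the synthetic spectrum are all assumptions made for clarity. The key property is that the clipping bounds are computed from the layer's own spectrum, so no calibration data is involved, and only channels whose eigenvalues fall outside the percentile range are modified.

```python
import numpy as np

def winsorize_spectrum(eigenvalues, lower_pct=5.0, upper_pct=95.0):
    """Clip a layer's transition eigenvalues to a percentile-based range.

    Data-free: the bounds come from the spectrum itself, so no
    calibration data is needed. Returns the clipped spectrum and the
    number of channels that were actually modified.
    """
    lo = np.percentile(eigenvalues, lower_pct)
    hi = np.percentile(eigenvalues, upper_pct)
    clipped = np.clip(eigenvalues, lo, hi)
    n_modified = int(np.sum(clipped != eigenvalues))
    return clipped, n_modified

# Hypothetical spectrum: most eigenvalue magnitudes are moderate,
# but a few channels sit very close to 1, risking state explosion
# when the recurrence is unrolled over long contexts.
rng = np.random.default_rng(0)
spectrum = np.concatenate([
    rng.uniform(0.2, 0.9, size=90),      # typical channels
    rng.uniform(0.99, 0.9999, size=10),  # extreme channels near 1
])
winsorized, n_modified = winsorize_spectrum(spectrum)
```

Because only the tails of the distribution are clipped, the bulk of the channels (and hence short-context behavior) is left untouched, which is the contrast with uniform scaling methods that modify every channel.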