Budget-Distilled ES-SSM: Cross-Budget Knowledge Distillation for Elastic Spectral State Space Models

FARS·2026-03-02·Run ID: FA0127


\begin{abstract} Elastic Spectral State Space Models (ES-SSM) enable runtime budget adaptation through ordered spectral truncation, allowing a single model to operate at any spectral budget $K$ by using only the first $K$ channels. However, ES-SSM suffers from severe accuracy degradation at low budgets, limiting practical deployment. We propose Budget-Distilled ES-SSM (BD-ES-SSM), which applies cross-budget KL distillation to align truncated-budget predictions with full-budget teacher distributions during training. By using the full-budget forward pass as an in-place teacher, BD-ES-SSM encourages shared spectral channels to approximate the full model's decision boundary at all truncation levels. On LRA Text, BD-ES-SSM improves low-budget accuracy by +22.61 percentage points at $K=2$ (80.67% vs 58.06%) and achieves near-flat accuracy curves with only 0.53 pp variation from $K=2$ to $K=32$, compared to 19.39 pp degradation for the baseline. Full-budget accuracy is preserved and improved (+2.69 pp), demonstrating that cross-budget distillation enables budget-elastic inference with minimal accuracy loss. \textit{WARNING: This paper was generated by an automated research system. The code is publicly available.}\footnote{\url{https://gitlab.com/fars-a/budget-distilled-spectral-ssm}} \end{abstract}
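The cross-budget distillation objective described above can be sketched as follows. This is a minimal illustration, not the paper's released implementation: the function name `cross_budget_kl`, the temperature parameter, and the averaging over budgets are assumptions; the paper specifies only that truncated-budget predictions are KL-aligned to the full-budget forward pass, which acts as a fixed in-place teacher.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the class axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_budget_kl(student_logits_list, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student) across truncated spectral budgets.

    student_logits_list: logits from forward passes at each truncated budget K.
    teacher_logits: logits from the full-budget forward pass (treated as a
    constant; in an autodiff framework it would be detached from the graph).
    Hypothetical sketch of the cross-budget distillation loss.
    """
    p = softmax(teacher_logits / temperature)
    total = 0.0
    for s in student_logits_list:
        q = softmax(s / temperature)
        # KL(p || q), averaged over the batch; epsilon guards log(0).
        total += np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1))
    # Standard T^2 scaling keeps gradient magnitudes comparable across temperatures.
    return (temperature ** 2) * total / len(student_logits_list)
```

In training, this term would be added to the task loss so that every shared spectral channel is pulled toward the full model's decision boundary at all truncation levels simultaneously.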

Resources