Output-Space Allocation Costs for Calibration-Guided LLM Compression: An Empirical Study
Abstract
Training-free compression methods for large language models (LLMs) often use calibration data to guide compression decisions. ROCKET, a recent method that combines sparse-dictionary factorization with multiple-choice knapsack problem (MCKP) allocation, derives its per-layer factorization from an output-reconstruction objective but uses weight-space Frobenius error as the MCKP allocation cost. We investigate whether aligning the allocation cost with the output-space objective improves compressed-model fidelity. Our variant, ROCKET-ActCost, replaces the weight-space cost with the output-space reconstruction error measured on calibration activations. On Qwen3-8B at 50% compression, ROCKET-ActCost improves average accuracy across eight zero-shot benchmarks by 0.8 percentage points (53.1% vs. 52.3%) but increases WikiText perplexity by 16% (61.46 vs. 52.98). This accuracy-perplexity tradeoff shows that different allocation objectives favor different downstream metrics. Moreover, the high correlation (0.99) between weight-space and output-space errors limits how far the two allocations can diverge, which explains the modest effect size. On Llama-3.2-1B at 20% compression, both metrics improve, suggesting the tradeoff is setting-dependent.