FCBoost: Static Frequency-Aware Channel Selection for 2-Bit KV Cache Quantization

FARS·2026-03-02·Run ID: FA0255

Abstract

KV cache quantization enables long-context inference in large language models but degrades accuracy at aggressive 2-bit precision. Recent methods like Kitty recover accuracy by dynamically boosting outlier channels to higher precision, but this requires per-page magnitude computation and metadata overhead. We propose FCBoost, which replaces dynamic channel selection with a static mask derived from Contextual Agreement (CA), a metric that identifies RoPE frequency pairs structurally important for attention pattern fidelity. By profiling CA scores offline and selecting the top-F RoPE pairs per KV head, FCBoost eliminates per-page selection overhead while achieving superior accuracy. On the AIME24/25 mathematical reasoning benchmarks with Qwen3-8B, FCBoost achieves 71.11% average accuracy, outperforming Kitty (66.67%, +4.44pp) and KIVI-KV2* (66.11%, +5.00pp) with remarkably low variance (std = 1.57 vs. 7-9 for the baselines). Ablation studies confirm that CA-derived masks outperform random masks by 6.67pp, validating that quantization sensitivity is structurally determined by RoPE frequencies rather than varying dynamically per page.
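The static selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the score-array layout, and the half-split RoPE pairing (channel i paired with channel i + head_dim/2, a common rotary-embedding convention) are all assumptions for the sake of the example.

```python
import numpy as np

def build_static_mask(ca_scores: np.ndarray, head_dim: int, top_f: int) -> np.ndarray:
    """Build a static per-head boost mask from offline-profiled CA scores.

    ca_scores: (num_kv_heads, head_dim // 2) array, one Contextual Agreement
               score per RoPE frequency pair (hypothetical layout).
    Returns a boolean mask of shape (num_kv_heads, head_dim); True marks
    channels kept at higher precision, the rest stay at 2 bits.
    """
    num_kv_heads, num_pairs = ca_scores.shape
    assert num_pairs == head_dim // 2
    mask = np.zeros((num_kv_heads, head_dim), dtype=bool)
    for h in range(num_kv_heads):
        # Select the top-F RoPE pairs by CA score for this KV head.
        top_pairs = np.argsort(ca_scores[h])[-top_f:]
        # Each RoPE pair covers two channels under the assumed half-split
        # layout: channel i and channel i + head_dim // 2.
        mask[h, top_pairs] = True
        mask[h, top_pairs + head_dim // 2] = True
    return mask

# Example: 8 KV heads, head_dim 128, boost the top 8 pairs (16 channels) per head.
rng = np.random.default_rng(0)
mask = build_static_mask(rng.random((8, 64)), head_dim=128, top_f=8)
```

Because the mask is computed once offline, inference kernels can bake it in as a constant, avoiding the per-page magnitude scans and metadata that dynamic outlier selection requires.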

Resources