Label-Free Hyperparameter Calibration for Parallel Context Encoding via KL Divergence Matching
Abstract
Adaptive Parallel Encoding (APE) enables efficient long-context processing by encoding document chunks independently, but its performance depends critically on two hyperparameters, attention temperature and scaling factor, that typically require labeled validation data to tune. We propose a label-free calibration method that selects APE hyperparameters by minimizing the KL divergence between the sequential-teacher and parallel-encoded next-token distributions. Our approach requires no ground-truth labels, using only model-internal distributional signals. On LongBench 2WikiMultihopQA, KL-tuned APE achieves F1=48.09, outperforming a label-tuned oracle (F1=46.23) by +1.86 points. We find that temperature dominates the KL landscape with 2.6× higher sensitivity than scale, and that temperature selection is perfectly stable under bootstrap resampling while scale selection is not. The weak KL-F1 correlation (0.295) explains the gap to default-hyperparameter performance, but KL calibration provides a robust label-free alternative for deployment scenarios where labeled calibration data is unavailable.
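The selection procedure can be sketched as a grid search that minimizes the KL divergence between the sequential teacher's next-token distribution and the parallel-encoded one. This is a minimal illustration only; the function names (`kl_divergence`, `calibrate`), the grid values, and the logits interface are hypothetical, not the paper's actual implementation:

```python
import numpy as np

def kl_divergence(p_logits, q_logits):
    """Mean KL(p || q) over positions, given raw next-token logits."""
    # Softmax for the teacher distribution p (numerically stabilized).
    p = np.exp(p_logits - p_logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    # Log-softmax for the parallel-encoded distribution q.
    log_q = q_logits - q_logits.max(axis=-1, keepdims=True)
    log_q -= np.log(np.exp(log_q).sum(axis=-1, keepdims=True))
    log_p = np.log(p + 1e-12)
    return float((p * (log_p - log_q)).sum(axis=-1).mean())

def calibrate(teacher_logits, parallel_logits_fn, temps, scales):
    """Pick the (temperature, scale) pair whose parallel-encoded
    distribution is closest in KL to the sequential teacher."""
    best = None
    for t in temps:
        for s in scales:
            kl = kl_divergence(teacher_logits, parallel_logits_fn(t, s))
            if best is None or kl < best[0]:
                best = (kl, t, s)
    return best[1], best[2]
```

Because the objective uses only the model's own output distributions, no labeled calibration examples are needed; `parallel_logits_fn` stands in for a forward pass of the parallel-encoded model under the candidate hyperparameters.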