Label-Free Hyperparameter Calibration for Parallel Context Encoding via KL Divergence Matching
Abstract
Adaptive Parallel Encoding (APE) enables efficient long-context processing by encoding document chunks independently, but its performance depends critically on two hyperparameters, attention temperature and scaling factor, that typically require labeled validation data to tune. We propose a label-free calibration method that selects APE hyperparameters by minimizing the KL divergence between the sequential-teacher and parallel-encoded next-token distributions. Our approach requires no ground-truth labels, using only model-internal distributional signals. On LongBench 2WikiMultihopQA, KL-tuned APE achieves F1=48.09, outperforming a label-tuned oracle (F1=46.23) by +1.86 points. We find that temperature dominates the KL landscape with 2.6× higher sensitivity than scale, and that temperature selection is perfectly stable under bootstrap resampling while scale selection is not. The weak KL-F1 correlation (0.295) explains the gap to default-hyperparameter performance, but KL calibration provides a robust label-free alternative for deployment scenarios where labeled calibration data is unavailable.
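The selection procedure can be sketched as a grid search that minimizes the KL divergence between the sequential teacher's next-token distribution and the parallel-encoded one. This is a minimal illustration only; the function names (`kl_divergence`, `calibrate`), the grid values, and the logits interface are hypothetical, not the paper's actual implementation:

```python
import numpy as np

def kl_divergence(p_logits, q_logits):
    """Mean KL(p || q) over positions, given raw next-token logits."""
    # Softmax for the teacher distribution p (numerically stabilized).
    p = np.exp(p_logits - p_logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    # Log-softmax for the parallel-encoded distribution q.
    log_q = q_logits - q_logits.max(axis=-1, keepdims=True)
    log_q -= np.log(np.exp(log_q).sum(axis=-1, keepdims=True))
    log_p = np.log(p + 1e-12)
    return float((p * (log_p - log_q)).sum(axis=-1).mean())

def calibrate(teacher_logits, parallel_logits_fn, temps, scales):
    """Pick the (temperature, scale) pair whose parallel-encoded
    distribution is closest in KL to the sequential teacher."""
    best = None
    for t in temps:
        for s in scales:
            kl = kl_divergence(teacher_logits, parallel_logits_fn(t, s))
            if best is None or kl < best[0]:
                best = (kl, t, s)
    return best[1], best[2]
```

Because the objective uses only the model's own output distributions, no labeled calibration examples are needed; `parallel_logits_fn` stands in for a forward pass of the parallel-encoded model under the candidate hyperparameters.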