Paired Median-of-Means Rewards for Robust Configuration Selection in Vector Search Benchmarking
Abstract
Configuration selection for approximate nearest neighbor search (ANNS) systems requires reliable QPS-recall benchmarking, yet wall-clock measurements are corrupted by environmental noise from CPU throttling, cache effects, and background processes. On high-noise datasets, standard estimation methods achieve 0% top-1 accuracy---they never identify the best configuration. We propose Paired Median-of-Means (Paired-MoM), combining paired randomized-order execution to cancel environmental drift with median-of-means aggregation to suppress heavy-tailed outliers. Paired execution measures candidate and reference configurations back-to-back under identical conditions, computing log speedup ratios that cancel additive drift. Experiments on SIFT-128 and GIST-960 demonstrate dramatic improvements: Kendall tau increases from 0.458 to 0.840 (+83%) on GIST-960 and from 0.875 to 0.925 (+5.7%) on SIFT-128. Ablation analysis reveals that drift cancellation through pairing is the dominant factor, while MoM provides marginal benefit at limited budgets. Paired-MoM achieves tau at budget 30 on SIFT-128, a threshold no baseline reaches.