Adaptive Rerank Budgeting for Video-Text Retrieval via Layer-Disagreement Routing
Abstract
Two-stage retrieve-then-rerank pipelines are effective for video-text retrieval but face a fundamental efficiency-quality tradeoff: the reranking budget, i.e., the number of first-stage candidates passed to the reranker, determines both accuracy and computational cost. We observe that not all queries require the same reranking effort: some are ``easy'', while others benefit from deeper reranking. We propose using \textbf{cross-layer ranking disagreement} as a confidence signal for adaptive budget allocation. By measuring the Jaccard distance between the top-ranked candidate sets produced at different transformer layers, we quantify model uncertainty without additional training. Our 3-tier routing architecture maps each query's disagreement score to one of three reranking budgets, allocating more compute to ambiguous queries. On the MSR-VTT and DiDeMo benchmarks, our training-free method improves R@1 by +0.9 and +1.5 over margin-based routing, respectively, while reducing reranking compute by approximately 70\% compared to a fixed reranking budget.
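The routing idea can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: the function names, the mean-pairwise aggregation over layers, the thresholds \texttt{(0.2, 0.5)}, and the budgets \texttt{(5, 20, 50)} are all hypothetical placeholders chosen for illustration.

```python
from typing import Hashable, Sequence


def jaccard_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Jaccard distance 1 - |A & B| / |A | B| between two candidate sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def layer_disagreement(rankings_per_layer: Sequence[Sequence[Hashable]], k: int) -> float:
    """Mean pairwise Jaccard distance between the top-k sets of each layer.

    Assumption: averaging over all layer pairs is one possible aggregation;
    comparing only selected layers would work the same way.
    """
    tops = [ranking[:k] for ranking in rankings_per_layer]
    pairs = [(i, j) for i in range(len(tops)) for j in range(i + 1, len(tops))]
    if not pairs:
        return 0.0  # a single layer cannot disagree with itself
    return sum(jaccard_distance(tops[i], tops[j]) for i, j in pairs) / len(pairs)


def route_budget(
    disagreement: float,
    thresholds: tuple = (0.2, 0.5),   # hypothetical tier boundaries
    budgets: tuple = (5, 20, 50),     # hypothetical rerank budgets per tier
) -> int:
    """Map a disagreement score to one of three rerank budgets."""
    low, high = thresholds
    if disagreement < low:
        return budgets[0]   # layers agree: small budget
    if disagreement < high:
        return budgets[1]   # moderate disagreement: medium budget
    return budgets[2]       # layers disagree: large budget
```

For a given query, \texttt{rankings\_per\_layer} would hold the candidate IDs ranked by similarity at each chosen layer, and \texttt{route\_budget(layer\_disagreement(rankings\_per\_layer, k))} would return how many first-stage candidates to rerank. No training is involved; in practice the thresholds would be set on held-out queries.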