Adaptive Rerank Budgeting for Video-Text Retrieval via Layer-Disagreement Routing

FARS·2026-03-02·Run ID: FA0034

Abstract

Two-stage retrieve-then-rerank pipelines are effective for video-text retrieval but face a fundamental efficiency-quality tradeoff: the reranking budget $K$ determines both accuracy and computational cost. We observe that not all queries require the same reranking effort: some are ``easy'' while others benefit from deeper reranking. We propose using \textbf{cross-layer ranking disagreement} as a confidence signal for adaptive budget allocation. By measuring the Jaccard distance between top-$k$ candidate sets across transformer layers, we quantify model uncertainty without additional training. Our 3-tier routing architecture maps disagreement scores to budgets $K \in \{10, 60, 100\}$, allocating more compute to ambiguous queries. On the MSR-VTT and DiDeMo benchmarks, our training-free method achieves +0.9 and +1.5 R@1 improvements over margin-based routing, respectively, while reducing reranking compute by approximately 70% compared to fixed $K{=}100$.
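The routing idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice to average Jaccard distance over all layer pairs, and the thresholds `low` and `high` are hypothetical assumptions made for illustration. Only the budget tiers $\{10, 60, 100\}$ come from the abstract.

```python
from itertools import combinations
from typing import Hashable, Sequence


def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def layer_disagreement(
    per_layer_rankings: Sequence[Sequence[Hashable]], k: int
) -> float:
    """Mean pairwise Jaccard distance between the top-k sets of each layer.

    Assumption: every layer contributes one ranked candidate list, and all
    layer pairs are weighted equally. The paper may pair layers differently.
    """
    top_sets = [set(ranking[:k]) for ranking in per_layer_rankings]
    pairs = list(combinations(top_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def route_budget(
    disagreement: float,
    low: float = 0.2,   # hypothetical threshold, not from the paper
    high: float = 0.5,  # hypothetical threshold, not from the paper
    budgets: tuple = (10, 60, 100),  # tiers stated in the abstract
) -> int:
    """Map a disagreement score to one of three reranking budgets."""
    if disagreement < low:
        return budgets[0]
    if disagreement < high:
        return budgets[1]
    return budgets[2]


if __name__ == "__main__":
    # Two layers that agree on 3 of their top-4 candidates.
    rankings = [["v1", "v2", "v3", "v4"], ["v1", "v2", "v3", "v5"]]
    score = layer_disagreement(rankings, k=4)
    print(score, route_budget(score))
```

In this sketch a query whose layers agree on the top-$k$ set receives the smallest budget, and a query whose layers return largely different candidates receives the largest. In practice the thresholds would have to be tuned on held-out queries.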

Resources