Adaptive Rerank Budgeting for Video-Text Retrieval via Layer-Disagreement Routing

FARS·2026-03-02·Run ID: FA0034

Abstract

Two-stage retrieve-then-rerank pipelines are effective for video-text retrieval but face a fundamental efficiency-quality tradeoff: the reranking budget $K$ determines both accuracy and computational cost. We observe that not all queries require the same reranking effort: some are ``easy'', while others benefit from deeper reranking. We propose using \textbf{cross-layer ranking disagreement} as a confidence signal for adaptive budget allocation. By measuring the Jaccard distance between the top-$k$ candidate sets produced at different transformer layers, we quantify model uncertainty without additional training. Our 3-tier routing architecture maps disagreement scores to budgets $K \in \{10, 60, 100\}$, allocating more compute to ambiguous queries. On the MSR-VTT and DiDeMo benchmarks, our training-free method improves R@1 over margin-based routing by +0.9 and +1.5, respectively, while reducing reranking compute by approximately 70% compared to a fixed budget of $K{=}100$.
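
The routing idea described in the abstract can be illustrated with a minimal sketch. Only the Jaccard distance over top-$k$ sets and the budget set $\{10, 60, 100\}$ come from the abstract. The abstract does not say which layers are compared, how pairwise distances are aggregated, or where the tier boundaries lie, so the averaging over all layer pairs, the function names, and the thresholds `low` and `high` below are illustrative assumptions rather than the authors' settings.

```python
from itertools import combinations
from typing import Hashable, Sequence

# Budget tiers stated in the abstract.
BUDGETS = (10, 60, 100)


def jaccard_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| for two candidate sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def layer_disagreement(topk_per_layer: Sequence[Sequence[Hashable]]) -> float:
    """Mean pairwise Jaccard distance between per-layer top-k sets.

    Averaging over all layer pairs is an assumption made for this sketch.
    """
    pairs = list(combinations(topk_per_layer, 2))
    if not pairs:
        return 0.0  # fewer than two layers: no disagreement to measure
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def route_budget(disagreement: float, low: float = 0.2, high: float = 0.5) -> int:
    """Map a disagreement score to one of the three budgets.

    The thresholds `low` and `high` are hypothetical placeholders.
    """
    if disagreement < low:
        return BUDGETS[0]
    if disagreement < high:
        return BUDGETS[1]
    return BUDGETS[2]


if __name__ == "__main__":
    # Candidate video ids ranked in the top 4 at three hypothetical layers.
    easy_query = [[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1]]
    hard_query = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    print(route_budget(layer_disagreement(easy_query)))
    print(route_budget(layer_disagreement(hard_query)))
```

Because the score depends only on set membership, two layers that return the same candidates in a different order count as fully agreeing. Under these assumed thresholds, a query whose layers agree is sent to the smallest budget and a query whose layers return disjoint candidates is sent to the largest.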
