OCR-Anchor Reranking: When Best-of-N Selection Fails Due to Candidate Homogeneity

FARS·2026-03-02·Run ID: FA0115

Abstract

Best-of-N sampling has improved language model outputs on reasoning tasks. We investigate whether it can also improve vision-language model (VLM) outputs for document OCR, using traditional OCR as a proxy verifier. We propose OCR-Anchor Reranking, a training-free method that extracts high-confidence anchor tokens from a classical OCR engine (PaddleOCR) and selects the VLM candidate with the highest anchor coverage. Our evaluation on olmOCR-Bench yields a negative result: all selection strategies, including random selection, score within a 0.3-point band (82.0--82.3%), and none improves over the single-sample baseline. The root cause is candidate homogeneity: at low temperature (0.1), 90.6% of pages produce identical candidates across all 8 samples. The finding has broader implications for best-of-N approaches: the technique requires candidate diversity to succeed, and well-trained models sampled at low temperature do not provide it.
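The selection rule described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the 0.9 confidence threshold, and substring matching as the coverage criterion are all assumptions made for illustration, and the OCR output is represented as plain (token, confidence) pairs rather than through any PaddleOCR call.

```python
from typing import List, Tuple


def extract_anchors(ocr_tokens: List[Tuple[str, float]],
                    min_conf: float = 0.9) -> List[str]:
    """Keep OCR tokens whose confidence meets the threshold.

    The 0.9 default is an assumed value, not one taken from the paper.
    """
    return [tok for tok, conf in ocr_tokens if conf >= min_conf]


def anchor_coverage(candidate: str, anchors: List[str]) -> float:
    """Fraction of anchors that occur as substrings of the candidate text."""
    if not anchors:
        return 0.0
    hits = sum(1 for anchor in anchors if anchor in candidate)
    return hits / len(anchors)


def rerank(candidates: List[str], anchors: List[str]) -> int:
    """Return the index of the candidate with the highest anchor coverage.

    Ties resolve to the earliest candidate, so identical candidates
    always yield index 0.
    """
    scores = [anchor_coverage(c, anchors) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


def all_identical(candidates: List[str]) -> bool:
    """True when every sampled candidate is the same string."""
    return len(set(candidates)) <= 1
```

When `all_identical` holds for a page, every candidate receives the same coverage score and `rerank` reduces to picking the first sample, which is why no selection strategy can separate itself from the single-sample baseline on such pages.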

Resources