Position Bias Correction is Insufficient for One-Pass Attention Sorting

FARS·2026-03-02·Run ID: FA0328

Abstract

Long-context language models suffer from position bias: information in the middle of the context is under-utilized. Attention Sorting addresses this by iteratively reordering documents based on attention patterns, but it requires multiple expensive prefill passes. We hypothesize that position bias is the primary bottleneck and propose Debiased One-Pass Attention Sorting, which estimates a per-prompt position-bias curve from distractor documents and subtracts it from raw attention scores to enable single-pass sorting. Our experiments on two models refute this hypothesis: on LLaMA-2-7B-32K-Instruct, debiasing produces results identical to uncalibrated single-pass sorting (94.83% accuracy), while on YaRN-Llama-2-7b-64k, debiasing improves accuracy by 8.67 percentage points (pp) but remains 14.84 pp behind iterative sorting, closing only 37% of the gap. These results indicate that position bias is not the primary bottleneck on well-tuned models and that iterative sorting provides benefits beyond bias correction, likely from attention context refinement across passes.
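To make the proposed correction concrete, the following is a minimal sketch of the debiasing step: attention paid to known-irrelevant distractor documents is used to fit a position-bias curve, which is then subtracted from every document's raw score before a single sort. The linear bias fit, the function names, and the synthetic scores are illustrative assumptions, not the report's exact implementation.

```python
def fit_linear_bias(positions, scores):
    """Least-squares line through (position, attention score) pairs.

    Assumption: bias varies roughly linearly with position; the
    report's per-prompt curve may use a richer estimator.
    """
    n = len(positions)
    mx = sum(positions) / n
    my = sum(scores) / n
    cov = sum((p - mx) * (s - my) for p, s in zip(positions, scores))
    var = sum((p - mx) ** 2 for p in positions)
    slope = cov / var
    return slope, my - slope * mx


def debiased_one_pass_sort(scores, distractor_positions):
    """Return document indices ordered by position-debiased attention.

    scores: raw per-document attention from one prefill pass.
    distractor_positions: slots holding known-irrelevant documents,
    used to estimate how much attention each position gets "for free".
    """
    slope, intercept = fit_linear_bias(
        distractor_positions, [scores[i] for i in distractor_positions]
    )
    debiased = [s - (slope * i + intercept) for i, s in enumerate(scores)]
    # Ascending order: the most-attended document (after debiasing)
    # lands last, mirroring attention sorting's "move relevant docs
    # toward the end of the context" convention.
    return sorted(range(len(scores)), key=lambda i: debiased[i])


# Synthetic example: a linear bias favoring early positions, plus one
# relevant document at slot 2 that earns extra attention on merit.
raw = [0.50, 0.45, 0.70, 0.35, 0.30]
order = debiased_one_pass_sort(raw, distractor_positions=[0, 1, 3, 4])
```

In this toy setting the relevant document at slot 2 ends up last in `order` despite early slots receiving more raw attention; the report's finding is that this correction alone does not recover the accuracy of multi-pass sorting.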

Resources