TemplateLeak: A Template-Disjoint Evaluation Audit of CommonForms Form Field Detection

FARS·2026-03-02·Run ID: FA0131

Abstract

Template overlap between training and test splits is a persistent concern in document understanding benchmarks, because models may memorize specific form layouts rather than learn generalizable detection capabilities. We present TemplateLeak, an audit framework that uses MinHash/LSH clustering to identify template overlap and applies document-level permutation testing to assess its statistical significance. Applying this framework to CommonForms, the largest form field detection benchmark with nearly 500,000 pages, we find that the template leakage hypothesis is refuted: the observed overlap fraction (26.8% at τ = 0.80) falls below the null mean (28.6%), yielding z = -0.70 and p = 0.737. This surprising result indicates that the CommonForms document-level split produces no more template overlap than random splitting would, and in fact slightly less. The conclusion is robust across all four similarity thresholds tested (τ = 0.50 to 0.95). Consequently, standard mAP is a valid metric for CommonForms evaluation, and researchers need not report template-novel metrics separately.
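The document-level permutation test described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the `(doc_id, cluster_id)` page representation, and the exact definition of the overlap statistic (the fraction of test pages whose template cluster also contains a training page) are assumptions made for illustration. Cluster IDs are assumed to come from a prior MinHash/LSH step at a fixed threshold τ.

```python
import random
import statistics


def overlap_fraction(pages, test_docs):
    """Fraction of test pages whose template cluster also holds a train page.

    pages: list of (doc_id, cluster_id) pairs, one per page (assumed layout).
    test_docs: set of doc_ids assigned to the test split.
    """
    train_clusters = {c for d, c in pages if d not in test_docs}
    test_clusters = [c for d, c in pages if d in test_docs]
    if not test_clusters:
        return 0.0
    return sum(c in train_clusters for c in test_clusters) / len(test_clusters)


def permutation_test(pages, test_docs, n_perm=1000, seed=0):
    """Compare the observed overlap with random document-level splits.

    Whole documents are reassigned to the test split, so pages of one
    document never straddle train and test in a null sample.
    """
    rng = random.Random(seed)
    docs = sorted({d for d, _ in pages})
    test_docs = set(test_docs)
    observed = overlap_fraction(pages, test_docs)

    null = []
    for _ in range(n_perm):
        shuffled = set(rng.sample(docs, len(test_docs)))
        null.append(overlap_fraction(pages, shuffled))

    mu = statistics.mean(null)
    sigma = statistics.pstdev(null)
    z = (observed - mu) / sigma if sigma > 0 else 0.0
    # One-sided p-value: how often a random split overlaps at least as much.
    p = (1 + sum(x >= observed for x in null)) / (1 + n_perm)
    return observed, mu, z, p
```

Under these assumptions, an observed overlap below the null mean gives a negative z and a large one-sided p-value, which is the pattern the abstract reports. Calling `permutation_test(pages, test_docs)` once per threshold τ, with clusters recomputed at each threshold, would be one way to reproduce a robustness check of this kind.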

Resources