QuoteVerify: Inference-Time Quote-Backed Citation Verification for Deep Research Reports

FARS·2026-03-02·Run ID: FA0046

Abstract

Deep research agents that synthesize long-form reports with citations are increasingly deployed, yet citation quality remains problematic: models frequently hallucinate references, fabricate quotes, or cite sources that do not support the statements attributed to them. We propose QuoteVerify, an inference-time pipeline that verifies citations through quote-backed evidence. The pipeline prompts the model to generate structured citation triples containing explicit evidence quotes, then applies multi-stage verification: source fetching, quote validity checking via substring matching, and NLI-based entailment gating. Experiments on ReportBench demonstrate statistically significant improvements over standard baselines, with cited-statement match rate gains of +18.7 percentage points on GPT-4o (p = 0.019) and +12.5 percentage points on Gemini-2.5-Pro (p = 0.011). Analysis reveals that the structured citation format drives most of the gains, while quote validity remains the primary bottleneck: LLMs produce valid quotes only 18-28% of the time even for successfully fetched sources, indicating a tendency to paraphrase rather than quote verbatim.
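The three verification stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CitationTriple` fields, the `fetch` and `entails` callables, the status strings, and the whitespace and case normalization applied before substring matching are all hypothetical choices made for illustration. Using the quote, rather than the full source text, as the NLI premise is likewise an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CitationTriple:
    """Hypothetical layout of a structured citation triple."""
    statement: str   # claim made in the report
    source_url: str  # source cited for the claim
    quote: str       # evidence quote the model says appears in the source


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase (an assumed, lenient normalization)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_valid(quote: str, source_text: str) -> bool:
    """Quote validity check: the normalized quote must be a substring of the source."""
    q = normalize(quote)
    return bool(q) and q in normalize(source_text)


def verify(
    triple: CitationTriple,
    fetch: Callable[[str], Optional[str]],
    entails: Callable[[str, str], bool],
) -> str:
    """Run the stages in order and report the first one that fails."""
    # Stage 1: source fetching. `fetch` returns None when the source is unavailable.
    source_text = fetch(triple.source_url)
    if source_text is None:
        return "fetch_failed"
    # Stage 2: quote validity via substring matching.
    if not quote_is_valid(triple.quote, source_text):
        return "invalid_quote"
    # Stage 3: entailment gate. `entails(premise, hypothesis)` stands in for an NLI model.
    if not entails(triple.quote, triple.statement):
        return "not_entailed"
    return "verified"


if __name__ == "__main__":
    # Toy in-memory "web" and a placeholder entailment function, for demonstration only.
    pages = {"https://example.org/a": "The model was trained on 1,000 documents."}
    accept_all = lambda premise, hypothesis: True  # NOT a real NLI model

    triple = CitationTriple(
        statement="Training used 1,000 documents.",
        source_url="https://example.org/a",
        quote="trained on 1,000 documents",
    )
    print(verify(triple, pages.get, accept_all))
```

Under this sketch, a paraphrased quote such as "trained using 1,000 documents" fails at the substring stage even though it is faithful to the source, which is the failure mode the abstract identifies as the primary bottleneck.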

Resources