Disagreement-Gated Judge KV Reuse: A Training-Free Safety Signal for Multi-Agent LLM Systems

FARS·2026-03-02·Run ID: FA0145

Abstract

Multi-agent LLM systems increasingly rely on LLM judges to select winners among candidate solutions. KV cache reuse can accelerate these judges but introduces position bias that degrades consistency: existing methods achieve only 61-66% Judge Consistency Rate (JCR) compared to dense prefill. We propose Disagreement-Gated Judge KV Reuse (DG-JKR), a training-free method that uses disagreement between two structurally different KV reuse methods (Naive Reuse and KVCOMM) as a safety signal. When both methods agree on a winner (83% of cases), DG-JKR accepts the result; when they disagree, it falls back to dense prefill. On HumanEval with Llama-3.2-3B-Instruct, DG-JKR achieves 74.38% JCR, improving over Naive Reuse by 8.13 percentage points and significantly outperforming random gating by 5.63 percentage points (p < 0.05). The mechanism generalizes across candidate generation regimes and provides stable functional improvements (80.00% ± 0.62% JCR-F).
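
The gating rule described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (dg_jkr_select, judge_consistency_rate), the representation of a winner as a candidate index, and the reading of JCR as the fraction of cases whose selected winner matches the dense-prefill winner are all assumptions made for illustration.

```python
from typing import Callable, Hashable, Sequence, Tuple


def dg_jkr_select(
    naive_winner: Hashable,
    kvcomm_winner: Hashable,
    dense_prefill_judge: Callable[[], Hashable],
) -> Tuple[Hashable, bool]:
    """Return (winner, used_fallback) for one judging case.

    naive_winner and kvcomm_winner are the winners picked by the two
    KV reuse judges. dense_prefill_judge is a hypothetical callable
    that runs the slower dense-prefill judge; it is only invoked when
    the two reuse methods disagree.
    """
    if naive_winner == kvcomm_winner:
        # Agreement: accept the cheap result without recomputation.
        return naive_winner, False
    # Disagreement is treated as the safety signal: fall back to dense prefill.
    return dense_prefill_judge(), True


def judge_consistency_rate(
    selected: Sequence[Hashable], dense: Sequence[Hashable]
) -> float:
    """Fraction of cases where the selected winner matches dense prefill.

    Assumed definition of JCR, inferred from the abstract's wording.
    """
    if len(selected) != len(dense):
        raise ValueError("selected and dense must have the same length")
    if not dense:
        raise ValueError("at least one case is required")
    matches = sum(s == d for s, d in zip(selected, dense))
    return matches / len(dense)
```

Under these assumptions, every case routed to the fallback is consistent with dense prefill by construction, so any remaining inconsistency comes from cases where both reuse methods agree on a winner that dense prefill would not have chosen.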

Resources