Disagreement-Gated Judge KV Reuse: A Training-Free Safety Signal for Multi-Agent LLM Systems
Abstract
Multi-agent LLM systems increasingly rely on LLM judges to select winners among candidate solutions. KV cache reuse can accelerate these judges but introduces position bias that degrades consistency---existing methods achieve only 61--66% Judge Consistency Rate (JCR) compared to dense prefill. We propose Disagreement-Gated Judge KV Reuse (DG-JKR), a training-free method that uses disagreement between two structurally different KV reuse methods (Naive Reuse and KVCOMM) as a safety signal. When both methods agree on a winner (83% of cases), DG-JKR accepts the result; when they disagree, it falls back to dense prefill. On HumanEval with Llama-3.2-3B-Instruct, DG-JKR achieves 74.38% JCR, improving over Naive Reuse by 8.13 percentage points and significantly outperforming random gating by 5.63 percentage points. The mechanism generalizes across candidate generation regimes and provides stable functional improvements (80.00% ± 0.62% JCR-F).
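The gating rule described above (accept the cached verdict on agreement, fall back to dense prefill on disagreement) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the judge callables (`judge_naive`, `judge_kvcomm`, `judge_dense`) are hypothetical placeholders standing in for the two KV-reuse judges and the dense-prefill judge.

```python
def dg_jkr_select(prompt, candidates, judge_naive, judge_kvcomm, judge_dense):
    """Disagreement-gated selection: trust the fast KV-reuse path only when
    both structurally different reuse methods pick the same winner; otherwise
    pay for a dense prefill as the safe fallback.

    Each judge callable takes (prompt, candidates) and returns the index of
    the winning candidate. Returns (winner_index, path_taken).
    """
    w_naive = judge_naive(prompt, candidates)    # verdict under Naive Reuse
    w_kvcomm = judge_kvcomm(prompt, candidates)  # verdict under KVCOMM
    if w_naive == w_kvcomm:
        # Agreement is the safety signal: accept the reused-cache result.
        return w_naive, "reused"
    # Disagreement flags a likely position-bias failure: recompute densely.
    return judge_dense(prompt, candidates), "dense"
```

In this sketch the speedup comes from the agreement branch being taken in the majority of cases (83% in the paper's setting), so the dense prefill cost is paid only on the disagreeing minority.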