Prefill Twice, Decode Once: Exploiting KV Cache Redundancy in Prompt Repetition
Abstract
Prompt repetition is a simple technique that improves LLM accuracy by duplicating the input prompt, but it doubles the KV cache memory, limiting practical deployment. We observe that the first copy's KV cache may be redundant at decode time: during prefill, the second copy's representations are computed with full attention to the first copy, so they potentially already encode all necessary information. We propose \textbf{Prefill Twice, Decode Once (PTDO)}, which prefills the duplicated prompt but retains only the second copy's KV cache for decoding, preserving the RoPE position offsets assigned during prefill. PTDO requires no model modifications or training. Experiments on Llama-3.1-8B and Qwen2.5-7B over the NameIndex and ARC-Challenge benchmarks demonstrate that PTDO matches or exceeds the accuracy of full prompt repetition (100%+ accuracy retention) while reducing decode-time KV cache by approximately 50%. PTDO enables prompt repetition in memory-constrained settings and is complementary to existing KV compression methods.
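The core cache-pruning step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a flat per-layer cache of shape (sequence, head_dim), and the helper name `ptdo_prune_kv` is hypothetical. The key point it demonstrates is that the retained entries keep their original prefill positions, so RoPE offsets stay consistent when decoding resumes.

```python
import numpy as np

def ptdo_prune_kv(kv_cache: np.ndarray, prompt_len: int):
    """Keep only the second-copy KV entries after prefilling the
    duplicated prompt (cache length == 2 * prompt_len).

    The kept entries retain their original RoPE positions
    [prompt_len, 2 * prompt_len), so decoding continues at
    position 2 * prompt_len exactly as it would with the full cache.
    """
    second_copy = kv_cache[prompt_len:]                # drop first-copy rows
    positions = np.arange(prompt_len, 2 * prompt_len)  # positions baked in at prefill
    return second_copy, positions

# Toy example: a 4-token prompt duplicated into an 8-entry cache.
cache = np.random.randn(8, 64)
kept, pos = ptdo_prune_kv(cache, prompt_len=4)
print(kept.shape, pos.tolist())  # (4, 64) [4, 5, 6, 7]
```

In a real serving stack this slice would be applied per layer and per head after prefill and before the first decode step, halving the resident cache for the rest of generation.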