RC-MemStop: Risk-Controlled Early Stopping for Long-Context Memory Agents
Abstract
Memory agents process long documents by scanning chunks sequentially, enabling inference on extremely long contexts but incurring high computational cost. Early stopping can reduce this cost, but it risks degrading performance on queries that would have succeeded under full processing. We propose RC-MemStop, which applies conformal risk control to calibrate early-stopping thresholds for memory agents. Using an answer-stability stopping rule (terminate when consecutive draft answers match) and the Waudby-Smith--Ramdas betting bound, we select the least conservative stopping threshold that satisfies a user-specified broken-success risk budget. Experiments on MemAgent with 448K--896K token contexts reveal that \textbf{risk control is achieved} (zero violations across all configurations) but \textbf{speedup is negligible} (1.02--1.14$\times$). The root cause: draft answers do not stabilize until processing is nearly complete, so controlling the risk requires up to 120 consecutive matches. This finding suggests that calibration-only early stopping is insufficient for memory agents; training-based stopping policies are needed for meaningful compute reduction.
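To make the calibration procedure concrete, the following is a minimal sketch of the answer-stability rule and risk-controlled threshold selection described above. The synthetic data, all function names, and the Hoeffding-style upper confidence bound (used here as a simple stand-in for the Waudby-Smith-Ramdas betting bound) are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: calibrate the patience threshold (number of consecutive
# matching draft answers required to stop) so that the broken-success risk
# stays under a user budget alpha, with high probability.
import math
import random

def stop_index(drafts, patience):
    """First chunk index at which `patience` consecutive drafts match."""
    run = 1
    for i in range(1, len(drafts)):
        run = run + 1 if drafts[i] == drafts[i - 1] else 1
        if run >= patience:
            return i
    return len(drafts) - 1  # no early stop: process all chunks

def broken_success(drafts, gold, patience):
    """Loss = 1 iff full processing answers correctly but early stop does not."""
    full_ok = drafts[-1] == gold
    early_ok = drafts[stop_index(drafts, patience)] == gold
    return 1.0 if full_ok and not early_ok else 0.0

def risk_ucb(losses, delta=0.05):
    """Hoeffding upper confidence bound on the mean loss in [0, 1].
    (Stand-in for the tighter WSR betting bound used in the paper.)"""
    n = len(losses)
    return sum(losses) / n + math.sqrt(math.log(1 / delta) / (2 * n))

def calibrate(cal_set, alpha, patiences):
    """Smallest (least conservative) patience whose risk UCB is <= alpha."""
    for k in sorted(patiences):
        losses = [broken_success(d, g, k) for d, g in cal_set]
        if risk_ucb(losses) <= alpha:
            return k
    return max(patiences)  # fall back to the most conservative setting

# Synthetic calibration set: drafts wander randomly, then settle on the
# gold answer only near the end, mimicking the late-stability finding.
random.seed(0)
cal = []
for _ in range(500):
    gold = "A"
    settle = random.randint(80, 95)  # chunk index where drafts stabilize
    drafts = [random.choice("BCD") for _ in range(settle)]
    drafts += [gold] * (100 - settle)
    cal.append((drafts, gold))

k_star = calibrate(cal, alpha=0.10, patiences=range(1, 30))
print("calibrated patience:", k_star)
```

Because the drafts only stabilize near the end of the 100-chunk sequence, the calibrated patience lands well above 1: small patience values stop on chance agreement among wrong drafts and blow the risk budget, which is the mechanism behind the negligible speedup reported above.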