Decoupling Snapshot Publication from Staleness Tolerance in Distributed GRPO via Lossless Sparse Patches
Abstract
Distributed reinforcement learning for large language models requires balancing staleness tolerance with training stability. ECHO-2 introduces a staleness budget $s$ to tolerate slow workers, but couples the publication period $P$ with $s$ by setting $P = s$, limiting scalability to small $s$ due to training instability. We identify that the instability is driven by $P$, not $s$: a high $P$ causes off-policy divergence that leads to training collapse. We propose decoupling $P$ from $s$ using sparse patch dissemination, which exploits the natural sparsity of single-step weight updates (90.7% sparsity) to achieve 12.5$\times$ compression and an 89.9% reduction in broadcast time. This enables per-step publication ($P = 1$) while maintaining large staleness budgets for worker tolerance. Under sustained off-policy conditions, the baseline collapses in 3/3 seeds while our approach remains stable in 3/3 seeds, with a 1400$\times$ reduction in KL divergence.
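As a minimal illustration of the lossless sparse-patch idea, the sketch below encodes a single-step weight update as (index, value) pairs for only the entries that changed, then reconstructs the new snapshot bit-exactly on the receiving side. This is a toy sketch under assumed conditions; the function names, tensor shapes, and sparsity level are illustrative, and the paper's actual patch format and broadcast path are not shown.

```python
# Minimal sketch (not the paper's implementation) of lossless sparse
# patching: store only the weights that changed after one optimizer step,
# then rebuild the new snapshot exactly on a worker. All names are
# illustrative assumptions.
import torch

def encode_patch(prev: torch.Tensor, curr: torch.Tensor):
    """Return (indices, values) for entries where curr differs from prev."""
    changed = (curr != prev).flatten()
    idx = changed.nonzero(as_tuple=True)[0]   # flat positions of changed weights
    return idx, curr.flatten()[idx]           # store the new values verbatim

def apply_patch(prev: torch.Tensor, idx: torch.Tensor,
                vals: torch.Tensor) -> torch.Tensor:
    """Overwrite only the changed entries; untouched weights stay bit-identical."""
    flat = prev.flatten().clone()
    flat[idx] = vals
    return flat.view_as(prev)

# Simulate a single-step update touching ~9% of one layer's weights
# (comparable to the 90.7% sparsity reported in the abstract).
torch.manual_seed(0)
w_old = torch.randn(1024, 1024)
mask = torch.rand_like(w_old) < 0.093
w_new = torch.where(mask, w_old + 0.01 * torch.randn_like(w_old), w_old)

idx, vals = encode_patch(w_old, w_new)
w_rebuilt = apply_patch(w_old, idx, vals)
assert torch.equal(w_rebuilt, w_new)          # lossless: exact reconstruction
print(f"patched {idx.numel()} / {w_old.numel()} entries "
      f"({100 * idx.numel() / w_old.numel():.1f}%)")
```

Because the patch stores the new values verbatim rather than floating-point deltas, reconstruction is exact by construction, which is what makes the scheme lossless rather than an approximate compression.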