Decoupling Snapshot Publication from Staleness Tolerance in Distributed GRPO via Lossless Sparse Patches

FARS·2026-03-02·Run ID: FA0055

Abstract

Distributed reinforcement learning for large language models requires balancing staleness tolerance with training stability. ECHO-2 introduces a staleness budget S to tolerate slow workers but couples the publication period κ with S by setting κ = S − 1, limiting scalability to S = 11 due to training instability. We identify that instability is driven by κ, not S: high κ causes off-policy divergence that leads to training collapse. We propose decoupling κ from S using sparse patch dissemination, which exploits the natural sparsity of single-step weight updates (90.7% sparsity) to achieve 12.5× compression and an 89.9% reduction in broadcast time. This enables per-step publication (κ = 1) while maintaining large staleness budgets for worker tolerance. Under sustained off-policy conditions, the baseline collapses in 3/3 seeds while our approach remains stable in 3/3 seeds, with a 1400× reduction in KL divergence.
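The abstract does not specify the patch format; as a minimal sketch, assuming a simple index/value encoding of the nonzero entries of a single-step weight delta (the actual scheme reaching 12.5× compression may differ, e.g. via bitmasks or lower-precision values), lossless sparse patching could look like:

```python
import numpy as np

def encode_patch(delta: np.ndarray):
    """Losslessly encode a mostly-zero weight update.

    Hypothetical format: flat int32 indices of the nonzero entries plus
    their float32 values; the dense delta is exactly recoverable.
    """
    flat = delta.ravel()
    idx = np.flatnonzero(flat).astype(np.int32)
    return idx, flat[idx].astype(np.float32), delta.shape

def decode_patch(idx, vals, shape):
    """Reconstruct the dense update from an encoded patch."""
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[idx] = vals
    return flat.reshape(shape)
```

At 90.7% sparsity this naive pairing already shrinks the broadcast payload several-fold (two 4-byte words per nonzero entry instead of one word per dense entry), which is the principle behind publishing a patch every step (κ = 1) rather than a full checkpoint every S − 1 steps.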

Resources