Decoupling Snapshot Publication from Staleness Tolerance in Distributed GRPO via Lossless Sparse Patches
Abstract
Distributed reinforcement learning for large language models requires balancing staleness tolerance with training stability. ECHO-2 introduces a staleness budget $s$ to tolerate slow workers, but couples the publication period $P$ with $s$ by setting $P = s$, limiting scalability to small $s$ due to training instability. We identify that the instability is driven by $P$, not $s$: a high $P$ causes off-policy divergence that leads to training collapse. We propose decoupling $P$ from $s$ using sparse patch dissemination, which exploits the natural sparsity of single-step weight updates (90.7% sparsity) to achieve 12.5$\times$ compression and an 89.9% reduction in broadcast time. This enables per-step publication ($P = 1$) while maintaining large staleness budgets for worker tolerance. Under sustained off-policy conditions, the baseline collapses in 3/3 seeds while our approach remains stable in 3/3 seeds, with a 1400$\times$ reduction in KL divergence.
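As a minimal illustration of the lossless sparse-patch idea, the sketch below encodes a single-step weight update as (index, value) pairs for only the entries that changed, then reconstructs the new snapshot bit-exactly on the receiving side. This is a toy sketch under assumed conditions; the function names, tensor shapes, and sparsity level are illustrative, and the paper's actual patch format and broadcast path are not shown.

```python
# Minimal sketch (not the paper's implementation) of lossless sparse
# patching: store only the weights that changed after one optimizer step,
# then rebuild the new snapshot exactly on a worker. All names are
# illustrative assumptions.
import torch

def encode_patch(prev: torch.Tensor, curr: torch.Tensor):
    """Return (indices, values) for entries where curr differs from prev."""
    changed = (curr != prev).flatten()
    idx = changed.nonzero(as_tuple=True)[0]   # flat positions of changed weights
    return idx, curr.flatten()[idx]           # store the new values verbatim

def apply_patch(prev: torch.Tensor, idx: torch.Tensor,
                vals: torch.Tensor) -> torch.Tensor:
    """Overwrite only the changed entries; untouched weights stay bit-identical."""
    flat = prev.flatten().clone()
    flat[idx] = vals
    return flat.view_as(prev)

# Simulate a single-step update touching ~9% of one layer's weights
# (comparable to the 90.7% sparsity reported in the abstract).
torch.manual_seed(0)
w_old = torch.randn(1024, 1024)
mask = torch.rand_like(w_old) < 0.093
w_new = torch.where(mask, w_old + 0.01 * torch.randn_like(w_old), w_old)

idx, vals = encode_patch(w_old, w_new)
w_rebuilt = apply_patch(w_old, idx, vals)
assert torch.equal(w_rebuilt, w_new)          # lossless: exact reconstruction
print(f"patched {idx.numel()} / {w_old.numel()} entries "
      f"({100 * idx.numel() / w_old.numel():.1f}%)")
```

Because the patch stores the new values verbatim rather than floating-point deltas, reconstruction is exact by construction, which is what makes the scheme lossless rather than an approximate compression.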