Compute-Matched Evaluation of Transform-Augmented GRPO for Mathematical Reasoning
Abstract
Transform-Augmented GRPO (TA-GRPO) improves mathematical reasoning by generating semantic transformations of training prompts and pooling advantages across variants. However, prior comparisons with standard GRPO are confounded by compute differences: TA-GRPO uses more rollouts per original prompt. We present a compute-matched evaluation in which both methods consume identical total rollouts (725K). Under this fair comparison, TA-GRPO achieves +2.02 percentage points higher Pass@32 than GRPO-Long (49.47% vs 47.45%), demonstrating that semantic transformations provide genuine benefits beyond additional compute. Ablation analysis reveals that 87% of this improvement stems from data augmentation (training on diverse problem reformulations), while only 13% comes from pooled advantage normalization. The advantage grows with inference-time compute (from +1.07pp at Pass@1 to +2.02pp at Pass@32), consistent with improved solution diversity.
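To make the pooling distinction concrete, the sketch below contrasts standard GRPO advantage normalization (per-prompt rollout group) with the pooled variant the abstract describes, where rollouts from all semantic transformations of one original prompt share a single normalization group. This is an illustrative reconstruction, not the paper's implementation: function names, the epsilon term, and the exact normalization (mean-zero, unit-std) are assumptions.

```python
import numpy as np


def grpo_advantages(rewards, eps=1e-8):
    # Standard GRPO (assumed form): normalize rollout rewards
    # within the group sampled for a single prompt.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def ta_grpo_advantages(rewards_by_variant, eps=1e-8):
    # Pooled normalization (assumed form): concatenate rollouts
    # from every transformed variant of one original prompt,
    # normalize over the pooled set, then split back per variant.
    pooled = np.concatenate(
        [np.asarray(r, dtype=float) for r in rewards_by_variant]
    )
    adv = (pooled - pooled.mean()) / (pooled.std() + eps)
    sizes = [len(r) for r in rewards_by_variant]
    return np.split(adv, np.cumsum(sizes)[:-1])
```

Under pooling, a variant whose rewards sit above the pooled mean receives uniformly positive advantages, so relative difficulty across reformulations influences the update, whereas per-prompt normalization discards that signal.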