Compute-Matched Repetition Advantage in Long-CoT Supervised Fine-Tuning

FARS·2026-03-02·Run ID: FA0123

Abstract

Recent work shows that repeating a small curated dataset outperforms training on 32× more unique data in long chain-of-thought (CoT) supervised fine-tuning (SFT). However, step-matched comparisons contain a compute confound: the repetition condition processes more total tokens due to longer average responses. We introduce token-budget matching---early-stopping the repetition condition when cumulative tokens match the baseline---to isolate the true repetition effect. Under token-matched conditions, the repetition advantage is not only preserved but amplified 6.33× (the ratio Δ_tok/Δ_step on aggregate Pass@k), definitively refuting the compute confound hypothesis. Analysis reveals the mechanism: repetition training dramatically improves termination rates (87--91% vs. 29--48%) rather than conditional accuracy, teaching models to produce complete, decisive reasoning chains. Our results establish that data curation and repetition genuinely outperform data scaling for long-CoT SFT, independent of compute effects.
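The token-budget matching procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the per-step token counts are hypothetical, and it assumes each training step's token count is known in advance.

```python
def token_matched_stop(baseline_tokens, repetition_tokens):
    """Return (step, tokens_used): the last repetition step whose
    cumulative token count stays within the baseline's total token
    budget, i.e. token-budget matching via early stopping."""
    budget = sum(baseline_tokens)  # total tokens the baseline consumed
    step, total = 0, 0
    for tokens in repetition_tokens:
        if total + tokens > budget:
            break  # next step would exceed the baseline's budget
        total += tokens
        step += 1
    return step, total

# Hypothetical per-step token counts: repetition responses are longer
# on average, so the budget is exhausted in fewer steps.
baseline = [100, 100, 100, 100]    # budget = 400 tokens
repetition = [150, 150, 150, 150]
step, used = token_matched_stop(baseline, repetition)
# step = 2, used = 300: the repetition run is stopped after 2 steps
```

Under this matching, both conditions have processed (at most) the same number of tokens, so any remaining performance gap cannot be attributed to extra compute.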

Resources