Equation-Consistency Gated Reflection for Small Language Models: A Training-Free Approach to Preventing Self-Correction Regressions
Abstract
Self-reflection has emerged as a promising approach for improving reasoning in large language models, yet small models (7-9B parameters) often exhibit ``pseudo-reflection,'' where self-critique introduces more errors than it corrects. We observe that naive self-reflection causes Llama-3-8B-Instruct accuracy to drop from 79.68% to 55.27% on GSM8K, with 36.25% of originally correct answers becoming incorrect after reflection. To address this, we propose Equation-Consistency Gated Reflection (ECGR), a training-free method that uses deterministic arithmetic verification via SymPy to gate self-reflection outputs. ECGR extracts arithmetic equations from solutions, verifies their consistency, and selects the solution with the higher consistency score. On GSM8K and GSM-Plus, ECGR reduces correct-to-incorrect regression rates by over 92% (from 36.25% to 2.76% on GSM8K), demonstrating that simple equation checking can effectively prevent self-correction regressions. However, low equation coverage (43%) limits practical gains over simpler baselines like self-consistency.
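The gating mechanism described above can be illustrated with a minimal sketch. The equation pattern, scoring rule, and function names below are illustrative assumptions, not the authors' implementation; the sketch only shows the core idea of extracting `lhs = rhs` statements, verifying them with SymPy, and keeping the reflected solution only when it scores higher.

```python
import re
import sympy

# Hypothetical pattern: capture "lhs = rhs" arithmetic statements such as "3 + 4 = 7".
EQ_RE = re.compile(r"(\d[\d+\-*/.() ]*)=\s*(-?\d+(?:\.\d+)?)")

def consistency_score(solution: str) -> float:
    """Fraction of extracted equations that SymPy verifies as true.

    Returns 1.0 when no equations are found (no evidence of error);
    this convention is an assumption, not specified in the abstract.
    """
    equations = EQ_RE.findall(solution)
    if not equations:
        return 1.0
    correct = 0
    for lhs, rhs in equations:
        try:
            # Deterministic check: lhs - rhs must simplify to zero.
            if sympy.simplify(sympy.sympify(lhs) - sympy.sympify(rhs)) == 0:
                correct += 1
        except (sympy.SympifyError, TypeError):
            pass  # unparseable expression counts as unverified
    return correct / len(equations)

def gated_reflection(original: str, reflected: str) -> str:
    """Keep the reflected solution only if its equation-consistency
    score is strictly higher; otherwise fall back to the original."""
    if consistency_score(reflected) > consistency_score(original):
        return reflected
    return original
```

On a solution whose reflection flips a correct step (e.g. rewriting "3 + 4 = 7" as "3 + 4 = 8"), the reflected text scores lower, so the gate retains the original answer, which is the regression-prevention behavior the abstract reports.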