Entropy Dynamics Do Not Provide Reliable Execution-Free Selection Signals for Code Generation

FARS


FARS · 2026-03-02 · Run ID: FA0061

Abstract

Best-of-N sampling improves code generation, but selecting among the N candidates usually requires executing them. Entropy-dynamics methods such as EDIS have shown promise for detecting reasoning errors in math problems. They do this by identifying instability patterns in per-token entropy trajectories. We test whether entropy dynamics can provide an execution-free selection signal for code generation by adapting EDIS to code as nEDIS and evaluating it against pre-registered success criteria. The result is clearly negative. nEDIS fails the pre-registered criterion and underperforms even random selection (taking the first sample) by 12.8 to 27.5 percentage points on HumanEval and MBPP. We identify entropy sparsity as a key failure mode: with instruction-tuned code models, 88.3% of per-token entropy values are exactly zero, which undermines spike detection. Moreover, the optimization needed to improve nEDIS contradicts the original hypothesis. This suggests that the method captures length bias rather than meaningful entropy dynamics. We report this negative result to help others avoid redundant effort, and it suggests that execution-free code selection needs alternative approaches.
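The abstract does not give the exact EDIS or nEDIS scoring rule. The sketch below is therefore a hypothetical stand-in that only illustrates the ideas involved. It rests on three assumptions. First, per-token entropy is the Shannon entropy of each next-token distribution. Second, "instability" is scored as the fraction of tokens whose entropy exceeds mean + k·std of the same trajectory. Third, best-of-N picks the candidate with the lowest score. The names `spike_score`, `zero_fraction`, `select_best_of_n` and the threshold `k` are illustrative and are not taken from the paper.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one next-token distribution; 0*log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def zero_fraction(entropies: Sequence[float], eps: float = 0.0) -> float:
    """Entropy sparsity: fraction of per-token entropies that are <= eps (exactly zero by default)."""
    if not entropies:
        return 0.0
    return sum(1 for h in entropies if h <= eps) / len(entropies)


def spike_score(entropies: Sequence[float], k: float = 2.0) -> float:
    """Hypothetical instability score (not the paper's nEDIS): fraction of tokens whose
    entropy exceeds mean + k * std of the same trajectory. Lower means 'more stable'."""
    n = len(entropies)
    if n == 0:
        return 0.0
    mean = sum(entropies) / n
    std = math.sqrt(sum((h - mean) ** 2 for h in entropies) / n)
    threshold = mean + k * std
    return sum(1 for h in entropies if h > threshold) / n


def select_best_of_n(trajectories: Sequence[Sequence[float]]) -> int:
    """Execution-free selection: index of the candidate with the lowest spike score."""
    return min(range(len(trajectories)), key=lambda i: spike_score(trajectories[i]))


if __name__ == "__main__":
    # A sparse trajectory: mostly exactly-zero entropies with one small nonzero value.
    sparse = [0.0] * 9 + [1.2]
    # A dense trajectory with moderate, fairly uniform entropies.
    dense = [0.5, 0.6, 0.4, 0.5, 0.55]
    print(zero_fraction(sparse))              # 0.9
    print(spike_score(sparse))                # 0.1: the lone nonzero value counts as a "spike"
    print(spike_score(dense))                 # 0.0
    print(select_best_of_n([sparse, dense]))  # 1
```

When most entropies are exactly zero, the mean and standard deviation collapse toward zero. As a result, any single nonzero entropy clears the threshold and registers as a spike. This is one concrete way sparsity can make spike detection uninformative. In this toy scorer, dividing by `n` also makes the score depend on candidate length. That is the kind of length effect such a score can end up tracking, though whether nEDIS fails for exactly this reason is not established by this sketch.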
