Test-time scaling through parallel trace generation improves LLM reasoning but introduces computational redundancy, as many traces converge to identical answers. DeepPrune addresses this through inter-trace clustering with a trained judge model, but requires expensive pairwise comparisons that scale with the number of clusters. We propose Sketch-Gated DeepPrune, which adds a lightweight locality-sensitive hashing (LSH) layer to filter candidate clusters before judge comparisons. Our method computes 64-bit SimHash signatures from trace suffixes and uses LSH banding to identify candidate similar clusters in constant time, enabling an adaptive comparison strategy that reduces the per-cluster comparison budget from 10 to 3 when candidates are available. On AIME and GPQA benchmarks, Sketch-Gated achieves 1.82$\times$ judge prompt reduction (1635$\rightarrow$897) with zero accuracy loss compared to DeepPrune. Surprisingly, ablation studies reveal that the efficiency gains primarily stem from the adaptive comparison mechanism rather than LSH candidate quality, suggesting that simpler gating strategies may achieve similar benefits.

Sketch-Gated Trace Clustering for Accelerating Inter-Trace Redundancy Pruning

Abstract

Resources