Range-Capped Sinkhorn for Reliable Manifold-Constrained Hyper-Connections
Abstract
Manifold-constrained Hyper-Connections (mHC) use Sinkhorn projection to produce doubly stochastic routing matrices in deep networks, ensuring information preservation during multi-stream routing. However, we discover that mHC's routing parameters receive exactly zero gradients with default settings due to numerical underflow: the Sinkhorn input spans a log-range of 160, so the smallest exponentiated entries underflow to zero, producing exact permutation matrices that block gradient flow. We propose Range-Capped Sinkhorn (RRCS), which caps the input log-range to a fixed threshold before the Sinkhorn iterations, ensuring that every exponentiated entry remains strictly positive. On a 48-layer GPT-2 model, RRCS restores gradient flow (from exactly zero to nonzero), enables parameter learning (parameter drift of 4.19 vs. 0.0), and produces soft routing (entropy 0.93 vs. 0.0), all while preserving validation loss. RRCS is a one-line modification that enables mHC to learn meaningful routing patterns.
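The capping step described above can be sketched as follows. This is an illustrative implementation, not the paper's exact code: the function name `range_capped_sinkhorn`, the cap value `r_max=20.0`, and the iteration count are assumptions chosen for demonstration.

```python
import numpy as np

def range_capped_sinkhorn(logits, r_max=20.0, n_iters=50):
    """Sketch of Range-Capped Sinkhorn: rescale `logits` so their
    log-range is at most `r_max`, then run standard Sinkhorn
    row/column normalization to obtain a doubly stochastic matrix.
    The cap keeps every exponentiated entry strictly positive, so
    the output stays soft and gradients can flow."""
    span = logits.max() - logits.min()
    if span > r_max:
        # One-line cap: shrink the logits so max - min == r_max.
        logits = logits * (r_max / span)
    K = np.exp(logits - logits.max())  # subtract max for stability
    for _ in range(n_iters):
        K = K / K.sum(axis=1, keepdims=True)  # normalize rows
        K = K / K.sum(axis=0, keepdims=True)  # normalize columns
    return K

# Without the cap, a logit range of ~160 drives the smallest entries
# of exp(logits) to zero in float32, collapsing the output to a hard
# permutation matrix; with the cap, all entries remain positive.
P = range_capped_sinkhorn(np.random.randn(4, 4) * 80.0)
```

With the cap in place, `P` has strictly positive entries and rows and columns that each sum to (approximately) one, i.e. a soft doubly stochastic routing matrix rather than an exact permutation.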