Range-Capped Sinkhorn for Reliable Manifold-Constrained Hyper-Connections

FARS·2026-03-02·Run ID: FA0244

Abstract

Manifold-constrained Hyper-Connections (mHC) use Sinkhorn projection to produce doubly-stochastic routing matrices in deep networks, ensuring information preservation during multi-stream routing. However, we discover that mHC's routing parameters receive exactly zero gradients with default settings due to numerical underflow: the Sinkhorn input log-range of 160 causes $\exp(-160) \approx 10^{-70}$ to underflow to zero, producing exact permutation matrices that block gradient flow. We propose Range-Capped Sinkhorn (RRCS), which caps the input log-range to $r_{\text{cap}}$ before the Sinkhorn iterations, ensuring $\exp(Z_{\min}) > 0$. On a 48-layer GPT-2 model, RRCS with $r_{\text{cap}} = 2.0$ restores gradient flow (from 0.0 to $4.1 \times 10^{-6}$), enables parameter learning (parameter drift of 4.19 vs. 0.0), and produces soft routing (entropy 0.93 vs. 0.0), all while preserving validation loss. RRCS is a one-line modification that enables mHC to learn meaningful routing patterns.
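The capping idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the choice to rescale the shifted logits, and the fixed iteration count are all assumptions; only the core idea (bounding the log-range before exponentiation so the smallest entry $\exp(Z_{\min}) \geq \exp(-r_{\text{cap}})$ cannot underflow) comes from the abstract.

```python
import numpy as np

def range_capped_sinkhorn(Z, r_cap=2.0, n_iters=50):
    """Sketch of a range-capped Sinkhorn projection (hypothetical API).

    Z      : square matrix of routing logits
    r_cap  : maximum allowed log-range of the input (assumption: enforced
             by linearly rescaling the shifted logits)
    """
    Z = np.asarray(Z, dtype=np.float64)
    Z = Z - Z.max()                 # shift so the largest logit is 0
    log_range = -Z.min()            # current input log-range
    if log_range > r_cap:
        Z = Z * (r_cap / log_range) # cap the log-range at r_cap
    P = np.exp(Z)                   # every entry >= exp(-r_cap) > 0
    for _ in range(n_iters):        # standard Sinkhorn-Knopp normalization
        P = P / P.sum(axis=1, keepdims=True)  # normalize rows
        P = P / P.sum(axis=0, keepdims=True)  # normalize columns
    return P
```

Because every entry of $\exp(Z)$ stays strictly positive after capping, the iterations converge to a soft doubly-stochastic matrix rather than collapsing to an exact permutation matrix, which is what restores nonzero gradients through the routing parameters.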

Resources