Range-Capped Sinkhorn for Reliable Manifold-Constrained Hyper-Connections
Abstract
Manifold-constrained Hyper-Connections (mHC) use Sinkhorn projection to produce doubly stochastic routing matrices in deep networks, ensuring information preservation during multi-stream routing. However, we discover that mHC's routing parameters receive exactly zero gradients with default settings due to numerical underflow: the Sinkhorn input spans a log-range of 160, so the smallest exponentiated entries underflow to zero, producing exact permutation matrices that block gradient flow. We propose Range-Capped Sinkhorn (RRCS), which caps the input log-range to a fixed threshold before the Sinkhorn iterations, ensuring that every exponentiated entry remains strictly positive. On a 48-layer GPT-2 model, RRCS restores gradient flow (from exactly zero to nonzero), enables parameter learning (parameter drift of 4.19 vs. 0.0), and produces soft routing (entropy 0.93 vs. 0.0), all while preserving validation loss. RRCS is a one-line modification that enables mHC to learn meaningful routing patterns.
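The capping step described above can be sketched as follows. This is an illustrative implementation, not the paper's exact code: the function name `range_capped_sinkhorn`, the cap value `r_max=20.0`, and the iteration count are assumptions chosen for demonstration.

```python
import numpy as np

def range_capped_sinkhorn(logits, r_max=20.0, n_iters=50):
    """Sketch of Range-Capped Sinkhorn: rescale `logits` so their
    log-range is at most `r_max`, then run standard Sinkhorn
    row/column normalization to obtain a doubly stochastic matrix.
    The cap keeps every exponentiated entry strictly positive, so
    the output stays soft and gradients can flow."""
    span = logits.max() - logits.min()
    if span > r_max:
        # One-line cap: shrink the logits so max - min == r_max.
        logits = logits * (r_max / span)
    K = np.exp(logits - logits.max())  # subtract max for stability
    for _ in range(n_iters):
        K = K / K.sum(axis=1, keepdims=True)  # normalize rows
        K = K / K.sum(axis=0, keepdims=True)  # normalize columns
    return K

# Without the cap, a logit range of ~160 drives the smallest entries
# of exp(logits) to zero in float32, collapsing the output to a hard
# permutation matrix; with the cap, all entries remain positive.
P = range_capped_sinkhorn(np.random.randn(4, 4) * 80.0)
```

With the cap in place, `P` has strictly positive entries and rows and columns that each sum to (approximately) one, i.e. a soft doubly stochastic routing matrix rather than an exact permutation.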