Contractive Recurrent Cores for Depth-Extrapolatable Vision-Language-Action Policies: An Empirical Investigation on LIBERO

FARS·2026-03-02·Run ID: FA0013

Abstract

Recurrent-Depth VLA (RD-VLA) enables test-time compute scaling for robot control through iterative latent refinement, but reports a "depth boundary" where performance degrades at high iteration counts. We investigate Jacobian regularization as a principled approach to encourage contractive dynamics in the recurrent core, penalizing the Frobenius norm of the state-to-state Jacobian via the Hutchinson estimator. Surprisingly, on LIBERO-10 with offline teacher-forced evaluation, the depth boundary does not manifest: both baseline and Jacobian-regularized models exhibit 0% overthinking with flat MSE curves across depths K=4 to K=128. Despite this, Jacobian regularization via fine-tuning achieves a 1.6% MSE improvement over the baseline, with a two-phase training strategy that is 15× more efficient than from-scratch regularization. Adaptive stopping analysis reveals rapid convergence within 6 iterations, enabling 50% compute savings at inference. Our findings suggest that the depth boundary may be benchmark-specific, informing future research on the conditions under which recurrent depth scaling succeeds or fails.
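The Jacobian penalty described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the recurrent core `core(h, x)` and its signature are hypothetical stand-ins, and we use the Hutchinson identity ||J||_F^2 = E_v[||v^T J||^2] with Rademacher probe vectors, computed via a vector-Jacobian product in PyTorch.

```python
import torch

def jacobian_frob_penalty(core, h, x, n_samples=1):
    """Hutchinson estimate of ||dh_next/dh||_F^2 for a recurrent core.

    core: callable mapping (h, x) -> h_next (hypothetical interface).
    h:    latent state, shape (batch, dim); x: conditioning input.
    Returns an unbiased estimate of the squared Frobenius norm of the
    state-to-state Jacobian, summed over the batch.
    """
    h = h.detach().requires_grad_(True)
    h_next = core(h, x)
    estimate = h_next.new_zeros(())
    for _ in range(n_samples):
        # Rademacher probe vector: entries are +1 or -1 with equal probability.
        v = torch.randint(0, 2, h_next.shape, device=h_next.device,
                          dtype=h_next.dtype) * 2 - 1
        # Vector-Jacobian product v^T J via reverse-mode autodiff;
        # create_graph=True keeps the penalty differentiable for training.
        vjp, = torch.autograd.grad(h_next, h, grad_outputs=v,
                                   create_graph=True, retain_graph=True)
        estimate = estimate + vjp.pow(2).sum()
    return estimate / n_samples
```

During training, this scalar would be added to the task loss with a small weight to bias the core toward contractive dynamics (Jacobian norm below 1). For an identity core the estimator is exact: each VJP returns the probe itself, so the estimate equals the number of latent entries.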

Resources