Contractive Recurrent Cores for Depth-Extrapolatable Vision-Language-Action Policies: An Empirical Investigation on LIBERO

FARS·2026-03-02·Run ID: FA0013

Abstract

Recurrent-Depth VLA (RD-VLA) enables test-time compute scaling for robot control through iterative latent refinement, but reports a "depth boundary" where performance degrades at high iteration counts. We investigate Jacobian regularization as a principled approach to encourage contractive dynamics in the recurrent core, penalizing the Frobenius norm of the state-to-state Jacobian via the Hutchinson estimator. Surprisingly, on LIBERO-10 with offline teacher-forced evaluation, the depth boundary does not manifest: both baseline and Jacobian-regularized models exhibit 0% overthinking with flat MSE curves across depths K=4 to K=128. Despite this, Jacobian regularization via fine-tuning achieves a 1.6% MSE improvement over the baseline, with a two-phase training strategy that is 15× more efficient than from-scratch regularization. Adaptive stopping analysis reveals rapid convergence within 6 iterations, enabling 50% compute savings at inference. Our findings suggest that the depth boundary may be benchmark-specific, informing future research on the conditions under which recurrent depth scaling succeeds or fails.
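The Jacobian penalty described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the recurrent core `core(h, x)` and its signature are hypothetical stand-ins, and we use the Hutchinson identity ||J||_F^2 = E_v[||v^T J||^2] with Rademacher probe vectors, computed via a vector-Jacobian product in PyTorch.

```python
import torch

def jacobian_frob_penalty(core, h, x, n_samples=1):
    """Hutchinson estimate of ||dh_next/dh||_F^2 for a recurrent core.

    core: callable mapping (h, x) -> h_next (hypothetical interface).
    h:    latent state, shape (batch, dim); x: conditioning input.
    Returns an unbiased estimate of the squared Frobenius norm of the
    state-to-state Jacobian, summed over the batch.
    """
    h = h.detach().requires_grad_(True)
    h_next = core(h, x)
    estimate = h_next.new_zeros(())
    for _ in range(n_samples):
        # Rademacher probe vector: entries are +1 or -1 with equal probability.
        v = torch.randint(0, 2, h_next.shape, device=h_next.device,
                          dtype=h_next.dtype) * 2 - 1
        # Vector-Jacobian product v^T J via reverse-mode autodiff;
        # create_graph=True keeps the penalty differentiable for training.
        vjp, = torch.autograd.grad(h_next, h, grad_outputs=v,
                                   create_graph=True, retain_graph=True)
        estimate = estimate + vjp.pow(2).sum()
    return estimate / n_samples
```

During training, this scalar would be added to the task loss with a small weight to bias the core toward contractive dynamics (Jacobian norm below 1). For an identity core the estimator is exact: each VJP returns the probe itself, so the estimate equals the number of latent entries.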

Resources