SCORE: Recurrent Depth for Deep Networks

SCORE introduces a recurrent, iterative approach to deep neural networks, accelerating training and reducing parameter counts without complex ODE solvers.

Mar 12 at 8:15 PM · 2 min read
Figure: Diagram illustrating the SCORE iterative update process compared to traditional layer stacking.

The efficacy of modern deep neural networks hinges on residual connections that facilitate stable optimization and information flow. However, the conventional approach of stacking layers introduces significant computational overhead. A new paper, detailed on arXiv, presents SCORE (Skip-Connection ODE Recurrent Embedding), a discrete recurrent alternative that reframes network depth as an iterative refinement process: a single shared neural block is applied repeatedly, inspired by Ordinary Differential Equations (ODEs) but crucially avoiding complex solvers.

Iterative Refinement Over Stacking

SCORE replaces the sequential composition of independent layers with a recurrent application of a single, shared neural block. The core update rule, h_{t+1} = (1 - dt) * h_t + dt * F(h_t), leverages a step size dt to control the magnitude and stability of updates. This formulation effectively treats network depth as a discrete iterative process. Unlike continuous Neural ODEs, SCORE employs standard backpropagation with a fixed number of iterations, simplifying implementation and eliminating the need for ODE solvers or adjoint methods. This iterative depth strategy is a significant departure from traditional architectural designs.
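The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the shared block F is stood in for by a simple tanh on a scalar state, and the hyperparameter values are arbitrary. It shows how a fixed iteration count plays the role of depth while dt damps each step, exactly as in the formula h_{t+1} = (1 - dt) * h_t + dt * F(h_t).

```python
import math

def score_forward(h0, F, dt=0.1, num_iters=8):
    """Apply one shared block F iteratively (SCORE-style recurrent depth).

    Update rule: h_{t+1} = (1 - dt) * h_t + dt * F(h_t).
    `num_iters` plays the role of network depth; `dt` controls the
    magnitude (and hence stability) of each refinement step.
    """
    h = h0
    for _ in range(num_iters):
        h = (1.0 - dt) * h + dt * F(h)
    return h

# Toy stand-in for the shared neural block: a scalar tanh nonlinearity.
h_out = score_forward(2.0, math.tanh, dt=0.25, num_iters=20)
print(h_out)
```

With dt = 0 the state is returned unchanged; larger dt takes bigger Euler-style steps toward F's output. In a real model, F would be a parameterized block (e.g. an MLP or Transformer layer) and h a tensor, trained with ordinary backpropagation through the fixed number of iterations.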

Accelerated Training and Efficiency Gains

Across evaluations on graph neural networks (ESOL molecular solubility), multilayer perceptrons, and Transformer-based language models (nanoGPT), SCORE demonstrates compelling performance improvements. The research indicates that SCORE generally enhances convergence speed and accelerates training. A key advantage lies in its parameter efficiency: by sharing weights across iterations, SCORE reduces the overall parameter count compared to equivalent stacked architectures. The study also found that simple Euler integration offers the best balance between computational cost and performance, with higher-order integrators providing only marginal gains at a substantial computational increase. These findings position SCORE as a promising direction for building more efficient and faster-training deep learning models.
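The parameter-efficiency argument reduces to simple arithmetic, sketched below under assumed numbers (the per-block size and depth are hypothetical, not figures from the paper): a conventional stack pays for weights at every layer, while SCORE's shared block pays once regardless of how many iterations it runs.

```python
def stacked_param_count(depth, params_per_layer):
    # Conventional stacking: each of the `depth` layers owns its weights.
    return depth * params_per_layer

def score_param_count(params_per_block):
    # SCORE shares one block across all iterations, so the iteration
    # count (the effective depth) adds no parameters.
    return params_per_block

per_block = 1_000_000  # hypothetical block size, for illustration only
ratio = stacked_param_count(12, per_block) // score_param_count(per_block)
print(ratio)  # → 12
```

The ratio equals the depth: a 12-layer stack stores 12x the parameters of a SCORE model whose shared block is the same size and is iterated 12 times.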