AI’s ‘Learning Rule’ Explained as Physics, Not Just Maths

Scientists have long considered backpropagation, the cornerstone of neural network training, a symbolic application of the chain rule, but new research reveals its deeper connection to physical systems. Antonino Emanuele Scurria from the Université libre de Bruxelles, along with colleagues, demonstrates that backpropagation emerges as the finite-time relaxation of a physical dynamical system, termed ‘Dyadic Backpropagation’. Their work formulates feedforward inference as a continuous-time process, deriving a global energy functional that performs both inference and credit assignment simultaneously. Significantly, the team proves standard backpropagation can be recovered exactly in a predictable number of steps, unlike previous energy-based methods which relied on approximations or specific network conditions. This establishes backpropagation not merely as an algorithm, but as the digital manifestation of a continuous physical relaxation, offering a rigorous foundation for exact gradients in both digital and analogue computing substrates.

By applying a modified version of the Lagrangian theory of non-conservative systems to handle the asymmetric weights and interactions of feedforward networks, the researchers construct a global energy functional on a doubled state space encoding both activations and sensitivities.

The saddle-point dynamics of this energy perform inference and credit assignment simultaneously through local interactions. This framework is termed “Dyadic Backpropagation”. Crucially, they prove that unit-step Euler discretization, the natural timescale of layer transitions, recovers standard backpropagation exactly in 2L steps for an L-layer network, with no approximations.
Unlike prior energy-based methods requiring symmetric weights, asymptotic convergence, or vanishing perturbations, this framework guarantees exact gradients in finite time. This establishes backpropagation as the digitally optimised shadow of a continuous physical relaxation, providing a rigorous foundation for exact gradient computation in analogue and neuromorphic substrates where continuous dynamics are native.

The research objectives are to unify physical and algebraic steps of backpropagation, achieve exactness in finite time, and ensure universality and robustness across architectures. The approach involves formulating feedforward inference as a continuous-time process and applying Lagrangian theory of non-conservative systems.

A global energy functional is derived on a doubled state space, and the dynamics of this energy are analysed. Specific contributions include demonstrating that backpropagation emerges directly from Classical Mechanics, proving exact gradient recovery in 2L steps, and establishing a framework applicable to any differentiable computation graph, including CNNs and arbitrary feedforward topologies.

The framework, Dyadic Backpropagation (DBP), is a variation of dyadic dynamics modified for feedforward systems. The researchers consider a standard L-layer feedforward neural network with input x_0 ∈ R^{n_0} and output a_L ∈ R^{n_L}, defining pre-activation and activation maps as z_l = W_l a_{l−1} + b_l and a_l = σ_l(z_l), where W_l ∈ R^{n_l × n_{l−1}}, b_l ∈ R^{n_l}, and σ_l is an elementwise nonlinearity.

All parameters are collected into θ = {W_l, b_l}_{l=1}^{L}. A supervised loss function C(a_L, y) is used, and gradients are computed via backpropagation using error vectors δ_l = ∂C/∂z_l, propagated recursively as δ_l = (W_{l+1}^⊤ δ_{l+1}) ⊙ σ′_l(z_l), with δ_L = ∇_{a_L}C ⊙ σ′_L(z_L). Gradients are then calculated as ∂C/∂W_l = δ_l a_{l−1}^⊤ and ∂C/∂b_l = δ_l.
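The recursion above can be sketched numerically. Below is a minimal NumPy example for a 2-layer tanh network with a quadratic cost; the layer sizes, seed, and cost function are illustrative assumptions, not taken from the paper, and the result is checked against a finite-difference estimate.

```python
import numpy as np

# Hypothetical 2-layer network: n0=2, n1=3, n2=1, tanh units, C = 0.5||a2 - y||^2.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)
x0, y = rng.normal(size=2), np.array([0.5])

sigma, dsigma = np.tanh, lambda z: 1 - np.tanh(z) ** 2

# Forward pass: z_l = W_l a_{l-1} + b_l, a_l = sigma(z_l)
z1 = W1 @ x0 + b1; a1 = sigma(z1)
z2 = W2 @ a1 + b2; a2 = sigma(z2)

# Backward pass: delta_L = grad_a C ⊙ sigma'(z_L), then the recursion
delta2 = (a2 - y) * dsigma(z2)
delta1 = (W2.T @ delta2) * dsigma(z1)

# Parameter gradients: dC/dW_l = delta_l a_{l-1}^T, dC/db_l = delta_l
dW2, db2 = np.outer(delta2, a1), delta2
dW1, db1 = np.outer(delta1, x0), delta1

# Sanity check dC/db1[0] against central finite differences
def cost(b1p):
    a1p = sigma(W1 @ x0 + b1p)
    return 0.5 * np.sum((sigma(W2 @ a1p + b2) - y) ** 2)

eps = 1e-6
e0 = np.zeros(3); e0[0] = eps
fd = (cost(b1 + e0) - cost(b1 - e0)) / (2 * eps)
print(abs(fd - db1[0]) < 1e-6)  # True
```

The gradient shapes mirror the parameter shapes, dW2 ∈ R^{1×3} and dW1 ∈ R^{3×2}, which is exactly the outer-product structure δ_l a_{l−1}^⊤ stated above.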

The layer-wise discrete dynamics are transitioned to a global description by interpreting the layer index as a discrete time coordinate and embedding it into a continuous-time vector field. The activations are stacked into a global vector a = (a_1, a_2, …, a_L) ∈ R^n, with n = Σ_{l=1}^{L} n_l. The global weight matrix W is defined as a strictly lower block-triangular matrix, ensuring information flows only from layer l−1 to layer l.

Bias and input drives are collected into a global vector β(x_0) = (W_1 x_0 + b_1, b_2, …, b_L). The forward pass is then compactly written as a = σ(W a + β(x_0)), where σ denotes the stacked nonlinearity. The feedforward dynamics are generalized to continuous time with the vector field da(t)/dt = σ(W a(t) + β(x_0)) − a(t) =: F(a(t)).

The global matrix W is shown to be nilpotent, W^L = 0, capturing the acyclic structure of feedforward networks. To construct a global energy, the researchers address the non-reciprocal nature of feedforward interactions by doubling the phase space, introducing a “backward” state x and a “forward” state z.
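The nilpotency claim is easy to verify numerically. The sketch below builds the strictly lower block-triangular W for an assumed 3-layer network (sizes chosen for illustration) and checks that W^L vanishes while W^{L−1} does not. Because W is nilpotent, iterating a ← σ(W a + β(x_0)) from any starting point also reaches the forward-pass fixed point in at most L steps.

```python
import numpy as np

# Assumed layer sizes n0..n3 for an L = 3 network (illustrative only).
rng = np.random.default_rng(1)
sizes = [2, 3, 3, 2]
L = len(sizes) - 1
n = sum(sizes[1:])
offs = np.cumsum([0] + sizes[1:])          # block offsets in the stacked state

# Strictly lower block-triangular W: layer l feeds only layer l+1.
W = np.zeros((n, n))
for l in range(1, L):
    W[offs[l]:offs[l + 1], offs[l - 1]:offs[l]] = rng.normal(
        size=(sizes[l + 1], sizes[l]))

# The acyclic layer structure makes W nilpotent of index exactly L:
print(np.allclose(np.linalg.matrix_power(W, L), 0))      # True
print(np.allclose(np.linalg.matrix_power(W, L - 1), 0))  # False
```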

The global energy is defined as E(x, z) = (x − z)^⊤ F((x + z)/2) + C([(x + z)/2]_L, y), where the first term is a Lagrangian lifting of the vector field F and the second term is the cost, injecting task information. Expanding F gives E(x, z) = (x − z)^⊤ [σ(W (x + z)/2 + β(x_0)) − (x + z)/2] + C([(x + z)/2]_L, y). The midpoint (x + z)/2 is the activation value, and the discrepancy x − z encodes the gradient signal.

The equations of motion are governed by the variational principle, resulting in the saddle-point flow dx/dt = ∂E/∂x and dz/dt = −∂E/∂z. These coupled dynamics consist of network relaxation, a backward signal, and a cost term, with the equations explicitly given by

dx/dt = σ(W (x + z)/2 + β(x_0)) − (x + z)/2 + (1/2) [W^⊤ diag(σ′(W (x + z)/2 + β(x_0))) − I] (x − z) + (1/2) (0, …, 0, ∇_{a_L}C([(x + z)/2]_L, y)),

dz/dt = σ(W (x + z)/2 + β(x_0)) − (x + z)/2 − (1/2) [W^⊤ diag(σ′(W (x + z)/2 + β(x_0))) − I] (x − z) − (1/2) (0, …, 0, ∇_{a_L}C([(x + z)/2]_L, y)).

Lagrangian dynamics reconstruct backpropagation via a doubled state space, offering a novel perspective on gradient computation

A global energy functional on a doubled state space forms the core of Dyadic Backpropagation, a framework demonstrating that standard backpropagation emerges as the finite-time relaxation of a physical dynamical system. Researchers constructed this energy functional by applying a modified Lagrangian theory of non-conservative systems to handle the asymmetric interactions inherent in feedforward networks.

This approach encodes both activations and sensitivities within the doubled state space, enabling simultaneous inference and credit assignment through local interactions. Crucially, the work proves that unit-step Euler discretization, mirroring the natural timescale of layer transitions, precisely recovers standard backpropagation in exactly 2L steps for an L-layer network, without any approximations.

This was achieved by formulating feedforward inference as a continuous-time process and leveraging Lagrangian theory. Unlike previous energy-based methods, this framework does not require symmetric weights, asymptotic convergence, or vanishing perturbations, guaranteeing exact gradients in finite time.

The study demonstrates that the algebraic steps of backpropagation directly originate from Classical Mechanics, establishing a rigorous connection between deep learning and fundamental physical laws. The researchers further show that the framework applies to any differentiable computation graph, including convolutional neural networks and arbitrary feedforward topologies.

This establishes backpropagation as a digitally optimised shadow of a continuous physical relaxation, providing a foundation for exact gradient computation in both analog and neuromorphic substrates where continuous dynamics are native. The methodology’s innovation lies in its ability to derive exact gradients in finite time, circumventing limitations of prior approaches and offering a physically plausible foundation for learning algorithms.

High accuracy and robustness of discrete-time gradient approximation via saddle-point dynamics are demonstrated

The research demonstrates test-accuracy parity with standard backpropagation, peaking at roughly 93% using Algorithm 1. This indicates that the discrete-time saddle-point dynamics successfully approximate the learning signal of the ideal gradient. The choice of discretization step size η does not negatively impact the learning trajectory, confirming robustness across a wide range of values.

Experiments confirm the method achieves robust stability, with precision analysis revealing high accuracy in recovering the true gradients. The global relative error, detailed in Appendix E.2, is driven primarily by a mismatch in Euclidean norms, with directional misalignment remaining negligible and barely distinguishable from machine precision.
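The decomposition described here, splitting the relative error between a recovered gradient and the true one into a norm mismatch and a directional misalignment, can be illustrated with made-up vectors (the values below are hypothetical, not the paper's data):

```python
import numpy as np

# Hypothetical true gradient g and recovered gradient g_hat: a small
# uniform rescaling plus a tiny non-parallel perturbation.
g = np.array([1.0, -2.0, 0.5])
g_hat = 1.001 * g + 1e-8 * np.array([0.0, 1.0, -1.0])

ng, ngh = np.linalg.norm(g), np.linalg.norm(g_hat)
rel_err = np.linalg.norm(g_hat - g) / ng          # global relative error
norm_mismatch = abs(ngh - ng) / ng                # norm component
misalignment = 1 - (g_hat @ g) / (ngh * ng)       # cosine distance

print(rel_err, norm_mismatch, misalignment)
```

With these numbers the norm mismatch (about 10⁻³) accounts for essentially all of the relative error, while the cosine distance sits at machine-precision scale, which is the qualitative pattern the paper reports for its recovered gradients.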

This confirms the theoretical predictions regarding the fidelity of the proposed approach. Convergence speed analysis shows a direct relationship between step size and relaxation efficiency. As the step size η approaches 1, the number of required relaxation steps collapses toward the theoretical lower bound of 2L = 18, where L represents the network depth.

Layer-wise analysis reveals high-fidelity alignment across all layers, with log-misalignment values generally hovering between −6 and −8. These values are bounded by the IEEE 754 float32 machine-precision limit of approximately 10⁻⁷, demonstrating that the physical relaxation is practically indistinguishable from the symbolic algorithm. The research validates the central thesis that the relaxation process is not an approximation, but rather the faithful continuous-time generator of the discrete backpropagation update, reliably recovering exact gradients.

Dyadic Backpropagation unifies neural inference and learning via physical system dynamics, offering a novel perspective on gradient descent

Researchers have demonstrated that backpropagation, the algorithm used to train neural networks, emerges as the finite-time relaxation of a physical dynamical system. By framing feedforward inference as a continuous-time process and utilising Lagrangian theory, they derived a global energy functional on a doubled state space, encompassing both activations and sensitivities.

This framework, termed “Dyadic Backpropagation”, performs inference and credit assignment concurrently through local interactions. Crucially, the authors proved that unit-step Euler discretization, the natural timescale of layer transitions, precisely recovers standard backpropagation in 2L steps for an L-layer network, without requiring approximations.

This establishes backpropagation as the digitally optimised shadow of a continuous physical relaxation, offering a rigorous foundation for exact gradients in both analog and digital substrates. Unlike previous energy-based methods, this framework supports asymmetric weights and guarantees exact gradients in finite time.

The findings suggest a theoretical path to overcome the energy and synchronisation costs associated with scaling deep learning models on conventional von Neumann architectures. By demonstrating that gradient computation does not inherently require global clocks or symbolic differentiation, the research implies that learning can be reformulated as a purely physical process involving the resolution of “stress” between forward and backward states.

The authors acknowledge that their derivation shares similarities with multi-compartment cortical neuron models, potentially bridging artificial and biological learning. Future research could focus on developing new analog and neuromorphic hardware where learning is driven by local physical processes, rather than centralised arithmetic. This work provides a theoretical foundation for such systems, where hardware embodies the algorithm itself, rather than simply executing it.

👉 More information
🗞 Backpropagation as Physical Relaxation: Exact Gradients in Finite Time
🧠 ArXiv: https://arxiv.org/abs/2602.02281

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
