Researchers are increasingly focused on understanding why deep learning training can be so fragile, despite impressive results. Zhipeng Zhang (China Mobile Research Institute & China Mobile GBA Innovation Institute), Zhenjie Yao (Institute of Microelectronics, Chinese Academy of Sciences), Kai Li, and Lei Yang demonstrate that training instability isn’t random, but follows predictable, low-dimensional dynamical principles. Their work reveals that factors like optimization, data, parameters and signal stability interact during training, and that performance isn’t always a reliable indicator of a stable process. By carefully auditing training trajectories, the team identified consistent patterns, including the protective effect of controlled stochasticity and early warning signs of collapse in latent states, offering a new, measurable way to assess and improve the robustness of deep learning systems.
Training trajectories are shaped by internal signals (e.g., policy entropy or gradient coherence) that modulate proximity to instability. This work develops a scientific framework for studying training stability as a measurable property in its own right. In the era of foundation models, this reframing has implications for responsible scaling and safety: late-stage failures are not merely engineering accidents, but constraints on which scaling regimes can be scientifically explored, reproduced, and governed. Existing scaling laws describe how capability improves with model size, data, and compute; however, our findings highlight a critical blind spot: capability scaling does not imply dynamical reliability, i.e., whether those capabilities are stably attainable under inevitable perturbations.
Researchers introduce perturbation-based auditing as a methodological approach to studying training stability. Rather than relying on anecdotal failure analysis, perturbation auditing systematically probes the dynamical responses of learning systems, providing principled and reproducible insight into instability formation. They propose StabilityBench not as a benchmark, but as a scientific instrument enabling controlled perturbation auditing across learning paradigms, including reinforcement learning and large language models. Through such audits, they uncover cross-domain regularities in how instability develops, often before performance degradation becomes visible.
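StabilityBench itself is not reproduced here, so the following Python sketch only illustrates the general shape of a controlled perturbation audit under assumed names (`Perturbation`, `audit_run`): a dimension-specific intervention is scheduled at a fixed step, and per-step telemetry is recorded so the dynamical response of the run can be inspected afterwards.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface for a dimension-specific training perturbation.
@dataclass
class Perturbation:
    name: str                      # e.g. "lr_spike", "action_noise", "reward_scale"
    step: int                      # training step at which the perturbation fires
    apply: Callable[[dict], None]  # mutates the training state in place

def audit_run(train_step: Callable[[dict, int], Dict[str, float]],
              state: dict,
              perturbations: List[Perturbation],
              total_steps: int) -> List[Dict[str, float]]:
    """Run training while injecting scheduled perturbations and
    logging per-step telemetry (loss, gradient norm, ...)."""
    schedule = {p.step: p for p in perturbations}
    trajectory = []
    for t in range(total_steps):
        if t in schedule:
            schedule[t].apply(state)      # controlled, localized intervention
        telemetry = train_step(state, t)  # one optimization step + metrics
        telemetry["step"] = t
        trajectory.append(telemetry)
    return trajectory
```

Because the intervention is scheduled and logged, a stable run and a collapsing run can be compared step by step rather than reconstructed from anecdotal failure reports.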
They further propose that meta-state representations serve as low-dimensional structural summaries of training dynamics. Meta-states aggregate multiple telemetry channels, such as performance metrics, gradient statistics, and optimizer states, into a joint representation capturing how learning dynamics evolve as instability forms. Importantly, the meta-state is not an average, but a representation of how multiple channels co-vary as training approaches a structural transition. This aggregation enables conditional closed-loop interaction as a monitoring prototype: the meta-state can support selective, non-intrusive interaction with training dynamics in unstable regimes, while remaining quiescent in stable ones.
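As a rough illustration rather than the authors' implementation, the sketch below aggregates a few assumed telemetry channels into a joint meta-state, summarizes how they co-vary over a sliding window, and exposes a conditional hook that stays quiescent unless coordinated drift exceeds a threshold; the channel names, the SVD-based drift score, and the threshold are all assumptions.

```python
import numpy as np

# Telemetry channels jointly summarized by the meta-state (names are illustrative).
CHANNELS = ["loss", "grad_norm", "grad_coherence", "policy_entropy", "reward"]

class MetaState:
    """Joint low-dimensional summary of how telemetry channels co-vary
    over a sliding window of recent training steps."""

    def __init__(self, window: int = 100):
        self.window = window
        self.history: list[list[float]] = []

    def drift_score(self, telemetry: dict) -> float:
        """Project the newest step onto the leading joint drift direction
        of the windowed, standardized channel matrix."""
        self.history.append([float(telemetry[c]) for c in CHANNELS])
        self.history = self.history[-self.window:]
        X = np.asarray(self.history)
        Xz = (X - X.mean(0)) / (X.std(0) + 1e-8)   # z-score each channel
        _, _, vt = np.linalg.svd(Xz, full_matrices=False)
        return float(abs(Xz[-1] @ vt[0]))          # drift along the joint direction

def conditional_hook(meta: MetaState, telemetry: dict, threshold: float = 3.0) -> bool:
    """Quiescent in stable regimes; flags coordinated multi-channel drift
    only when the joint meta-state leaves the (assumed) stable region."""
    return meta.drift_score(telemetry) > threshold
```

In this toy form, a per-step call such as `conditional_hook(meta, telemetry)` fires only when several channels drift together, mirroring the idea that no single metric is thresholded in isolation.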
This interaction is used to probe the responsiveness of the learning dynamics, rather than to assert predictive capability or deploy control strategies. By providing structured, low-dimensional observables, this work lays the groundwork for learning systems that are not only capable, but also scientifically interpretable, diagnosable, and auditable. Why a joint meta-state rather than single metrics? Individual indicators (e.g., performance trends, gradient statistics, or short-term instability indices) provide only partial projections of the underlying dynamics and may not exhibit consistent anomalies even as instability forms.
Across audits, collapse-prone runs are characterized by coordinated multi-channel drift over time, motivating a joint latent representation rather than thresholding any single metric in isolation. This limitation is detailed in Section 2.3. Experiments revealed that high final performance is frequently decoupled from training stability, a finding substantiated across both reinforcement learning and large language model training. The team measured a systematic dissociation between these two factors, demonstrating that models achieving state-of-the-art results can be exceptionally fragile to minor disturbances during training.
Results demonstrate that reinforcement learning algorithms exhibit pronounced differences in training stability under optimization perturbations; on HalfCheetah-v3, a single learning-rate spike at step 2000 induced irreversible training collapse in PPO, while SAC and TD3 maintained stable learning trajectories despite comparable returns prior to the perturbation. Data shows that this instability manifests as an algorithm-dependent failure mode, rather than gradual performance degradation or noise accumulation, with consistent patterns observed under action noise and reward-scale perturbations. This yields a method for characterizing stability as a dynamical property, moving beyond reliance on final performance outcomes. Further analysis of large language models confirmed similar stability-performance dissociations, with documented incidents of loss spikes and interruptions during GPT-3 training due to learning-rate schedule issues, irrecoverable divergence in PaLM training from gradient numerical anomalies, and sharp loss surges in LLaMA training.
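The single learning-rate spike described above for HalfCheetah-v3 can be emulated with a minimal PyTorch sketch; the toy model, placeholder loss, and 10x spike magnitude are assumptions, with only the step-2000 timing taken from the experiment description.

```python
import torch

# Toy stand-in for a policy/value network; the real audits use PPO, SAC, and TD3.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

SPIKE_STEP = 2000     # single localized perturbation, as in the audit above
SPIKE_FACTOR = 10.0   # magnitude is an assumption; the paper's value may differ
base_lr = optimizer.param_groups[0]["lr"]

for step in range(5000):
    # Inject the learning-rate spike for exactly one step, then restore it.
    lr = base_lr * SPIKE_FACTOR if step == SPIKE_STEP else base_lr
    for group in optimizer.param_groups:
        group["lr"] = lr

    x = torch.randn(32, 8)                              # placeholder batch
    loss = (model(x) - torch.randn(32, 1)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point of the protocol is that everything except the one-step spike is held fixed, so any post-spike divergence can be attributed to the algorithm's response to the perturbation rather than to noise accumulation.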
Measurements confirm that these documented loss spikes and divergences underscore the fragility of models attaining peak performance during training, a phenomenon systematically mapped onto specific dimensions of an instability taxonomy. Tests show that controlled, dimension-specific perturbations, rather than post hoc failure analysis, establish stability as a property independent of final performance. Scientists recorded that training instability consistently manifests as an abrupt, non-smooth event, often triggered by a single localized perturbation, indicating it isn’t driven by long-term stochastic accumulation but by a structural transition in training dynamics. The team analyzed training instability from a dynamical systems perspective, identifying shared instability manifolds across learning paradigms, and focusing on gradient directional coherence, denoted x_grad, to characterize the geometric structure of gradient updates.
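The summary does not define x_grad precisely, so the following is only one plausible proxy for gradient directional coherence: the cosine similarity between the current flattened gradient and an exponential moving average of past gradients, computed after each backward pass.

```python
import torch

def flat_grad(model: torch.nn.Module) -> torch.Tensor:
    """Concatenate all parameter gradients into a single vector."""
    return torch.cat([p.grad.detach().reshape(-1)
                      for p in model.parameters() if p.grad is not None])

class GradCoherence:
    """Proxy for x_grad: cosine similarity between the current gradient
    and an exponential moving average of previous gradients."""
    def __init__(self, beta: float = 0.9):
        self.beta = beta
        self.ema = None

    def update(self, model: torch.nn.Module) -> float:
        g = flat_grad(model)
        if self.ema is None:
            self.ema = g.clone()
            return 1.0
        coherence = torch.nn.functional.cosine_similarity(g, self.ema, dim=0).item()
        self.ema = self.beta * self.ema + (1 - self.beta) * g
        return coherence
```

A call such as `coherence = GradCoherence().update(model)` after `loss.backward()` could then feed the `grad_coherence` channel of the meta-state sketch above.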
👉 More information
🗞 Training instability in deep learning follows low-dimensional dynamical principles
🧠 ArXiv: https://arxiv.org/abs/2601.13160
