Continual Learning Achieves Reduced Forgetting via Joint Weight and Architecture Optimisation

Scientists are tackling the persistent problem of ‘catastrophic forgetting’ in artificial intelligence, where machine learning models struggle to retain previously learned information when adapting to new data. Allyson Hahn and Krishnan Raghavan, both from Argonne National Laboratory, together with their colleagues, present a novel mathematical framework investigating how a network’s architecture impacts continual learning performance. Their research demonstrates that simply adjusting model weights isn’t enough to prevent forgetting when data distributions change, and crucially, proves that simultaneously optimising both architectural and weight parameters significantly reduces this issue. By formulating continual learning as a bilevel optimisation problem and developing a low-rank transfer mechanism, this work achieves up to two orders of magnitude improvement in performance and robustness across diverse network types, representing a substantial step towards creating truly adaptable and intelligent systems.

Sobolev Spaces and Dynamic Neural Architectures

Scientists have demonstrated a groundbreaking approach to continual learning, overcoming the limitations of static models by dynamically adapting both architecture and weights. The research introduces a novel mathematical framework, modelling both network architecture and weights within a Sobolev space, allowing for rigorous analysis of their interplay and their impact on forgetting during continual learning. This work proves that solely adjusting model weights is insufficient to mitigate catastrophic forgetting when data distributions shift, establishing the necessity of simultaneously learning optimal architectures alongside weights. Consequently, the team formulated continual learning as a bilevel optimisation problem, where an upper level identifies the optimal architecture for each task, while a lower level computes optimal weights through dynamic programming across all tasks.
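
In schematic terms, and using illustrative notation rather than the paper’s own symbols, this bilevel structure can be written as:

```latex
\alpha_k^{*} \;\in\; \arg\min_{\alpha \in \mathcal{A}} \;\sum_{t=1}^{k} \mathcal{L}_t\!\bigl(\alpha,\, w^{*}(\alpha)\bigr)
\qquad \text{subject to} \qquad
w^{*}(\alpha) \;\in\; \arg\min_{w \in \mathcal{W}(\alpha)} \;\sum_{t=1}^{k} \mathcal{L}_t(\alpha, w),
```

where the upper level selects an architecture for task k from a space of candidates, and the lower level computes the weights of that architecture over all tasks seen so far, which the paper handles with dynamic programming. The loss terms, the architecture space, and the weight space here are placeholders for exposition, not the paper’s exact definitions.
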
To solve this complex problem, researchers developed a derivative-free direct search algorithm to pinpoint the optimal architecture, a significant advancement over existing methods. A key challenge addressed was the transfer of knowledge from the current architecture to the newly determined optimal one, which often involves mismatched parameter space dimensions. To bridge this dimensionality gap, the study unveils a low-rank transfer mechanism, enabling effective knowledge mapping between architectures of differing sizes, a crucial innovation for practical implementation. Empirical evaluations across diverse problems, including regression, classification, feedforward, convolutional, and graph neural networks, showcase the substantial improvements achieved by this method.
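
As a rough illustration of the kind of mapping such a transfer involves, the sketch below carries a weight matrix from one layer size to another by keeping only its top singular directions and padding or truncating the factors to fit the new dimensions. The function name, the use of an SVD, and the zero-padding rule are assumptions made for exposition; the paper’s actual transfer operator may differ.

```python
import numpy as np

def low_rank_transfer(w_old, new_shape, rank):
    """Map a weight matrix from the current architecture onto a new
    architecture with a different shape via a truncated SVD
    (illustrative sketch, not the paper's exact mechanism)."""
    # Truncated SVD of the old weights: w_old ~ U_r @ diag(s_r) @ Vt_r
    u, s, vt = np.linalg.svd(w_old, full_matrices=False)
    r = min(rank, s.size)
    u_r, s_r, vt_r = u[:, :r], s[:r], vt[:r, :]

    # Resize the low-rank factors to the target dimensions by padding
    # with zeros or truncating rows/columns as needed.
    out_dim, in_dim = new_shape
    u_new = np.zeros((out_dim, r))
    u_new[:min(out_dim, u_r.shape[0]), :] = u_r[:min(out_dim, u_r.shape[0]), :]
    vt_new = np.zeros((r, in_dim))
    vt_new[:, :min(in_dim, vt_r.shape[1])] = vt_r[:, :min(in_dim, vt_r.shape[1])]

    # Recompose: the transferred weights initialise the new architecture.
    return u_new @ np.diag(s_r) @ vt_new

# Example: carry knowledge from a 64->32 layer into a wider 128->48 layer.
w_old = np.random.randn(32, 64)
w_new = low_rank_transfer(w_old, new_shape=(48, 128), rank=8)
print(w_new.shape)  # (48, 128)
```

The low-rank factorisation is what makes the mapping well defined even when the source and target parameter spaces have different dimensions, which is the bottleneck the transfer mechanism is designed to remove.
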

Experiments reveal that simultaneously learning optimal architecture and weights yields performance gains of up to two orders of magnitude, significantly reducing forgetting and enhancing robustness to noise compared with traditional static approaches. The study establishes that simply altering model weights is inadequate to capture data-distribution drift, and that the capacity of a neural network diverges if the data distribution continually changes. However, the research demonstrates that this divergence can be overcome by reliably modifying the architecture of the model in response to evolving data needs. This work addresses three key bottlenecks: understanding the coupling between weights and architecture, balancing the forgetting-generalisation trade-off, and enabling efficient knowledge transfer between architectures with mismatched dimensions.

Furthermore, the team’s contribution lies in modelling the continual training problem within a Sobolev space, capturing function space dependencies that are missed by traditional weight-space modelling. This theoretical foundation allows for a clear demonstration that architecture changes are essential, not merely beneficial, for effective continual learning. By combining a novel formulation for understanding weight-architecture coupling with a methodical architecture search and a new transfer mechanism, this research opens new avenues for building AI systems that can learn continuously and adapt to changing environments with minimal performance loss.
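
For readers unfamiliar with the term, a Sobolev norm measures not only a function’s values but also its derivatives; the textbook $W^{1,2}$ norm, for instance, is

```latex
\|f\|_{W^{1,2}(\Omega)} \;=\; \Bigl( \int_{\Omega} |f(x)|^{2}\, dx \;+\; \int_{\Omega} \lVert \nabla f(x) \rVert^{2}\, dx \Bigr)^{1/2},
```

which is why working in such a space can capture how the learned function and its sensitivities evolve across tasks, rather than only how individual weights move. This is the standard definition, given here for orientation; the paper’s precise construction may differ.
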

Joint weight and architecture optimisation improves continual learning performance

Researchers have developed a novel mathematical framework addressing the challenge of continual learning in artificial intelligence systems. This work jointly models and optimises both network weights and network architecture, representing them within a ‘Sobolev space’, to rigorously examine their roles in mitigating catastrophic forgetting when data distributions shift across tasks. The findings demonstrate that simply adjusting model weights is insufficient to prevent forgetting; instead, simultaneously learning optimal network architectures alongside the weights is crucial for robust continual learning. This research establishes a bilevel optimisation problem in which an upper level determines the best network architecture for a given task, while a lower level computes optimal weights using dynamic programming across all tasks.
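
To make the upper level concrete, the sketch below shows a generic derivative-free pattern search over layer widths, where each candidate architecture is scored by a callback standing in for the expensive lower-level training. The width-only search space, the `evaluate` callback, and the step-halving schedule are assumptions used for illustration; this is a minimal example of direct search in general, not the paper’s algorithm.

```python
import numpy as np

def direct_search_architecture(evaluate, widths, steps=10):
    """Derivative-free coordinate (pattern) search over a vector of layer widths.
    `evaluate(widths) -> float` is assumed to train the weights for that
    architecture (the lower level) and return the continual-learning loss."""
    best = list(widths)
    best_loss = evaluate(best)
    step = 16  # initial step size, in units of hidden neurons
    for _ in range(steps):
        improved = False
        # Poll each coordinate direction (+/- step on each layer width).
        for i in range(len(best)):
            for delta in (+step, -step):
                cand = list(best)
                cand[i] = max(1, cand[i] + delta)
                loss = evaluate(cand)
                if loss < best_loss:
                    best, best_loss, improved = cand, loss, True
        if not improved:
            step = max(1, step // 2)  # shrink the mesh when no poll point improves
    return best, best_loss

# Toy usage: a quadratic proxy stands in for the (expensive) lower-level training.
target = np.array([48, 96])
proxy = lambda w: float(np.sum((np.array(w) - target) ** 2))
print(direct_search_architecture(proxy, widths=[64, 64]))
```

Because the search only compares loss values at polled points, it needs no gradients with respect to the architecture, which is what makes a derivative-free method a natural fit for the discrete upper level.
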

To overcome the dimensionality mismatches that arise when transferring knowledge between architectures, the team devised a low-rank transfer mechanism, effectively mapping knowledge across differing parameter spaces. Empirical evaluations across diverse network types, including feedforward, convolutional, and graph networks, show that this combined approach significantly improves performance, reduces forgetting, and enhances robustness to noise, achieving up to two orders of magnitude improvement in some cases.

The authors acknowledge that the architecture search component introduces inherent instability, reflected in the larger error bounds observed in their experiments. However, they highlight that this limitation is common to all architecture search methods and could be addressed with alternative search algorithms. Future research directions include applying the framework to more complex, real-world datasets and integrating it with existing continual learning techniques. These findings offer a promising pathway towards developing AI systems capable of adapting to evolving environments without losing previously acquired knowledge, representing a substantial advance in the field of continual learning.

👉 More information
🗞 The Effect of Architecture During Continual Learning
🧠 ArXiv: https://arxiv.org/abs/2601.19766

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
