Scientists are tackling the persistent problem of ‘catastrophic forgetting’ in artificial intelligence, where machine learning models struggle to retain previously learned information when adapting to new data. Allyson Hahn and Krishnan Raghavan, both from Argonne National Laboratory, and colleagues present a novel mathematical framework investigating how a network’s architecture affects continual learning performance. Their research demonstrates that simply adjusting model weights isn’t enough to prevent forgetting when data distributions change, and crucially, proves that simultaneously optimising both architectural and weight parameters significantly reduces this issue. By formulating continual learning as a bilevel optimisation problem and developing a low-rank transfer mechanism, this work achieves up to two orders of magnitude improvement in performance and robustness across diverse network types, representing a substantial step towards creating truly adaptable and intelligent systems.
Sobolev Spaces and Dynamic Neural Architectures
Scientists have demonstrated a groundbreaking approach to continual learning, overcoming the limitations of static models by dynamically adapting both architecture and weights. The research introduces a novel mathematical framework that models both network architecture and weights within a Sobolev space, allowing for rigorous analysis of their interplay and their impact on forgetting during continual learning. This work proves that solely adjusting model weights is insufficient to mitigate catastrophic forgetting when data distributions shift, establishing the necessity of simultaneously learning optimal architectures alongside weights. Consequently, the team formulated continual learning as a bilevel optimisation problem, where an upper level identifies the optimal architecture for each task, while a lower level computes optimal weights through dynamic programming across all tasks.
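The bilevel structure can be written schematically as below; the notation (candidate architecture α_k from a search space 𝒜, weights w, and per-task losses ℓ_j) is illustrative shorthand rather than the paper’s exact statement.

```latex
% Schematic bilevel formulation (illustrative notation, not the paper's exact statement):
% the upper level selects an architecture for task k, the lower level returns the weights
% that minimise the accumulated loss over all tasks seen so far.
\begin{aligned}
\alpha_k^{*} &\in \operatorname*{arg\,min}_{\alpha_k \in \mathcal{A}}
  \; \ell_k\!\bigl(\alpha_k,\, w^{*}(\alpha_k)\bigr)
  && \text{(upper level: architecture search)}\\
\text{s.t.}\quad w^{*}(\alpha_k) &\in \operatorname*{arg\,min}_{w}
  \; \sum_{j=1}^{k} \ell_j\!\bigl(\alpha_k,\, w\bigr)
  && \text{(lower level: weights across all tasks)}
\end{aligned}
```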
To solve this complex problem, researchers developed a derivative-free direct search algorithm to pinpoint the optimal architecture, a significant advancement over existing methods. A key challenge addressed was the transfer of knowledge from the current architecture to the newly determined optimal one, which often involves mismatched parameter space dimensions. To bridge this dimensionality gap, the study unveils a low-rank transfer mechanism, enabling effective knowledge mapping between architectures of differing sizes, a crucial innovation for practical implementation. Empirical evaluations across diverse problems, including regression, classification, feedforward, convolutional, and graph neural networks, showcase the substantial improvements achieved by this method.
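The paper’s exact transfer rule is not reproduced here; the sketch below is a minimal illustration of one common low-rank approach, using a truncated SVD of the old weight matrix to initialise a layer of a different size. The function name and the padding strategy are hypothetical choices for illustration.

```python
import numpy as np

def low_rank_transfer(W_old: np.ndarray, new_shape: tuple, rank: int) -> np.ndarray:
    """Map a weight matrix onto a differently sized layer via a rank-r factorisation.

    Illustrative sketch only: the old weights are compressed to their top-`rank`
    singular directions, and the factors are padded or truncated to the new
    dimensions before being recombined.
    """
    m_new, n_new = new_shape
    U, S, Vt = np.linalg.svd(W_old, full_matrices=False)
    r = min(rank, len(S))

    # Keep the dominant rank-r subspace of the old parameters.
    U_r = U[:, :r] * S[:r]          # (m_old, r)
    V_r = Vt[:r, :]                 # (r, n_old)

    # Resize each factor to the new architecture's dimensions (pad with zeros or truncate).
    U_new = np.zeros((m_new, r))
    V_new = np.zeros((r, n_new))
    U_new[:min(m_new, U_r.shape[0]), :] = U_r[:m_new, :]
    V_new[:, :min(n_new, V_r.shape[1])] = V_r[:, :n_new]

    return U_new @ V_new            # (m_new, n_new) initialisation for the new layer

# Example: carry a 64x32 layer's knowledge into an 80x48 layer using rank 16.
W_old = np.random.randn(64, 32)
W_new = low_rank_transfer(W_old, (80, 48), rank=16)
```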
Experiments reveal that simultaneously learning optimal architecture and weights yields performance gains of up to two orders of magnitude, significantly reducing forgetting and enhancing robustness to noise compared to traditional static approaches. The study establishes that simply altering model weights is inadequate to capture data distribution drift, and that the required capacity of a neural network diverges if the data distribution continually changes. However, the research demonstrates that this divergence can be overcome by reliably modifying the architecture of the AI model in response to evolving data needs. This work addresses three key bottlenecks: understanding the coupling between weights and architecture, balancing the forgetting-generalisation trade-off, and enabling efficient knowledge transfer between architectures with mismatched dimensions.
Furthermore, the team’s contribution lies in modelling the continual training problem within a Sobolev space, capturing function space dependencies that are missed by traditional weight-space modelling. This theoretical foundation allows for a clear demonstration that architecture changes are essential, not merely beneficial, for effective continual learning. By combining a novel formulation for understanding weight-architecture coupling with a methodical architecture search and a new transfer mechanism, this research opens new avenues for building AI systems that can learn continuously and adapt to changing environments with minimal performance loss.
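For readers unfamiliar with the function-space view, a first-order Sobolev (H¹) norm measures distance between functions through both their values and their derivatives, which is one way dependencies invisible to pure weight-space distances can be captured; the specific space and order used by the authors are not reproduced here, so the formula below is only a standard reference point.

```latex
% H^1 (first-order, L^2-based) Sobolev norm of a network function f over a domain Omega;
% shown only to illustrate that function-space distances also weigh derivative behaviour.
\| f \|_{H^1(\Omega)}^{2} \;=\; \int_{\Omega} |f(x)|^{2}\,dx \;+\; \int_{\Omega} \|\nabla f(x)\|^{2}\,dx
```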
Weights and architecture optimise continual learning performance
Researchers have developed a novel mathematical framework addressing the challenge of continual learning in artificial intelligence systems. This work jointly models and optimises both network weights and network architecture within a Sobolev space, rigorously examining their respective roles in mitigating catastrophic forgetting when data distributions shift across tasks. The findings demonstrate that simply adjusting model weights is insufficient to prevent forgetting; instead, simultaneously learning optimal network architectures alongside the weights is crucial for robust continual learning. This research establishes a bilevel optimisation problem in which an upper level determines the best network architecture for a given task, while a lower level computes optimal weights using dynamic programming across all tasks.
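One way such a per-task loop could be organised is sketched below, with a simple enumeration over candidate widths standing in for the paper’s derivative-free direct search; every helper passed in (build_model, train_weights, evaluate, low_rank_transfer) is a placeholder name, not the authors’ API.

```python
# Hypothetical per-task continual-learning loop: a derivative-free search at the upper
# level chooses an architecture, the lower level retrains weights on all tasks seen so
# far, and knowledge is carried over through a low-rank transfer step.

def continual_learning(tasks, candidate_widths, build_model, train_weights,
                       evaluate, low_rank_transfer):
    model, width = None, None
    for k, task in enumerate(tasks):
        # Upper level: derivative-free search over a small set of candidate architectures.
        best_score, best_width, best_model = float("inf"), None, None
        for w in candidate_widths:
            candidate = build_model(width=w)
            if model is not None:
                low_rank_transfer(src=model, dst=candidate)   # bridge mismatched sizes
            # Lower level: (re)compute weights against all tasks seen so far.
            train_weights(candidate, tasks[: k + 1])
            score = evaluate(candidate, tasks[: k + 1])
            if score < best_score:
                best_score, best_width, best_model = score, w, candidate
        model, width = best_model, best_width
        print(f"task {k}: selected width {width}, validation loss {best_score:.4f}")
    return model
```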
To overcome the dimensionality mismatches that arise when transferring knowledge between architectures, the team devised a low-rank transfer mechanism, effectively mapping knowledge across differing parameter spaces. Empirical evaluations across diverse network types, including feedforward, convolutional, and graph networks, show that this combined approach significantly improves performance, reduces forgetting, and enhances robustness to noise, achieving up to two orders of magnitude improvement in some cases. The authors acknowledge that the architecture search component introduces inherent instability, reflected in the larger error bounds observed in their experiments. However, they highlight that this limitation is common to all architecture search methods and could be addressed with alternative search algorithms. Future research directions include exploring the application of this framework to more complex, real-world datasets and investigating the potential for integrating it with existing continual learning techniques. These findings offer a promising pathway towards developing AI systems capable of adapting to evolving environments without losing previously acquired knowledge, representing a substantial advance in the field of continual learning.
👉 More information
🗞 The Effect of Architecture During Continual Learning
🧠 ArXiv: https://arxiv.org/abs/2601.19766
