Efficient Continual Learning with TreeLoRA for Large Pre-trained Models

TreeLoRA efficiently updates large pre-trained models in sequential learning scenarios, mitigating catastrophic forgetting. It constructs layer-wise adapters using hierarchical gradient similarity and bandit-based task exploration, alongside sparse gradient updates. Evaluations on vision transformers and large language models demonstrate effectiveness across diverse tasks and domains.

The challenge of continual learning, adapting artificial intelligence to sequential data streams without losing previously acquired knowledge, remains a significant hurdle in the development of truly versatile machine learning systems. Researchers are increasingly focused on efficient methods for updating large pre-trained models, recognising the computational cost of retraining extensive parameter sets. A new approach, detailed in the article ‘TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree’, combines low-rank adaptation with a hierarchical structure to optimise learning efficiency. The work is a collaboration between Yu-Yang Qian, Yuan-Ze Xu, Peng Zhao, and Zhi-Hua Zhou of Nanjing University, alongside Zhen-Yu Zhang of the Shanghai Artificial Intelligence Laboratory. Their research introduces TreeLoRA, a system designed to minimise computational demands while maintaining performance across diverse applications, including both vision transformers and large language models.
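
For readers new to low-rank adaptation, the sketch below shows the generic LoRA update that TreeLoRA builds on, the standard formulation of Hu et al. rather than code from this paper: a frozen pre-trained weight matrix is augmented with a trainable low-rank product, so only a small fraction of the parameters is trained per layer.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pre-trained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Only r * (in + out) parameters per layer are trained, roughly 2% of a 768x768 weight at r=8.
layer = LoRALinear(nn.Linear(768, 768), r=8)
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```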

TreeLoRA presents a novel approach to continual learning (CL) in large pre-trained models (LPMs). Catastrophic forgetting, a central challenge in CL, occurs when sequentially learning new tasks causes a model to lose previously acquired knowledge. TreeLoRA mitigates it by constructing layer-wise adapters organised in a hierarchical K-D tree built on gradient similarity: the tree identifies relationships between tasks and groups the corresponding adapters, reducing redundancy and optimising parameter usage. A simplified sketch of this indexing idea follows.
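
The hypothetical sketch below illustrates the idea; the descriptor construction and library choice are illustrative assumptions, not the paper's procedure. Each past task is summarised by a normalised gradient vector, and a k-d tree over those vectors lets a new task retrieve its most gradient-similar predecessors without comparing against every stored task.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

def gradient_descriptor(grads):
    """Flatten per-layer gradients into one vector and L2-normalise it.
    On unit vectors, Euclidean nearest neighbours are also the most cosine-similar."""
    v = np.concatenate([g.ravel() for g in grads])
    return v / (np.linalg.norm(v) + 1e-12)

# Stand-in gradients for 8 past tasks, three layers each (shapes are arbitrary).
def fake_task_grads():
    return [rng.standard_normal((4, 4)), rng.standard_normal(16), rng.standard_normal((4, 2))]

descriptors = np.stack([gradient_descriptor(fake_task_grads()) for _ in range(8)])
tree = cKDTree(descriptors)  # hierarchical index over past-task descriptors

# A new task queries the tree for its 3 most gradient-similar predecessors.
dist, idx = tree.query(gradient_descriptor(fake_task_grads()), k=3)
print("closest past tasks:", idx, "distances:", dist)
```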

The system addresses computational demands inherent in modern LPMs, which are characterised by ever-increasing parameter counts. To minimise the cost of estimating task similarity, TreeLoRA employs bandit techniques, specifically utilising lower confidence bounds. Bandit algorithms balance exploiting known relationships between tasks with exploring potentially beneficial, yet unknown, connections. This allows for efficient adaptation without exhaustive comparison of all task combinations.
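
Below is a minimal sketch of that confidence-bound logic, under the assumption that each candidate branch of the similarity tree is a bandit arm whose unknown mean is a gradient distance; this is generic lower-confidence-bound bookkeeping for illustration, not the paper's exact rule. Poorly explored branches receive an optimistically small bound, so they are still tried occasionally.

```python
import math
import random

class LCBSelector:
    """Pick the arm (candidate branch) whose distance could plausibly be smallest."""

    def __init__(self, n_arms: int, c: float = 1.0):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.c = c
        self.t = 0

    def select(self) -> int:
        self.t += 1
        for i, n in enumerate(self.counts):
            if n == 0:
                return i  # try every branch at least once
        # lower confidence bound: optimistic (small) estimate of each distance
        lcb = [m - self.c * math.sqrt(math.log(self.t) / n)
               for m, n in zip(self.means, self.counts)]
        return min(range(len(lcb)), key=lcb.__getitem__)

    def update(self, arm: int, distance: float) -> None:
        self.counts[arm] += 1
        self.means[arm] += (distance - self.means[arm]) / self.counts[arm]  # running mean

# Demo: arm 2 has the truly smallest distance; the selector should home in on it.
true_dist = [0.9, 0.6, 0.2, 0.7]
sel = LCBSelector(len(true_dist))
for _ in range(200):
    a = sel.select()
    sel.update(a, true_dist[a] + random.gauss(0, 0.1))
print("most similar branch:", min(range(4), key=sel.means.__getitem__))
```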

Theoretical analysis underpins the design, providing justification for the approach and establishing performance bounds. This analysis explains the method’s efficacy and provides a foundation for further optimisation. Experimental validation across both vision transformers (ViTs) and large language models (LLMs) demonstrates TreeLoRA’s effectiveness on diverse tasks spanning vision and natural language processing, highlighting its generalisability.
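
For flavour, the classical guarantee behind confidence-bound bandit rules is shown below; this is a textbook rate, not the paper's specific theorem. Over T rounds with K candidate arms, the expected regret against the best fixed arm grows only sublinearly.

```latex
% Classical confidence-bound bandit rate (textbook result, not the paper's theorem):
% a_t is the arm pulled at round t, \mu^* the best (smallest) mean among K arms.
\[
  \mathbb{E}[R_T] \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} \left(\mu_{a_t} - \mu^{*}\right)\right]
  \;=\; O\!\left(\sqrt{K T \log T}\right)
\]
```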

Sparse gradient updates during parameter optimisation further enhance computational efficiency, making TreeLoRA particularly suitable for resource-constrained environments. By focusing updates on only the most relevant parameters, the system reduces the computational burden without sacrificing performance. The combination of hierarchical adapter organisation, bandit-based exploration, and sparse updates positions TreeLoRA as a promising solution for continual learning in complex models and datasets.
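
One common way to realise such sparsity is shown below as a hypothetical illustration; the paper's masking rule may differ. The idea is to keep only the largest-magnitude fraction of each gradient between the backward pass and the optimiser step.

```python
import torch

def sparsify_grads_(params, keep_ratio=0.1):
    """Zero all but the largest-magnitude fraction of each parameter's gradient, in place."""
    for p in params:
        if p.grad is None:
            continue
        g = p.grad
        k = max(1, int(keep_ratio * g.numel()))
        # k-th largest magnitude = (numel - k + 1)-th smallest
        threshold = g.abs().flatten().kthvalue(g.numel() - k + 1).values
        g.mul_((g.abs() >= threshold).to(g.dtype))

# Tiny demo on a fake parameter; in training this would sit between
# loss.backward() and optimizer.step().
w = torch.nn.Parameter(torch.randn(10))
w.grad = torch.randn(10)
sparsify_grads_([w], keep_ratio=0.3)
print(w.grad)  # only the three largest-magnitude entries remain non-zero
```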

Future research will explore TreeLoRA’s potential in more dynamic environments, investigating alternative bandit algorithms and integration with other CL techniques. Extension to more complex data modalities and application in real-world scenarios, such as robotics and autonomous driving, are also planned. The team intends to develop more efficient and scalable implementations, enabling deployment on resource-constrained devices.

👉 More information
🗞 TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree
🧠 DOI: https://doi.org/10.48550/arXiv.2506.10355

Quantum News

As the Official Quantum Dog (or hound), my role is to dig out the latest nuggets of quantum goodness. There is so much happening right now in the field of technology, whether AI or the march of robots. But quantum occupies a special space. Quite literally a special space: a Hilbert space, in fact, haha! Here I try to provide some of the news that might be considered breaking in the quantum computing space.
