Efficient AI Fine-tuning: New Method Decouples Direction and Magnitude Adaptation

Researchers developed MAP, a new framework that enhances parameter-efficient fine-tuning (PEFT) of large models by decoupling weight adaptation into directional and magnitude components. This geometrically grounded approach normalises pre-trained weights and independently scales updates, demonstrably improving performance when integrated with existing PEFT methods.

Efficient adaptation of large language models for specific tasks presents a continuing challenge due to the computational demands of full fine-tuning. Researchers are increasingly focused on parameter-efficient fine-tuning (PEFT) techniques, which modify only a small subset of a model’s parameters. A new framework, MAP (Magnitude and Direction Adaptation for Parameter-efficient tuning), offers a refined approach to weight decomposition, decoupling adaptation into directional and magnitude components with a mathematically grounded formulation. This work, conducted by Chongjie Si, Zhiyi Shi, Yadao Wang, Xiaokang Yang, Susanto Rahardja, and Wei Shen (representing institutions including Shanghai Jiao Tong University, Harvard University, Alibaba Group, and the Singapore Institute of Technology), is detailed in their article, “MAP: Revisiting Weight Decomposition for Low-Rank Adaptation”. Their findings demonstrate performance improvements when MAP is integrated with existing PEFT methods, suggesting a potentially valuable enhancement for future model adaptation strategies.

Decoupled Weight Adaptation Improves Large Language Model Fine-tuning

Parameter-efficient fine-tuning (PEFT) methods offer a computationally viable approach to adapting large language models (LLMs) to specific tasks. A new framework, Magnitude and Direction Adaptation for Parameter-efficient tuning (MAP), consistently enhances LLM performance by decoupling the adaptation of model weights into directional and magnitude components.

The core principle of MAP involves treating weight matrices as high-dimensional vectors. Pre-trained weights undergo normalisation, establishing a baseline for subsequent adaptation. This decoupling allows for more granular control during fine-tuning, potentially leading to improved performance and stability. Unlike conventional methods that directly modify weights, MAP adjusts the direction of weight updates independently from their magnitude.
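To make the idea concrete, the following is a minimal PyTorch sketch of this kind of direction/magnitude decoupling, assuming a LoRA-style low-rank directional update. The class and parameter names are hypothetical, and this is an illustrative approximation rather than the authors’ implementation.

```python
import torch
import torch.nn as nn

class DecoupledLinear(nn.Module):
    """Illustrative sketch (not the paper's code): a frozen linear layer whose
    pre-trained weight is treated as one high-dimensional vector, normalised to
    unit length, then adapted by a low-rank directional update and a learned
    scalar magnitude."""

    def __init__(self, weight: torch.Tensor, rank: int = 8):
        super().__init__()
        out_f, in_f = weight.shape
        # Frozen pre-trained direction W0 / ||W0||, with the norm stored separately.
        norm = weight.norm().detach()
        self.register_buffer("direction", (weight / norm).detach())
        # Learnable scalar magnitude, initialised to the original norm.
        self.magnitude = nn.Parameter(norm.clone())
        # Low-rank directional update (LoRA-style A @ B), initialised so the
        # update starts at zero.
        self.A = nn.Parameter(torch.zeros(out_f, rank))
        self.B = nn.Parameter(torch.randn(rank, in_f) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Adapted direction = normalised(frozen direction + low-rank update),
        # then rescaled by the learned magnitude.
        new_dir = self.direction + self.A @ self.B
        new_dir = new_dir / new_dir.norm()
        return x @ (self.magnitude * new_dir).T


# Usage: wrap an existing pre-trained weight matrix.
pretrained = torch.randn(64, 128)
layer = DecoupledLinear(pretrained, rank=4)
out = layer(torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 64])
```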

Researchers evaluated MAP’s efficacy using both LLaMA-7B and LLaMA-3-8B models. Systematic hyperparameter optimisation, encompassing Low-Rank Adaptation (LoRA) rank, learning rates, and batch sizes, was conducted to identify optimal configurations. The AdamW optimiser, a variant of Adam with decoupled weight decay, was employed alongside a learning rate warmup strategy (which gradually increases the learning rate over the first training steps) to stabilise the initial stages and ensure consistent gains.
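For readers reproducing a similar training setup, a minimal PyTorch sketch of AdamW with a linear learning-rate warmup is shown below. The model, hyperparameter values, and step counts are placeholders, not the settings reported in the paper.

```python
import torch

# Illustrative only: the values below are placeholders, not the paper's
# reported hyperparameters.
model = torch.nn.Linear(128, 64)                 # stand-in for a PEFT-wrapped LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

warmup_steps, total_steps = 100, 1000

def lr_lambda(step: int) -> float:
    # Linear warmup from zero to the base learning rate, then constant.
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 128)).pow(2).mean()   # dummy objective
    loss.backward()
    optimizer.step()
    scheduler.step()
```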

Performance was assessed using the General Language Understanding Evaluation (GLUE) benchmark, a collection of diverse natural language understanding (NLU) tasks. Results demonstrate that MAP improves performance across a range of tasks, including question answering, textual entailment (determining the logical relationship between sentences), and sentiment analysis. Crucially, the framework mitigates the risk of catastrophic forgetting – a phenomenon where fine-tuning overwrites previously learned knowledge – by carefully modulating the magnitude of weight updates.
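As an illustration of how such an evaluation harness can be assembled, the sketch below loads one GLUE task (SST-2, sentiment analysis) and its metric using the Hugging Face datasets and evaluate libraries. The predictions here are dummy values standing in for the outputs of a MAP-adapted model; this shows only the benchmark plumbing, not the paper’s experimental pipeline.

```python
from datasets import load_dataset
import evaluate

# Load one GLUE task and its matching metric.
sst2 = load_dataset("glue", "sst2")
metric = evaluate.load("glue", "sst2")

# Suppose `predictions` came from a fine-tuned model (dummy values here).
predictions = [0] * len(sst2["validation"])
result = metric.compute(
    predictions=predictions,
    references=sst2["validation"]["label"],
)
print(result)  # e.g. {'accuracy': ...}
```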

The simplicity of MAP facilitates seamless integration with existing PEFT techniques, offering a readily deployable enhancement to current workflows. Future research will focus on the theoretical underpinnings of MAP, specifically its relationship to the geometry of the high-dimensional weight space. Investigations into alternative normalisation strategies and scalar coefficient initialisation methods are also planned, alongside expanded evaluations across a broader range of LLMs and downstream tasks. Adaptive strategies for determining optimal scalar coefficients during training hold particular promise for unlocking further potential within MAP and related PEFT methods.

👉 More information
🗞 MAP: Revisiting Weight Decomposition for Low-Rank Adaptation
🧠 DOI: https://doi.org/10.48550/arXiv.2505.23094
