Low-Rank and Sparse Pre-Training Boosts Performance of Large Language Models

Large language models demonstrate impressive capabilities, but their sheer size demands enormous computational resources for initial training, creating a significant barrier to wider access and innovation. Jiaxi Li and colleagues at Peking University, alongside collaborators, address this challenge with a new pre-training method called LOST, short for Low-rank and Sparse Training. The approach pairs a low-rank structure that reduces the number of trainable parameters with a sparse component that preserves the information the reduction would otherwise discard, allowing large language models to be trained efficiently from scratch. The team's experiments, ranging from smaller 60 million parameter models to larger 7 billion parameter networks, demonstrate that LOST achieves performance comparable to, or even exceeding, that of fully-sized models, all while substantially lowering both memory and computational demands.

Low-Rank Sparsity Accelerates Language Model Training

Researchers have developed a new method, called LOST, that significantly improves the efficiency of training large language models (LLMs) from scratch, without sacrificing performance. Traditionally, training these models demands substantial computational resources and memory, but LOST enables effective training even with limited resources. The core innovation lies in a clever combination of low-rank and sparse structures within the model’s parameters. The team’s approach begins by simplifying the complex weight matrices that define the LLM, retaining only the most important information using a mathematical technique called singular value decomposition.
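
To make that step concrete, here is a minimal sketch of a truncated-SVD factorization of a single weight matrix in plain PyTorch. The matrix size, the rank, and the helper name `truncated_svd` are illustrative assumptions, not values or code from the paper; the sketch simply shows how keeping only the top singular directions shrinks the trainable parameter count.

```python
import torch

def truncated_svd(W: torch.Tensor, rank: int):
    """Factor W (m x n) into thin matrices A (m x rank) and B (rank x n)."""
    U, s, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb the singular values into the left factor
    B = Vh[:rank, :]
    return A, B

W = torch.randn(1024, 1024)               # stand-in for a transformer weight matrix
A, B = truncated_svd(W, rank=64)
print(W.numel(), A.numel() + B.numel())   # 1,048,576 full vs. 131,072 low-rank parameters
```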

This truncated factorization creates a low-rank representation, dramatically reducing the number of parameters that need to be trained. Crucially, LOST doesn't simply discard the remaining information; instead, it strategically incorporates it as a sparse component, preserving essential details that would otherwise be lost. This co-design ensures the model retains its capacity for learning and generalization. Experiments demonstrate that LOST achieves performance comparable to, and in some cases exceeding, that of full-rank training, the standard approach, while using significantly less memory and computational power. Because far fewer parameters are trained, the associated gradients and optimizer states shrink as well, cutting the memory footprint without compromising accuracy.
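
Continuing the sketch above, the following shows one plausible way to build such a sparse component: prune the residual W − AB down to its largest-magnitude entries and keep them as a sparse correction. The selection rule, the 1% sparsity level, and the helper name `sparse_residual` are assumptions made for illustration; the paper's exact construction and training procedure may differ.

```python
import torch

def sparse_residual(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude `sparsity` fraction of W - A @ B as a sparse correction S."""
    residual = W - A @ B
    k = max(1, int(sparsity * residual.numel()))
    threshold = residual.abs().flatten().topk(k).values.min()
    return torch.where(residual.abs() >= threshold, residual, torch.zeros_like(residual))

# Reconstruct an approximation W ~ A @ B + S and check the error.
W = torch.randn(1024, 1024)
U, s, Vh = torch.linalg.svd(W, full_matrices=False)
A, B = U[:, :64] * s[:64], Vh[:64, :]
S = sparse_residual(W, A, B, sparsity=0.01)
rel_err = (W - (A @ B + S)).norm() / W.norm()
print(f"relative reconstruction error: {rel_err:.3f}")
```

Even in this toy setting, the handful of retained residual entries recovers some of the error that the low-rank factors alone leave behind, which is the intuition behind pairing the two structures.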

This is a substantial improvement over existing compression techniques, which often suffer from performance degradation. Notably, LOST outperforms other recent methods that combine low-rank and sparse techniques, underscoring the effectiveness of its complementary design: previous approaches often treated the two components independently, whereas LOST carefully integrates them to maximize performance and efficiency. The team successfully pre-trained LLMs ranging in size from 60 million to 7 billion parameters, demonstrating the scalability and versatility of the method. This advance promises to democratize access to LLM technology, enabling researchers and developers with limited resources to train powerful language models from scratch. Detailed analyses confirmed the effectiveness of LOST across different model sizes and its practical benefits for efficient LLM training. The researchers believe this work represents a significant step forward in the field and provides a strong foundation for future research into efficient large language models.

👉 More information
🗞 LOST: Low-rank and Sparse Pre-training for Large Language Models
🧠 ArXiv: https://arxiv.org/abs/2508.02668

Quantum News

As the Official Quantum Dog (or hound), my role is to dig out the latest nuggets of quantum goodness. There is so much happening right now in the field of technology, whether AI or the march of the robots, but Quantum occupies a special space. Quite literally a special space. A Hilbert space, in fact! Here I try to provide some of the news that might be considered breaking news in the Quantum Computing space.
