Low-Rank and Sparse Pre-Training Boosts Performance of Large Language Models

Large language models demonstrate impressive capabilities, but their sheer size demands enormous computational resources for initial training, creating a significant barrier to wider access and innovation. Jiaxi Li and colleagues at Peking University, alongside collaborators, address this challenge with a new pre-training method called LOST, short for Low-rank and Sparse Training. The approach pairs a low-rank structure that reduces the number of trainable parameters with a sparse component that preserves the information the reduction would otherwise discard, allowing large language models to be trained efficiently from scratch. The team's experiments, ranging from smaller 60 million parameter models to larger 7 billion parameter networks, demonstrate that LOST achieves performance comparable to, or even exceeding, that of fully-sized models, all while substantially lowering both memory and computational demands.

Low-Rank Sparsity Accelerates Language Model Training

Researchers have developed a new method, called LOST, that significantly improves the efficiency of training large language models (LLMs) from scratch, without sacrificing performance. Traditionally, training these models demands substantial computational resources and memory, but LOST enables effective training even with limited resources. The core innovation lies in a clever combination of low-rank and sparse structures within the model’s parameters. The team’s approach begins by simplifying the complex weight matrices that define the LLM, retaining only the most important information using a mathematical technique called singular value decomposition.
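
To make that step concrete, here is a minimal sketch of a truncated-SVD factorization of a single weight matrix in plain PyTorch. The matrix size, the rank, and the helper name `truncated_svd` are illustrative assumptions, not values or code from the paper; the sketch simply shows how keeping only the top singular directions shrinks the trainable parameter count.

```python
import torch

def truncated_svd(W: torch.Tensor, rank: int):
    """Factor W (m x n) into thin matrices A (m x rank) and B (rank x n)."""
    U, s, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb the singular values into the left factor
    B = Vh[:rank, :]
    return A, B

W = torch.randn(1024, 1024)               # stand-in for a transformer weight matrix
A, B = truncated_svd(W, rank=64)
print(W.numel(), A.numel() + B.numel())   # 1,048,576 full vs. 131,072 low-rank parameters
```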

This truncated factorization creates a low-rank representation, dramatically reducing the number of parameters that need to be trained. Crucially, LOST doesn't simply discard the remaining information; instead, it strategically incorporates it as a sparse component, preserving essential details that would otherwise be lost. This co-design ensures the model retains its capacity for learning and generalization. Experiments demonstrate that LOST achieves performance comparable to, and in some cases exceeding, that of full-rank training, the standard approach, while using significantly less memory and computational power. Because far fewer parameters are trained, the associated gradients and optimizer states shrink as well, cutting the memory footprint without compromising accuracy.
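
Continuing the sketch above, the following shows one plausible way to build such a sparse component: prune the residual W − AB down to its largest-magnitude entries and keep them as a sparse correction. The selection rule, the 1% sparsity level, and the helper name `sparse_residual` are assumptions made for illustration; the paper's exact construction and training procedure may differ.

```python
import torch

def sparse_residual(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude `sparsity` fraction of W - A @ B as a sparse correction S."""
    residual = W - A @ B
    k = max(1, int(sparsity * residual.numel()))
    threshold = residual.abs().flatten().topk(k).values.min()
    return torch.where(residual.abs() >= threshold, residual, torch.zeros_like(residual))

# Reconstruct an approximation W ~ A @ B + S and check the error.
W = torch.randn(1024, 1024)
U, s, Vh = torch.linalg.svd(W, full_matrices=False)
A, B = U[:, :64] * s[:64], Vh[:64, :]
S = sparse_residual(W, A, B, sparsity=0.01)
rel_err = (W - (A @ B + S)).norm() / W.norm()
print(f"relative reconstruction error: {rel_err:.3f}")
```

Even in this toy setting, the handful of retained residual entries recovers some of the error that the low-rank factors alone leave behind, which is the intuition behind pairing the two structures.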

This is a substantial improvement over existing compression techniques, which often suffer from performance degradation. Notably, LOST outperforms other recent methods that combine low-rank and sparse techniques, underscoring the effectiveness of its complementary design: previous approaches often treated the two components independently, whereas LOST carefully integrates them to maximize performance and efficiency. The team successfully pre-trained LLMs ranging in size from 60 million to 7 billion parameters, demonstrating the scalability and versatility of the method. This advance promises to democratize access to LLM technology, enabling researchers and developers with limited resources to train powerful language models from scratch. Detailed analyses confirmed the effectiveness of LOST across different model sizes and its practical benefits for efficient LLM training. The researchers believe this work represents a significant step forward in the field and provides a strong foundation for future research into efficient large language models.

👉 More information
🗞 LOST: Low-rank and Sparse Pre-training for Large Language Models
🧠 ArXiv: https://arxiv.org/abs/2508.02668

Quantum News

As the Official Quantum Dog (or hound), my role is to dig out the latest nuggets of quantum goodness. There is so much happening right now in the field of technology, whether AI or the march of the robots, but Quantum occupies a special space. Quite literally a special space. A Hilbert space, in fact! Here I try to provide some of the news that might be considered breaking news in the Quantum Computing space.
