Predicting the three-dimensional structure of proteins remains a central challenge in computational biology, crucial for understanding biological processes and designing new therapeutics. Recent advances, notably exemplified by models such as AlphaFold3, have significantly improved prediction accuracy, but at a considerable computational cost. Researchers now demonstrate a system-level optimisation, MegaFold, designed to address these scalability limitations.
Hoa La from the University of Massachusetts, Amherst, Ahan Gupta and Minjia Zhang from the University of Illinois Urbana-Champaign, Alex Morehead from Lawrence Berkeley National Laboratory, and Jianlin Cheng from the University of Missouri, detail their work in a forthcoming publication titled ‘MegaFold: System-Level Optimizations for Accelerating Protein Structure Prediction Models’. Their system focuses on reducing both memory usage and training time through techniques such as data caching, optimised kernels for heterogeneous hardware, and operator fusion, ultimately enabling the training of models on longer protein sequences.
MegaFold accelerates the training of the AlphaFold3 (AF3) protein folding model, addressing limitations imposed by its substantial computational and memory demands. The system achieves its performance gains through a combination of techniques targeting data handling, kernel implementation, and operator fusion, ultimately expanding the possibilities for structural biology research.
First, the system optimises data handling to minimise GPU idle time, a critical factor in accelerating training runs. MegaFold employs ahead-of-time caching: the expensive preprocessing of training samples is performed once, and the resulting features are stored so that the training loop can read them directly rather than recomputing them on the fly. This contrasts with conventional pipelines, where features are computed on demand inside the data loader, leaving the GPU waiting while the CPU catches up.
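A minimal sketch of the idea in PyTorch, assuming a hypothetical `featurise` function standing in for the expensive per-sample preprocessing; MegaFold’s actual pipeline differs in its details:

```python
import os
import torch
from torch.utils.data import Dataset

class CachedProteinDataset(Dataset):
    """Serves precomputed features from disk instead of recomputing them.

    `featurise` is a hypothetical placeholder for the expensive per-sample
    preprocessing (e.g. MSA/template featurisation in an AF3-style
    pipeline); it is not MegaFold's actual API.
    """

    def __init__(self, raw_samples, cache_dir, featurise):
        self.raw_samples = raw_samples
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
        # Ahead-of-time pass: build the cache once, before training starts.
        for i, sample in enumerate(raw_samples):
            path = self._path(i)
            if not os.path.exists(path):
                torch.save(featurise(sample), path)

    def _path(self, idx):
        return os.path.join(self.cache_dir, f"{idx}.pt")

    def __len__(self):
        return len(self.raw_samples)

    def __getitem__(self, idx):
        # Training-time work shrinks to a cheap deserialisation.
        return torch.load(self._path(idx))
```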
At the heart of MegaFold’s acceleration are kernels written in Triton, a Python-like programming language designed specifically for high-performance GPU code. These kernels provide a memory-efficient implementation of EvoAttention, a key component of AF3 that captures evolutionary relationships between protein sequences, and they run across diverse hardware platforms, including both NVIDIA and AMD GPUs. EvoAttention augments standard attention with pairwise bias terms, and a naive implementation materialises large intermediate tensors, so an efficient kernel is crucial for both speed and memory.
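For intuition, here is a naive PyTorch reference for pair-biased attention of the kind used in AF3-style models; the shapes and the single bias tensor are illustrative assumptions. MegaFold’s Triton kernels compute the same result in a tiled, FlashAttention-style manner, without ever materialising the full score matrix:

```python
import torch
import torch.nn.functional as F

def naive_evoattention(q, k, v, pair_bias):
    """Reference pair-biased attention.

    q, k, v:    [batch, heads, seq, dim]
    pair_bias:  [batch, heads, seq, seq]  (bias derived from a pair
                representation; an illustrative assumption here)
    """
    scale = q.shape[-1] ** -0.5
    # This [batch, heads, seq, seq] tensor dominates memory; it is
    # exactly what a fused, tiled kernel avoids materialising in full.
    scores = (q @ k.transpose(-2, -1)) * scale + pair_bias
    return F.softmax(scores, dim=-1) @ v

b, h, s, d = 1, 4, 384, 32
q, k, v = (torch.randn(b, h, s, d) for _ in range(3))
bias = torch.randn(b, h, s, s)
out = naive_evoattention(q, k, v, bias)  # [1, 4, 384, 32]
```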
Furthermore, the system applies deep fusion to small but critical operators within AF3, reducing computational overhead and streamlining the training process. Fusion combines several consecutive operations into a single kernel, minimising the number of kernel launches and avoiding round trips of intermediate tensors through GPU memory. Each launch carries a fixed CPU-side cost, so for the many tiny operators in AF3, cutting launch frequency significantly improves performance.
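As a rough illustration of the principle (using torch.compile’s automatic fusion rather than MegaFold’s hand-written fused Triton kernels), consider a gated transition pattern common in transformer-style models:

```python
import torch
import torch.nn.functional as F

def swiglu_gate(x, w_gate, w_up):
    # In eager PyTorch, each step launches its own kernel (two matmuls,
    # a SiLU, an elementwise multiply), with every intermediate tensor
    # written to and re-read from GPU memory.
    return F.silu(x @ w_gate) * (x @ w_up)

# torch.compile fuses the elementwise tail into fewer kernels; MegaFold
# instead hand-writes deeply fused kernels for AF3's hot operators.
swiglu_gate_fused = torch.compile(swiglu_gate)

x = torch.randn(256, 128)
w_gate, w_up = torch.randn(128, 512), torch.randn(128, 512)
y = swiglu_gate_fused(x, w_gate, w_up)  # same result, fewer launches
```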
Evaluation on both NVIDIA H200 and AMD MI250 GPUs confirms substantial improvements in performance and efficiency. MegaFold reduces peak memory usage during AF3 training by up to 1.23 times and improves per-iteration training time by up to 1.73 times on the H200 and 1.62 times on the MI250, a significant saving in computational resources and time. These gains are particularly important given the increasing size and complexity of the biomolecular systems being modelled.
These gains enable training on longer sequence lengths, a measure of a protein’s size and complexity, allowing researchers to investigate larger and more intricate biological systems. Notably, MegaFold extends the feasible sequence length for training by 1.35 times compared with standard PyTorch implementations, without encountering out-of-memory errors. This brings within reach proteins that were previously inaccessible due to memory constraints.
The open-source release of MegaFold’s code provides the research community with a valuable tool for accelerating protein folding research, fostering collaboration and innovation in the field. The researchers have documented the system’s architecture and implementation in detail, making it straightforward for others to integrate and adapt it to their own projects. This commitment to open science promotes reproducibility and accelerates progress.
Future work will likely focus on extending these optimisations to other large-scale biomolecular modelling tasks, broadening the impact of MegaFold beyond protein structure prediction. Researchers plan to explore further hardware-specific tuning to maximise performance across a wider range of computational platforms. This includes optimising code for specific GPU architectures and memory hierarchies.
Researchers are actively exploring the application of these techniques to other areas of biomolecular modelling, such as molecular dynamics simulations and drug discovery. By leveraging the power of MegaFold, they aim to accelerate the development of new therapies and improve our understanding of biological processes. Molecular dynamics simulations, for example, require substantial computational resources to simulate the movement of atoms and molecules over time.
MegaFold was designed to be both efficient and portable, adapting to a wide range of computational environments. The system supports both NVIDIA and AMD GPUs, offering flexibility to researchers with varying hardware resources, and this broad compatibility means a wider range of groups can benefit from its performance gains.
The development of MegaFold represents a significant advancement in the field of biomolecular modelling, pushing the boundaries of what is possible in protein structure prediction. By addressing the computational bottlenecks that have traditionally limited the scale and complexity of biomolecular simulations, MegaFold empowers researchers to investigate increasingly complex biological systems.
👉 More information
🗞 MegaFold: System-Level Optimizations for Accelerating Protein Structure Prediction Models
🧠 DOI: https://doi.org/10.48550/arXiv.2506.20686
