Benchmarking computational biology software across diverse high-performance computing hardware reveals no single optimal configuration: GPUs prove most efficient for many tasks, but CPUs remain necessary for others. Rapidly growing data generation strains storage capacity, while increasing system complexity demands enhanced DevOps support and user-configurable software environments.
Computational biology increasingly relies on high-performance computing (HPC) to simulate and analyse complex biomolecular systems, demanding ever greater computational resources and sophisticated infrastructure. This presents a significant challenge, as the diverse range of software packages – including molecular dynamics codes like GROMACS and AMBER, and single-particle analysis tools such as RELION – exhibits varying performance characteristics across different hardware architectures.
A recent investigation, conducted by Robert Welch, Charles Laughton, Oliver Henrich, Tom Burnley, Daniel Cole, Alan Real, James Gebbie-Rayet and colleagues, benchmarks a suite of commonly used computational biology software on a range of HPC facilities, encompassing both central processing units (CPUs) and graphics processing units (GPUs), including the AMD EPYC 7742, NVIDIA V100, AMD MI250X and a GH200 testbed. Their work, titled ‘Engineering Supercomputing Platforms for Biomolecular Applications’, assesses not only raw performance and power efficiency, but also the practical considerations of data storage, software deployment and ongoing system maintenance, concluding that a heterogeneous hardware approach is essential to effectively support the breadth of methods employed in modern biomolecular research.
The researchers have undertaken a comprehensive benchmarking assessment of seven prominent computational biology software packages – the molecular dynamics engines GROMACS, AMBER, NAMD, LAMMPS and OpenMM, the quantum chemistry code Psi4, and the cryo-EM single-particle analysis tool RELION – across a range of high-performance computing (HPC) architectures. The study reveals considerable variations in performance contingent upon both the software employed and the underlying hardware configuration. Molecular dynamics simulates the physical movements of atoms and molecules, which is crucial for understanding biological systems.
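To make concrete what such a benchmark involves, here is a minimal sketch using OpenMM, one of the packages assessed. It is illustrative only, not the study's protocol: the input structure `input.pdb`, the force field choice and the run length are assumptions, and the script simply times the same short simulation on OpenMM's CPU and CUDA platforms and reports throughput in nanoseconds per day.

```python
# Minimal, illustrative OpenMM timing run (not the paper's benchmark protocol).
# Assumes OpenMM is installed, a CUDA-capable GPU is present, and "input.pdb"
# is a solvated, protonated structure (hypothetical input file).
import time

from openmm import LangevinMiddleIntegrator, Platform, unit
from openmm.app import ForceField, HBonds, PDBFile, PME, Simulation

STEPS = 5000                      # 5000 steps x 2 fs = 10 ps of dynamics
TIMESTEP_NS = 0.002 / 1000.0      # 2 fs expressed in nanoseconds

pdb = PDBFile("input.pdb")
forcefield = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")
system = forcefield.createSystem(pdb.topology, nonbondedMethod=PME,
                                 nonbondedCutoff=1.0 * unit.nanometer,
                                 constraints=HBonds)

for platform_name in ("CPU", "CUDA"):
    integrator = LangevinMiddleIntegrator(300 * unit.kelvin,
                                          1.0 / unit.picosecond,
                                          0.002 * unit.picoseconds)
    platform = Platform.getPlatformByName(platform_name)
    sim = Simulation(pdb.topology, system, integrator, platform)
    sim.context.setPositions(pdb.positions)
    sim.minimizeEnergy()

    start = time.time()
    sim.step(STEPS)
    elapsed = time.time() - start

    ns_per_day = STEPS * TIMESTEP_NS / elapsed * 86400.0
    print(f"{platform_name}: {ns_per_day:.1f} ns/day")
```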
The evaluation encompasses raw computational performance, power efficiency, and the demands placed on data storage, utilising AMD EPYC 7742 central processing unit (CPU) nodes, NVIDIA V100 and AMD MI250X graphics processing unit (GPU) nodes, and a GH200 testbed. Results confirm that no single hardware configuration provides optimal support for the breadth of methods used in computational biology, necessitating a diverse portfolio of HPC resources. The GH200 is a combined CPU-GPU architecture developed by NVIDIA, pairing a Grace CPU with a Hopper GPU.
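One way to compare such heterogeneous hardware is to fold throughput and power draw into a single figure of merit, for example simulated nanoseconds per day per watt. The numbers below are placeholders chosen only to illustrate the calculation; they are not results from the paper.

```python
# Illustrative energy-efficiency comparison: ns/day per watt.
# Throughput and power figures are hypothetical placeholders, not measured results.
runs = [
    ("AMD EPYC 7742 node", 40.0, 450.0),   # (platform, ns/day, average watts)
    ("NVIDIA V100 GPU",   120.0, 300.0),
    ("AMD MI250X GPU",    150.0, 500.0),
]

for platform, ns_per_day, avg_watts in runs:
    ns_per_day_per_watt = ns_per_day / avg_watts
    print(f"{platform:<20} {ns_per_day_per_watt:.3f} ns/day per watt")
```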
GPU acceleration consistently delivers superior performance for the majority of computational tasks, although CPUs remain essential for applications that are inherently serial or that benefit from larger cache sizes. Maintaining operational HPC facilities, however, requires substantial ongoing investment in software updates, hardware maintenance and skilled personnel. Performance scaling tests demonstrate that simply increasing the number of processors or GPUs does not always yield proportional gains, highlighting the importance of parallelisation efficiency – how effectively a task is divided and executed across multiple processors – within each software package.
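This kind of efficiency can be quantified directly from a strong-scaling test: divide the single-device runtime by N times the N-device runtime. The timings below are hypothetical placeholders used only to show the arithmetic, not measurements from the study.

```python
# Strong-scaling (parallel) efficiency: efficiency = T(1) / (N * T(N)).
# The wall-clock timings here are hypothetical, for illustration only.
timings_seconds = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 170.0, 16: 115.0}

baseline = timings_seconds[1]
for n_devices, t_n in sorted(timings_seconds.items()):
    speedup = baseline / t_n
    efficiency = speedup / n_devices
    print(f"{n_devices:>2} devices: speedup {speedup:5.2f}x, efficiency {efficiency:6.1%}")
```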
Researchers confirm that newer hardware, such as AMD GPUs and emerging artificial intelligence (AI) chips, generally exhibits compatibility with existing computational methods. However, supporting these newer architectures demands increased labour and specialised expertise. A critical gap in both short-term and long-term data storage capacity exists within many research institutions and HPC facilities, posing a growing challenge to the continued advancement of computational biology.
The rapidly increasing volume of data generated – a single high-performance computing core now produces approximately 10 terabytes of data per day – necessitates a holistic approach to HPC system design, one that considers not only computational performance but also power efficiency, data storage and ease of maintenance. Investment in DevOps practices – the combination of software development and IT operations – expanded consortium support for system administrators, and the adoption of build frameworks, containerisation and virtualisation tools can empower users to configure their own environments and reduce reliance on centralised installations.
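As a sketch of what such a user-configurable environment can look like in practice, the example below builds a container image for GROMACS with Apptainer, driven from Python for consistency with the other examples. The recipe, package choice and image name are assumptions; production HPC sites would typically build hardware-optimised binaries rather than the Ubuntu-packaged version.

```python
# Hedged sketch of a user-built software environment via containerisation.
# Assumes Apptainer is installed and the user is permitted to build images;
# the Ubuntu-packaged GROMACS is used purely for illustration.
import subprocess
from pathlib import Path

recipe = """\
Bootstrap: docker
From: ubuntu:22.04

%post
    apt-get update && apt-get install -y gromacs

%runscript
    exec gmx "$@"
"""

Path("gromacs.def").write_text(recipe)
subprocess.run(["apptainer", "build", "gromacs.sif", "gromacs.def"], check=True)

# Once built, the same image runs unchanged wherever Apptainer is available:
#   apptainer run gromacs.sif --version
```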
👉 More information
🗞 Engineering Supercomputing Platforms for Biomolecular Applications
🧠 DOI: https://doi.org/10.48550/arXiv.2506.15585
