The increasing need for high-resolution, real-time climate modelling presents a significant challenge for numerical methods, and researchers are actively seeking solutions that can harness the power of modern high-performance computing systems. Johansell Villalobos from the National High Technology Center, Daniel Caviedes-Voullième from the Jülich Supercomputing Center, and Silvio Rizzi from Argonne National Laboratory, alongside Esteban Meneses from the Costa Rica Technological Institute, have conducted a comprehensive performance evaluation of the SERGHEI-SWE shallow water equations solver across four leading heterogeneous platforms: Frontier, JUWELS Booster, JEDI, and Aurora. Their work demonstrates consistent scalability up to 2048 GPUs, with a speedup of 32 and efficiencies exceeding 90%, and identifies memory bandwidth as the primary performance limitation. This research establishes SERGHEI-SWE as a robust and portable simulation tool for large-scale geophysical applications, and highlights opportunities for further optimisation to maximise performance on evolving high-performance computing architectures.
SERGHEI Performance Across CPUs, GPUs, FPGAs
This research investigates how well SERGHEI, a solver for shallow water equations, maintains performance when moved between different hardware types, including CPUs, GPUs, and FPGAs. The study assesses performance portability using established metrics and the Kokkos framework, revealing that SERGHEI can run efficiently on diverse architectures, though platform-specific adjustments often improve results. Detailed analysis, including roofline analysis, identifies performance bottlenecks and guides optimization efforts, while auto-tuning features within Kokkos offer potential for further improvement. The research contributes to the growing field of performance portability, demonstrating the potential of frameworks like Kokkos to deploy scientific applications on diverse hardware.
High-Performance Hydrological Simulations on Heterogeneous Systems
Scientists developed SERGHEI-SWE, a solver for shallow water equations, to meet the increasing demand for high-resolution, real-time hydrological simulations. Researchers demonstrated strong scaling up to 1024 GPUs and weak scaling exceeding 2048 GPUs, consistently achieving a speedup of 32 and an efficiency upwards of 90 percent. To achieve performance portability, the team engineered SERGHEI-SWE using the Kokkos framework for GPU acceleration, combined with MPI for distributed-memory parallelism.
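As a rough illustration of how scaling figures such as speedup and efficiency are usually derived from timings, the sketch below uses the standard textbook definitions. The function names and all timing values are hypothetical and are not taken from the paper; the baseline GPU count used by the authors is not stated here, so the example uses a single-GPU baseline purely for convenience.

```python
def strong_scaling_metrics(base_gpus, base_time, gpus, time):
    """Return (speedup, efficiency) for a fixed-size problem.

    Speedup is measured relative to the baseline run; efficiency is the
    achieved speedup divided by the ideal speedup (gpus / base_gpus).
    """
    speedup = base_time / time
    ideal_speedup = gpus / base_gpus
    return speedup, speedup / ideal_speedup


def weak_scaling_efficiency(base_time, time):
    """Efficiency when the problem size grows with the GPU count.

    Ideally the runtime stays constant, so efficiency is base_time / time.
    """
    return base_time / time


# Hypothetical timings in seconds (illustrative only, not measured data).
speedup, efficiency = strong_scaling_metrics(base_gpus=1, base_time=120.0,
                                             gpus=8, time=16.0)
print(speedup, efficiency)                   # 7.5 0.9375
print(weak_scaling_efficiency(50.0, 54.0))   # a little below 1.0
```

Under these definitions, an efficiency above 90 percent means the measured speedup stays within 10 percent of the ideal linear speedup for the given increase in GPU count.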
The study employed harmonic and arithmetic mean-based metrics to quantify portability, revealing that SERGHEI-SWE achieves a portability of up to 70 percent with tuned problem sizes across diverse hardware architectures. Roofline analysis identified memory bandwidth as the dominant performance bottleneck, indicating that the solver’s key kernels are limited by data transfer rates rather than by arithmetic throughput. Despite this limitation, the study confirms that SERGHEI-SWE is a robust and scalable tool for large-scale geophysical applications, particularly relevant for improving flash flood forecasting.
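To make the idea of mean-based portability metrics concrete, here is a minimal sketch assuming the commonly used harmonic-mean definition of performance portability (the mean of per-platform efficiencies, defined as zero if any platform is unsupported) together with a simple arithmetic-mean variant. The exact formulas and efficiency definitions used in the paper may differ, and the efficiency values below are hypothetical placeholders, not results from the study.

```python
def pp_harmonic(efficiencies):
    """Harmonic mean of per-platform efficiencies (values in (0, 1]).

    Returns 0.0 if the set is empty or any platform is unsupported
    (efficiency None or non-positive), following the usual convention.
    """
    values = list(efficiencies.values())
    if not values or any(e is None or e <= 0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


def pp_arithmetic(efficiencies):
    """Arithmetic mean of efficiencies; unsupported platforms count as 0."""
    values = [e if e else 0.0 for e in efficiencies.values()]
    return sum(values) / len(values) if values else 0.0


# Hypothetical efficiencies per system (illustrative only).
eff = {"Frontier": 0.8, "JUWELS Booster": 0.5, "JEDI": 0.8, "Aurora": 0.5}
print(f"harmonic:   {pp_harmonic(eff):.3f}")
print(f"arithmetic: {pp_arithmetic(eff):.3f}")
```

The harmonic mean never exceeds the arithmetic mean and penalizes a single poorly performing platform more heavily, which is one reason the two are often reported side by side.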
Solver Performance Across Diverse GPU Architectures
This research presents a comprehensive performance analysis of the SERGHEI-SWE shallow water equations solver across four leading high-performance computing systems, each featuring different GPU architectures. Results demonstrate the solver’s efficient parallel performance, achieving strong scaling up to 1024 GPUs and maintaining high scalability in weak scaling tests. The JEDI system and the Frontier system exhibited the fastest execution times, while the Intel Max 1550 and NVIDIA A100 systems (Aurora and JUWELS Booster, respectively) showed comparable performance, indicating robust scalability despite architectural differences. Detailed analysis reveals that memory bandwidth consistently limits performance across all platforms. The authors acknowledge that evaluating performance portability requires significant effort and suggest that a clear, shared methodology for doing so would benefit the field.
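The statement that a kernel is limited by memory bandwidth can be illustrated with the basic roofline model, in which attainable performance is the smaller of the compute peak and the product of arithmetic intensity and memory bandwidth. The sketch below is a generic version of that model; the device figures and the kernel's arithmetic intensity are hypothetical and do not describe any of the four systems or any SERGHEI-SWE kernel.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(compute peak, intensity * memory bandwidth).

    arithmetic_intensity is in FLOP per byte moved to or from memory.
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def is_memory_bound(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """A kernel is memory-bound if it sits left of the ridge point."""
    ridge_point = peak_gflops / bandwidth_gbs  # FLOP/byte
    return arithmetic_intensity < ridge_point


# Hypothetical GPU: 10,000 GFLOP/s peak, 1,500 GB/s memory bandwidth.
PEAK, BW = 10_000.0, 1_500.0
# Hypothetical stencil-like kernel doing 0.5 FLOP per byte.
print(attainable_gflops(0.5, PEAK, BW))  # 750.0
print(is_memory_bound(0.5, PEAK, BW))    # True
```

In this picture, a memory-bound kernel gains more from reducing data movement or using hardware with higher bandwidth than from additional floating-point throughput.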
👉 More information
🗞 Towards Portability at Scale: A Cross-Architecture Performance Evaluation of a GPU-enabled Shallow Water Solver
🧠 ArXiv: https://arxiv.org/abs/2511.01001
