Understanding fluid flow within the Earth’s mantle presents a significant challenge in geodynamics, and researchers continually seek more efficient computational methods to model these complex processes. Marcel Ferrari, Cyrill Püntener, Alexander Sotoudeh, and Niklas Viebig, from ETH Zurich, have developed a new approach to solving the equations governing fluid motion with variable viscosity, a key characteristic of the Earth’s interior. Their work focuses on optimising the computational kernels used in these simulations, specifically through a technique called ‘temporal blocking’, which improves how data is accessed and reused within the computer’s memory. The team reports substantial performance gains: the optimised method runs up to three times faster than a standard Jacobi baseline. This could enable more detailed and accurate modelling of Earth’s dynamic processes and, potentially, an extension to more complex three-dimensional simulations.
Iterative Solver Performance on Large Grids
This study presents performance data for five iterative solvers (Red-Black Gauss-Seidel, Jacobi, fused Jacobi, blocked fused Jacobi, and RAS-Jacobi) applied to a computational problem simulating two-dimensional fluid flow. The research investigates how these solvers perform on two grid sizes, 2000×2000 and 8000×8000, with thread counts ranging from 1 to 256. Increasing the number of threads generally reduces execution time, as the problem is divided among multiple processors, though this improvement plateaus due to overhead from managing threads and sharing resources. Larger grid sizes naturally require more computational effort and therefore take longer to solve.
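To make the difference between these solver families concrete, here is a minimal sketch of one Jacobi sweep and one Red-Black Gauss-Seidel sweep. It is not the study's code: it assumes a constant-coefficient Poisson model problem in place of the variable-viscosity Stokes system, uses plain Python loops (the style a compiler such as Numba typically accelerates), and all function names are hypothetical.

```python
import math

def jacobi_sweep(u, f, h):
    """One Jacobi sweep for -Laplace(u) = f; every update reads only old values."""
    n = len(u)
    new = [row[:] for row in u]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                                + u[i][j + 1] + h * h * f[i][j])
    return new

def red_black_gs_sweep(u, f, h):
    """One Red-Black Gauss-Seidel sweep: colour 0 first, then colour 1.
    Each colour is updated in place and sees the newest values of the other."""
    n = len(u)
    u = [row[:] for row in u]  # work on a copy so the caller's grid is untouched
    for colour in (0, 1):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == colour:
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                                      + u[i][j + 1] + h * h * f[i][j])
    return u

def max_error(u, exact):
    """Largest pointwise difference between two grids."""
    return max(abs(a - b) for ru, re in zip(u, exact) for a, b in zip(ru, re))

# Model problem (an assumption for illustration): 17x17 grid, known solution.
n = 17
h = 1.0 / (n - 1)
exact = [[math.sin(math.pi * i * h) * math.sin(math.pi * j * h)
          for j in range(n)] for i in range(n)]
# Build f from the discrete operator so that `exact` solves the discrete system.
f = [[0.0] * n for _ in range(n)]
for i in range(1, n - 1):
    for j in range(1, n - 1):
        f[i][j] = (4 * exact[i][j] - exact[i - 1][j] - exact[i + 1][j]
                   - exact[i][j - 1] - exact[i][j + 1]) / (h * h)

u_jac = [[0.0] * n for _ in range(n)]
u_gs = [[0.0] * n for _ in range(n)]
for _ in range(50):
    u_jac = jacobi_sweep(u_jac, f, h)
    u_gs = red_black_gs_sweep(u_gs, f, h)

err_jac = max_error(u_jac, exact)
err_gs = max_error(u_gs, exact)
print(f"error after 50 sweeps: Jacobi {err_jac:.3f}, Red-Black GS {err_gs:.3f}")
```

In a Jacobi sweep every point can be updated independently, which makes it easy to parallelise. Red-black colouring keeps that independence within each colour while letting the second colour use freshly updated neighbours.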
Red-Black Gauss-Seidel performs well with a moderate number of threads, scaling reasonably as more processors are added. Fused Jacobi offers some improvement over standard Jacobi, while blocked fused Jacobi consistently delivers strong performance, particularly on larger grids. Measurements of speedup and efficiency quantify the benefits of using multiple threads, though diminishing returns are observed as the number of threads increases. Blocked fused Jacobi consistently achieves the lowest execution times on the 8000×8000 grid, with differences between solvers less pronounced on the smaller 2000×2000 grid. This research provides valuable insights into the performance characteristics of different iterative solvers for simulating two-dimensional fluid flow, enabling researchers to select the most efficient solver and optimize its performance for specific problems.
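The speedup and efficiency figures mentioned above follow from simple ratios. The sketch below shows the usual strong-scaling definitions; the timings are hypothetical placeholders chosen for illustration and are not measurements from the study.

```python
def strong_scaling(t_serial, t_parallel, threads):
    """Return (speedup, efficiency) for a fixed problem size.

    speedup    = single-thread time / multi-thread time
    efficiency = speedup / number of threads (1.0 would be ideal)
    """
    speedup = t_serial / t_parallel
    return speedup, speedup / threads

# Hypothetical timings in seconds, NOT taken from the study.
timings = {1: 80.0, 4: 22.0, 16: 7.0, 64: 3.2}
for threads, seconds in timings.items():
    s, e = strong_scaling(timings[1], seconds, threads)
    print(f"{threads:3d} threads: speedup {s:5.1f}x, efficiency {e:4.0%}")
```

Falling efficiency at high thread counts is what "diminishing returns" means in practice: each added thread contributes less than the one before.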
RAS-Jacobi Smoother for Variable Viscosity Flows
Scientists have developed optimised numerical methods to accelerate simulations of geophysical flows with varying viscosity, a crucial aspect of modelling Earth’s dynamic processes. This research focuses on improving multigrid smoothing for the incompressible Stokes equations by adapting a temporal blocking strategy of the restricted additive Schwarz (RAS) type, traditionally used in distributed-memory computing, to enhance cache reuse and scalability within a single compute node. Researchers implemented and evaluated five distinct smoother variants (Red-Black Gauss-Seidel, Jacobi, fused Jacobi, blocked fused Jacobi, and the new RAS-Jacobi) in both Python, accelerated with the Numba compiler, and C++. To ensure accurate comparisons, the team introduced an energy-based residual norm that balances velocity and pressure contributions.
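The temporal-blocking idea can be illustrated on a much simpler model problem. The sketch below is not the authors' RAS-Jacobi implementation: it assumes a scalar Poisson-type Jacobi sweep rather than the Stokes smoother, and the function names, tile size, and halo rule are hypothetical choices made for illustration. Each tile is copied together with a halo of k cells, k sweeps are run on that small copy, and only the tile's own cells are written back.

```python
import math

def jacobi_sweeps(u, f, h, k):
    """k Jacobi sweeps on a rectangular array; edge values are held fixed."""
    rows, cols = len(u), len(u[0])
    for _ in range(k):
        new = [row[:] for row in u]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                                    + u[i][j + 1] + h * h * f[i][j])
        u = new
    return u

def blocked_jacobi_sweeps(u, f, h, k, tile):
    """Same result as k global sweeps, computed tile by tile.

    Each tile is extended by a halo of k cells (clipped at the grid edge).
    Stale values at the artificial halo edge move inward by one cell per
    sweep, so after k sweeps the tile's own cells are still exact.
    """
    n = len(u)
    out = [row[:] for row in u]  # separate output: other tiles still need old u
    for i0 in range(1, n - 1, tile):
        for j0 in range(1, n - 1, tile):
            i1, j1 = min(i0 + tile, n - 1), min(j0 + tile, n - 1)
            a0, a1 = max(i0 - k, 0), min(i1 + k, n)  # halo-extended row range
            b0, b1 = max(j0 - k, 0), min(j1 + k, n)  # halo-extended column range
            local_u = [row[b0:b1] for row in u[a0:a1]]
            local_f = [row[b0:b1] for row in f[a0:a1]]
            local_u = jacobi_sweeps(local_u, local_f, h, k)
            for i in range(i0, i1):  # write back only the cells this tile owns
                out[i][j0:j1] = local_u[i - a0][j0 - b0:j1 - b0]
    return out

# Illustrative setup (all values are assumptions): 20x20 grid, 3 fused sweeps.
n, k, tile = 20, 3, 6
h = 1.0 / (n - 1)
u0 = [[math.sin(1.3 * i + 0.7 * j) for j in range(n)] for i in range(n)]
f = [[1.0] * n for _ in range(n)]

reference = jacobi_sweeps(u0, f, h, k)
blocked = blocked_jacobi_sweeps(u0, f, h, k, tile)
max_diff = max(abs(a - b) for ra, rb in zip(reference, blocked)
               for a, b in zip(ra, rb))
print("max difference between global and blocked sweeps:", max_diff)
```

The blocked version does redundant arithmetic in the overlapping halos, but each tile is loaded from main memory once and then reused for k sweeps. That trade of extra computation for less memory traffic is the kind of cache reuse the article describes.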
All implementations were rigorously validated using a high-contrast sinker benchmark, demonstrating stable convergence across all smoother variants. Performance was assessed through strong- and weak-scaling experiments conducted on NVIDIA GH200 Grace Hopper nodes of the ALPS supercomputer. The RAS-Jacobi smoother consistently achieved the best performance, sustaining over 90% weak-scaling efficiency up to 64 cores and delivering up to a threefold speedup compared to the classic C++ Jacobi baseline. This methodology, combining advanced numerical techniques with high-performance computing, lays the groundwork for more accurate and efficient modelling of complex geodynamic phenomena.
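Weak-scaling efficiency, the metric behind the 90% figure, compares runtimes when the problem grows in proportion to the core count. The sketch below uses the common definition; the runtimes are hypothetical and only show how such a percentage is computed.

```python
def weak_scaling_efficiency(t_one_core, t_n_cores):
    """Weak scaling keeps the work per core fixed, so the ideal runtime is constant.
    Efficiency is the single-core runtime divided by the runtime on n cores."""
    return t_one_core / t_n_cores

# Hypothetical runtimes in seconds (illustrative only, not from the study).
runtimes = {1: 10.0, 16: 10.4, 64: 11.0}
efficiencies = {cores: weak_scaling_efficiency(runtimes[1], t)
                for cores, t in runtimes.items()}
for cores, eff in efficiencies.items():
    print(f"{cores:2d} cores: {eff:.0%} weak-scaling efficiency")
```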
RAS-Jacobi Smoothing Optimizes Stokes Equation Solvers
Scientists have achieved a breakthrough in computational geophysics through the design and implementation of optimised numerical methods for solving the incompressible Stokes equations, crucial for modelling complex fluid flow. This work introduces a novel approach to multigrid smoothing, focusing on enhancing performance within modern supercomputing architectures. The team investigated five distinct smoother variants, Red-Black Gauss-Seidel, Jacobi, fused Jacobi, blocked fused Jacobi, and a new RAS-Jacobi smoother, to determine the most efficient method for velocity smoothing. Experiments revealed that the RAS-Jacobi smoother consistently delivered the best performance, particularly at higher core counts, sustaining over 90% weak-scaling efficiency up to 64 cores on NVIDIA GH200 Grace Hopper nodes of the ALPS supercomputer.
This smoother achieved up to a threefold speedup compared to a standard C++ Jacobi baseline, attributed to enhanced cache reuse and reduced memory traffic. Researchers introduced an energy-based residual norm and validated all implementations using a high-contrast sinker benchmark representative of realistic geodynamic numerical models. The result is a significant gain in computational efficiency for modelling complex geophysical phenomena, paving the way for more detailed and accurate simulations of Earth’s dynamic processes. The work also highlights the importance of cache-aware numerical design for harnessing modern heterogeneous architectures.
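For readers who want the equations behind these statements, the incompressible Stokes system with variable viscosity is commonly written as shown below. The notation is a standard textbook form and an assumption here, since the article does not say which formulation or symbols the authors use: $\mathbf{u}$ is the velocity, $p$ the pressure, $\eta(\mathbf{x})$ the spatially varying viscosity, and $\mathbf{f}$ a body force such as buoyancy.

```latex
\begin{aligned}
-\nabla \cdot \bigl( 2\,\eta(\mathbf{x})\, \dot{\boldsymbol{\varepsilon}}(\mathbf{u}) \bigr) + \nabla p &= \mathbf{f}, \\
\nabla \cdot \mathbf{u} &= 0, \\
\dot{\boldsymbol{\varepsilon}}(\mathbf{u}) &= \tfrac{1}{2} \bigl( \nabla \mathbf{u} + \nabla \mathbf{u}^{\mathsf{T}} \bigr).
\end{aligned}
```

The first line is the momentum balance, the second enforces incompressibility, and the third defines the strain-rate tensor. Large jumps in $\eta$, as in a high-contrast benchmark, are what make the system demanding for iterative smoothers.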
RAS-Jacobi Smoothing Accelerates Geophysical Flow Simulations
This research presents advances in computational methods for modelling geophysical flows, specifically addressing the challenges of solving the incompressible Stokes equations with variable viscosity. Scientists have designed, implemented, and evaluated optimised numerical methods for multigrid smoothing, investigating five distinct smoother variants. A novel approach, the RAS-Jacobi smoother, consistently outperforms traditional methods, achieving up to a threefold speedup compared to a standard Jacobi baseline, a gain attributed to enhanced cache reuse and reduced memory traffic. The team validated their approach using a high-contrast sinker benchmark and introduced an energy-based residual norm to ensure reliable convergence assessment. Scaling studies show that the RAS-Jacobi smoother sustains over 90% weak-scaling efficiency up to 64 cores. This work highlights promising avenues for future research, including extending the RAS-type temporal blocking strategy to three-dimensional problems and exploring its application to GPU accelerators.
👉 More information
🗞 3D Blocking for Matrix-free Smoothers in 2D Variable-Viscosity Stokes Equations with Applications to Geodynamics
🧠 ArXiv: https://arxiv.org/abs/2509.19061
