As high-performance computing (HPC) systems grow in processing power and memory capacity, researchers are looking more closely at the details that determine system performance. In a recent study, experts explored the peculiar performance breakdowns observed when running the CloverLeaf benchmark on Intel’s Ice Lake and Sapphire Rapids server CPUs. The culprit? A newly introduced write-allocate evasion feature called SpecI2M. The investigation shows how first-principles modeling can explain why performance drops when the number of processes is prime.
Can Intel’s New Write-Allocate Evasion Feature Be Tamed?
The study examines the performance of the CloverLeaf benchmark, a Lagrangian-Eulerian hydrodynamics miniapp, on Intel’s Ice Lake and Sapphire Rapids server CPUs. The researchers observed peculiar breakdowns in performance when the number of processes was prime, which they attributed to a new write-allocate evasion feature called SpecI2M.
Understanding the SPEChpc 2021 Benchmark Suite
The SPEChpc 2021 benchmark suite was designed for state-of-the-art, highly parallel HPC systems. Its current version, 1.1, was released in July 2022. The suite aims to reflect real-world applications with workloads of different sizes and to provide comparable performance metrics for both CPU and GPU runs. It supports OpenACC, OpenMP, and MPI, making it a comprehensive tool for evaluating HPC systems.
The CloverLeaf benchmark, developed as part of the Mantevo project, is included in the SPEChpc 2021 suite. It is a Lagrangian-Eulerian hydrodynamics miniapp, a compact code intended to stand in for larger applications of the same kind. The researchers conducted a performance study of the pure MPI version of CloverLeaf on Intel’s Ice Lake SP (ICX) server platform and on Sapphire Rapids.
Unraveling the Mystery of Prime Number Effects
To investigate the breakdowns at prime process counts, the researchers created first-principles data traffic models for each of the stencil-like hotspot loops. They then used application measurements and microbenchmarks to study memory data traffic behavior, which allowed them to connect the breakdowns to SpecI2M.
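To illustrate what a first-principles data traffic model can look like, here is a minimal sketch in Python. It is not the researchers' model: the `LoopStreams` class, the `evasion_fraction` parameter, and the example stream counts are hypothetical, and the sketch assumes double-precision data with each read-only array loaded from memory exactly once per iteration.

```python
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8  # assumed element size (double precision)


@dataclass
class LoopStreams:
    """Hypothetical description of one stencil-like loop (not taken from CloverLeaf)."""
    reads: int   # arrays that are only read, each assumed to be loaded once
    writes: int  # arrays that are only written


def bytes_per_iteration(loop: LoopStreams, evasion_fraction: float) -> float:
    """Estimate memory traffic per inner-loop iteration in bytes.

    evasion_fraction is the assumed share of write-allocate transfers that
    the hardware avoids: 0.0 means none are avoided, 1.0 means all are.
    """
    if not 0.0 <= evasion_fraction <= 1.0:
        raise ValueError("evasion_fraction must be in [0, 1]")
    read_traffic = loop.reads * BYTES_PER_DOUBLE
    store_traffic = loop.writes * BYTES_PER_DOUBLE
    # Without evasion, each written cache line is first read from memory.
    write_allocate_traffic = loop.writes * BYTES_PER_DOUBLE * (1.0 - evasion_fraction)
    return read_traffic + store_traffic + write_allocate_traffic


# Hypothetical loop with three read streams and one write stream.
example = LoopStreams(reads=3, writes=1)
no_evasion = bytes_per_iteration(example, 0.0)    # 24 + 8 + 8 bytes
full_evasion = bytes_per_iteration(example, 1.0)  # 24 + 8 bytes
```

In this simplified picture, the gap between the two results is the traffic a working write-allocate evasion mechanism would save, and a measured volume that sits near the upper value would suggest the mechanism is not taking effect.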
The analysis revealed that if the number of processes is prime, SpecI2M fails to work properly, which can be attributed to short inner loops emerging from the one-dimensional domain decomposition in this case. The researchers ruled out other possible causes of the prime number effect, such as breaking layer conditions, MPI communication overhead, and load imbalance.
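The link between prime process counts and short inner loops can be sketched as follows. This is an illustration only, not CloverLeaf's actual decomposition routine: the function names, the near-square factorization rule, the grid width of 3840 cells, and the choice to apply the larger factor to the inner (fastest-running) dimension are all assumptions made for the example.

```python
import math
from typing import Tuple


def near_square_grid(num_ranks: int) -> Tuple[int, int]:
    """Return factors (small, large) of num_ranks that are as close as possible."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be positive")
    small = 1
    for candidate in range(1, math.isqrt(num_ranks) + 1):
        if num_ranks % candidate == 0:
            small = candidate  # keep the largest divisor not exceeding sqrt
    return small, num_ranks // small


def local_inner_length(inner_cells: int, num_ranks: int) -> int:
    """Inner-loop length per rank, assuming the larger factor splits the inner dimension."""
    _, large = near_square_grid(num_ranks)
    return -(-inner_cells // large)  # ceiling division


INNER_CELLS = 3840  # hypothetical global grid width
for ranks in (71, 72):
    print(ranks, near_square_grid(ranks), local_inner_length(INNER_CELLS, ranks))
```

Under these assumptions, 72 ranks factor into an 8 x 9 grid, while the prime count 71 can only be arranged as 1 x 71. The decomposition then becomes one-dimensional and each rank is left with a much shorter inner loop.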
Predicting Memory Data Volume with Analytical Models
For serial and full-node cases, the researchers were able to predict the memory data volume analytically with an error of a few percent. This achievement demonstrates the power of first-principles modeling in understanding complex systems like HPC platforms.
The study highlights the importance of considering the interactions between different components of a system, such as CPU architecture and memory hierarchy, to accurately predict performance. The findings also underscore the need for careful tuning of write-allocate evasion features like SpecI2M to ensure optimal performance in various scenarios.
In conclusion, by analyzing the CloverLeaf benchmark on Intel’s Ice Lake and Sapphire Rapids server CPUs, the researchers uncovered the impact of the newly introduced write-allocate evasion feature SpecI2M on performance. The findings provide valuable insights for optimizing HPC systems and show how first-principles modeling can explain performance behavior that would otherwise remain puzzling.
Future studies could explore the application of these analytical models to other benchmarks and workloads, as well as investigate the impact of other write-allocate evasion features on performance. Additionally, researchers could examine the effects of SpecI2M on other types of applications, such as those with different memory access patterns or communication overheads.
Publication details: “CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion”
Publication Date: 2024-05-27
DOI: https://doi.org/10.1109/ipdps57955.2024.00038
