The Argonne Leadership Computing Facility (ALCF) is empowering a diverse community of users, spanning institutions like the University of Chicago and collaborators across the Department of Energy, with access to the newly deployed Aurora supercomputer. This exascale system, built on the HPE Cray EX platform with Intel Data Center GPU Max Series accelerators and the HPE Slingshot-11 interconnect, has a theoretical peak performance of roughly two exaflops and has demonstrated sustained performance exceeding one exaflop. The ALCF's Polaris and Sophia systems, alongside its AI Testbed and storage and networking infrastructure, facilitate evaluation of emerging technologies and architectures, advancing scientific discovery in fields ranging from artificial intelligence to materials science. This capability represents a significant leap in computational power for open scientific research.
Leadership Computing Resources & Facility Capabilities
The Argonne Leadership Computing Facility (ALCF) distinguishes itself by offering computational power far exceeding standard research systems. The flagship system, Aurora, comprises 10,624 nodes, each pairing two Intel Xeon CPU Max Series processors with six Intel Data Center GPU Max Series GPUs, for more than 21,000 CPUs and 63,000 GPUs in total and a theoretical peak performance of roughly two exaflops. This immense capability allows researchers to tackle previously intractable problems in fields like materials science, astrophysics, and climate modeling, accelerating discovery through massively parallel simulations and data analysis.
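To give a concrete flavor of the massively parallel pattern such systems are built for, the sketch below distributes a simple numerical integration across MPI ranks with mpi4py. The problem, interval count, and rank counts are purely illustrative, not an Aurora-specific code.

```python
# Minimal sketch of a massively parallel computation with mpi4py.
# Illustrative only: estimates pi by midpoint integration of 4/(1+x^2),
# splitting the intervals across however many MPI ranks the job provides.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n = 10_000_000            # total number of intervals (illustrative)
h = 1.0 / n

# Each rank sums a strided subset of the intervals.
local_sum = 0.0
for i in range(rank, n, size):
    x = (i + 0.5) * h
    local_sum += 4.0 / (1.0 + x * x)

# Combine the partial results on rank 0.
pi_estimate = comm.reduce(local_sum * h, op=MPI.SUM, root=0)
if rank == 0:
    print(f"pi is approximately {pi_estimate:.12f} using {size} ranks")
```

Run with, for example, mpiexec -n 8 python pi_mpi.py; the same pattern scales to many thousands of ranks on a leadership-class machine.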
Beyond raw compute, the ALCF prioritizes robust data storage and networking. High-bandwidth interconnects link compute resources to the center-wide Lustre filesystems, Grand and Eagle, each providing roughly 100 PB of capacity, while Aurora is paired with a DAOS-based storage system designed for aggregate bandwidths measured in terabytes per second. Such infrastructure isn’t just about size; it’s crucial for handling the massive datasets generated by leadership-class simulations, enabling near-real-time analysis and iterative refinement of scientific models.
ALCF’s capabilities extend beyond Aurora with additional systems such as Polaris and Sophia, alongside the AI Testbed, which hosts novel AI accelerators from vendors including Cerebras, SambaNova, Graphcore, and Groq. Crux, a CPU-only HPE Cray EX cluster built on AMD EPYC processors, serves workloads that do not require GPUs. Facility expertise includes performance analysis, software optimization, and workflow design. This holistic approach, combining cutting-edge hardware with dedicated support, is key to maximizing scientific output and enabling users to fully leverage these powerful resources.
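As one concrete illustration of the workflow-design expertise mentioned above, the sketch below uses Parsl, an Argonne-developed Python parallel scripting library widely used on ALCF systems, to express a small bag-of-tasks workflow. The local-threads configuration and the toy tasks are placeholders; a real run would use a site-specific executor configuration and actual simulation or analysis code.

```python
# Minimal Parsl workflow sketch: a bag of independent tasks followed by a
# reduction. The local-threads config is a stand-in for a site-specific
# high-throughput executor configuration on an ALCF machine.
import parsl
from parsl import python_app
from parsl.configs.local_threads import config

parsl.load(config)

@python_app
def simulate(seed):
    # Placeholder "simulation": returns a pseudo-result derived from the seed.
    import random
    random.seed(seed)
    return random.random()

@python_app
def summarize(results):
    # Reduce the individual task outputs to a single value.
    return sum(results) / len(results)

futures = [simulate(s) for s in range(16)]        # launch tasks in parallel
mean = summarize([f.result() for f in futures])   # gather, then reduce
print("mean result:", mean.result())
```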
ALCF Testbeds: Evaluation & System Access
The ALCF (Argonne Leadership Computing Facility) Testbeds program is critical for evaluating next-generation hardware and software before they are relied upon at scale on production systems like Aurora. Polaris, a 560-node system pairing AMD EPYC 7003-series processors with four NVIDIA A100 GPUs per node, served as a transition and testbed platform in the run-up to Aurora, while Sophia, an NVIDIA DGX A100-based system, supports AI training and data analytics workloads. Rigorous benchmarking on representative scientific workloads identifies potential bottlenecks and informs optimal system configuration. This proactive approach minimizes risk and maximizes scientific output when full-scale systems become available to the broader user base.
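The benchmarking described above can start with something as simple as timing a representative kernel on each platform. The sketch below times dense matrix multiplication with NumPy and reports achieved GFLOP/s; it is a generic micro-benchmark with illustrative problem sizes, not one of the ALCF's actual acceptance tests.

```python
# Generic micro-benchmark sketch: time a dense matrix multiply and report
# achieved GFLOP/s. Matrix size and repetition count are illustrative.
import time
import numpy as np

def benchmark_gemm(n=4096, repeats=5):
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    np.dot(a, b)                      # warm-up (thread pools, caches)

    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.dot(a, b)
        best = min(best, time.perf_counter() - t0)

    flops = 2.0 * n ** 3              # multiply-adds in an n x n GEMM
    return flops / best / 1e9

if __name__ == "__main__":
    print(f"best achieved: {benchmark_gemm():.1f} GFLOP/s")
```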
Access to ALCF Testbeds isn’t simply about getting early hardware; it’s a collaborative evaluation process. Researchers apply for dedicated time, often working directly with ALCF staff and vendor engineers. Much of the current effort centers on the ALCF AI Testbed, which fields purpose-built AI accelerators such as the Cerebras CS-2, SambaNova DataScale, Graphcore Bow Pod64, and Groq systems. These platforms are being evaluated for training and inference of large AI models, including large language models, with applications in fields like materials science and drug discovery.
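For AI platforms, a common first measurement in such evaluations is training-step throughput on a model representative of the target workload. The sketch below times optimizer steps for a small PyTorch model on whatever device is available; the tiny model, batch size, and plain CUDA/CPU device handling are placeholders, and the vendor-specific software stacks on the AI Testbed systems would replace the path shown here.

```python
# Sketch of a training-step throughput measurement in PyTorch.
# The tiny MLP and random data stand in for a real model and dataset.
import time
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

batch, steps = 256, 50
x = torch.randn(batch, 1024, device=device)
y = torch.randint(0, 10, (batch,), device=device)

# One warm-up step so lazy initialization doesn't skew the timing.
loss_fn(model(x), y).backward()
optimizer.step()
optimizer.zero_grad()

if device == "cuda":
    torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(steps):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
if device == "cuda":
    torch.cuda.synchronize()
elapsed = time.perf_counter() - t0

print(f"{steps * batch / elapsed:.0f} samples/s on {device}")
```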
Beyond compute power, ALCF’s storage and networking infrastructure is a key testbed component. The center-wide Lustre filesystems, Grand and Eagle, deliver aggregate throughput of several hundred gigabytes per second and are being evaluated alongside emerging technologies such as computational storage and the DAOS object store deployed with Aurora. Understanding how these systems integrate with leading-edge processors and GPUs, and optimizing data movement between them, is paramount. Efficient data handling directly translates into faster simulations and analyses, unlocking new possibilities for data-intensive scientific research.
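Because aggregate I/O bandwidth is the figure of merit for such filesystems, a simple way to probe it from an application's point of view is a collective MPI-IO write test. The sketch below uses mpi4py to have each rank write one fixed-size block to a shared file and reports aggregate bandwidth; the file path and block size are placeholders, and production measurements would rely on a dedicated tool such as IOR.

```python
# Sketch of an aggregate write-bandwidth probe using MPI-IO via mpi4py.
# Each rank writes one contiguous block to a shared file at its own offset.
# Path and block size are placeholders for a real scratch-filesystem target.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

block_bytes = 64 * 1024 * 1024                      # 64 MiB per rank (illustrative)
data = np.random.bytes(block_bytes)
path = "iotest.dat"                                  # hypothetical target path

fh = MPI.File.Open(comm, path, MPI.MODE_CREATE | MPI.MODE_WRONLY)
comm.Barrier()
t0 = MPI.Wtime()
fh.Write_at_all(rank * block_bytes, data)            # collective write
comm.Barrier()
elapsed = MPI.Wtime() - t0
fh.Close()

if rank == 0:
    total_gib = size * block_bytes / 2**30
    print(f"{total_gib:.1f} GiB written in {elapsed:.2f} s "
          f"({total_gib / elapsed:.2f} GiB/s aggregate)")
```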
