Argonne National Laboratory has access to three main facilities for computing resources, data and networking, and data analytics and visualization.
Joint Laboratory for System Evaluation
The Joint Laboratory for System Evaluation (JLSE) is a collaboration among the CELS computing divisions, including the Leadership Computing Facility (LCF), Mathematics and Computer Science (MCS), Computational Science (CPS), and Data Science and Learning (DSL), with the aim of evaluating future high-performance computing platforms, developing system software, and measuring power and energy use. JLSE hosts more than two dozen cutting-edge hardware platforms, including Intel development GPU cards (code names XeHP and DG1) as well as NVIDIA A100 and RTX8000 cards. The post-Moore architecture lab in MCS, which is part of JLSE, hosts an FPGA testbed.
Argonne Leadership Computing Facility
Argonne Leadership Computing Facility (ALCF) resources include leadership-class supercomputers, visualization clusters, advanced data storage systems, high-performance networking capabilities, and a wide variety of software tools and services to help facility users achieve their science goals.
- Theta/ThetaGPU. Theta is an 11.7-petaflops supercomputer based on Intel processors and interconnect technology, an advanced memory architecture, and a Lustre-based parallel file system, all integrated by Cray’s HPC software stack. ThetaGPU is an NVIDIA DGX A100-based system; each DGX A100 node provides a total of 320 gigabytes of GPU memory for training AI datasets, as well as high-speed NVIDIA Mellanox ConnectX-6 network interfaces.
- Cooley, the ALCF’s visualization cluster, enables users to analyze and visualize large-scale datasets. Equipped with state-of-the-art graphics processing units (GPUs), Cooley helps users gain deeper insights into simulations and data generated on the facility’s supercomputers.
- The Argonne AI-Testbed provides an infrastructure of next-generation AI-accelerator machines. It aims to help evaluate the usability and performance of machine-learning-based high-performance computing applications running on these accelerators. Currently, allocations are offered on Cerebras CS-2 and SambaNova DataScale systems. See https://ai.alcf.anl.gov/#systems.
- Polaris. A hybrid CPU/GPU resource with a peak performance of 44 petaflops, Polaris was developed in collaboration with Hewlett Packard Enterprise (HPE). It provides a platform to test and optimize codes for Argonne’s upcoming Aurora exascale supercomputer.
- Aurora. Installation of Aurora, Argonne’s first exascale computer, began in mid-2022. Aurora combines more than 10,000 Intel-outfitted blades into an HPE Cray EX supercomputer. Each compute blade has two Sapphire Rapids Xeon CPUs and six Ponte Vecchio GPUs, integrated into HPE’s Cray EX architecture with Slingshot networking. When fully deployed, Aurora is projected to have a peak performance of more than 2 exaflops. Aurora’s architecture will support machine learning and data science workloads alongside traditional modeling and simulation workloads.
Currently, the ALCF provides allocations on its resources based on competitive proposals through the INCITE program, the ASCR Leadership Computing Challenge, the ALCF Director’s Discretionary program, the ALCF Data Science Program, and the Aurora Early Science Program.
Laboratory Computing Resource Center
Apart from the ALCF facilities, Argonne also hosts several other resources in the Laboratory Computing Resource Center (LCRC). Bebop is the newest addition to the computational power of LCRC. It has 1,024 public nodes, with 128 GB (Intel Broadwell) or 96 GB (Intel Knights Landing) of memory on each node. Blues, the second computing cluster, has approximately 350 public nodes, with 64 GB (Intel Sandy Bridge) or 128 GB (Intel Haswell) of memory on each node. The LCRC resources are available to Argonne researchers and their collaborators through a simple internal proposal process.