High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. The CCE Portable Parallelization Strategies (PPS) team is helping to 1) define strategies to prioritize codes to parallelize and 2) determine how to parallelize these codes in a portable fashion so that the same code base can run on multiple architectures with few or no changes. The PPS team will investigate Kokkos, SYCL, OpenMP, std::execution::par and Alpaka as potential portability solutions, using several representative use cases from DUNE and the LHC experiments ATLAS and CMS, listed below.
Wirecell: The Wire-Cell Toolkit is a C++ software library for the simulation, signal processing, reconstruction and visualization of Liquid Argon Time Projection Chamber (LArTPC) detectors for neutrino experiments, such as the planned Deep Underground Neutrino Experiment (DUNE). It follows the data-flow programming paradigm and has a modular design, which allows us to investigate portable parallelization strategies for different components of the workflow separately. We are currently working on the LArTPC signal simulation module in Wire-Cell, and will add signal processing to our investigations in the future. We first ported the key computational kernels to NVIDIA GPUs using CUDA as a baseline, and have since implemented key simulation steps in Kokkos so that the same code can run on multi/many-core CPUs, NVIDIA GPUs and AMD GPUs.
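The single-source idea behind such a port can be sketched in plain C++. This is a toy illustration, not Kokkos and not Wire-Cell code: the names SerialBackend, BlockedBackend, scale_and_shift and check_backends_agree are hypothetical, and both "backends" run on the host. The point is only that the kernel body is written once and the iteration strategy is chosen by a template parameter, which is the role a portability layer plays across CPUs and GPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial backend: visits indices 0..n-1 in order.
struct SerialBackend {
    template <class Kernel>
    static void parallel_for(std::size_t n, Kernel kernel) {
        for (std::size_t i = 0; i < n; ++i) kernel(i);
    }
};

// Hypothetical blocked backend: walks the index range the way a GPU grid
// would (blocks of threads), but executes on the host for illustration.
struct BlockedBackend {
    static constexpr std::size_t block_size = 4;

    template <class Kernel>
    static void parallel_for(std::size_t n, Kernel kernel) {
        const std::size_t n_blocks = (n + block_size - 1) / block_size;
        for (std::size_t block = 0; block < n_blocks; ++block) {
            for (std::size_t thread = 0; thread < block_size; ++thread) {
                const std::size_t i = block * block_size + thread;
                if (i < n) kernel(i);  // guard the ragged last block
            }
        }
    }
};

// Single-source kernel: scale each sample and add an offset.
// The body is identical no matter which backend executes it.
template <class Backend>
std::vector<double> scale_and_shift(const std::vector<double>& in,
                                    double gain, double pedestal) {
    std::vector<double> out(in.size());
    const double* src = in.data();
    double* dst = out.data();
    Backend::parallel_for(in.size(), [=](std::size_t i) {
        dst[i] = gain * src[i] + pedestal;
    });
    return out;
}

// Convenience check: both backends must produce identical results.
inline void check_backends_agree(const std::vector<double>& in) {
    assert(scale_and_shift<SerialBackend>(in, 2.0, 1.0) ==
           scale_and_shift<BlockedBackend>(in, 2.0, 1.0));
}
```

Calling scale_and_shift<SerialBackend>(samples, 2.0, 1.0) and scale_and_shift<BlockedBackend>(samples, 2.0, 1.0) runs the same kernel through two different iteration schemes. A real portability layer replaces these toy backends with ones that launch on actual devices and also manages where the data lives, which this sketch ignores.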
FastCaloSim: FastCaloSim is a component of the ATLAS Fast Calorimeter Simulation toolchain. It performs a parameterized simulation of the Liquid Argon Calorimeter instead of a full Geant4-based particle simulation. A significant fraction of this component’s workload can be executed on a GPU, and its small code size and standalone nature make it a prime testbed for code portability studies. FastCaloSim has been ported to CUDA, Kokkos, and SYCL. Kokkos backends on NVIDIA, AMD and Intel GPUs, as well as host parallel devices, have all been benchmarked. An std::par version is currently being developed.
ACTS: ACTS (A Common Tracking Software) is a detector- and experiment-independent particle tracking toolkit for high-energy physics. It provides high-level track reconstruction modules that can be used for any tracking detector, allowing different experiments to share a common code base while specializing only the detector and magnetic field descriptions. A workflow intended to run end-to-end on GPUs is being developed, linking modules that perform hit clustering, track seeding, and track following. Currently only CUDA and some SYCL versions of the modules exist, but we envision porting the full toolchain to Kokkos.
Patatrack: The CMS Heterogeneous Pixel Reconstruction code pioneered offloading computations to GPUs in CMS data processing software (CMSSW). It was designed to do nearly all of the work on a GPU, from decoding the raw data of CMS’ pixel detector up to track and vertex reconstruction. In this project the heterogeneous pixel reconstruction code, originally written in CUDA, was extracted into a standalone package that is used as a testbed for exploring code portability technologies. The corresponding code in CMSSW will be used at the CMS High Level Trigger in Run 3, and the standalone package closely mimics the most important aspects of CMSSW, such as the behavior of the framework and the build system. The pixel reconstruction code has been ported to Kokkos and HIP, and various versions have been benchmarked on x86 CPUs and NVIDIA GPUs. Work is ongoing to improve the performance of the Kokkos version and to run the code on AMD GPUs via HIP and Kokkos. We have also contributed to an Alpaka version of the code.
P2R: P2R (Propagate-to-r) is a standalone, lightweight mini app that performs the core computations of Kalman Filter-based track fitting. P2R was originally extracted from the fully vectorized CPU tracking application mkFit. With its standalone input and simple program structure, it is currently the smallest code base among the CCE-PPS applications, which makes it easy to port to different portability solutions. Current implementations focus on comparing different technologies for offloading to NVIDIA GPUs, such as CUDA, OpenACC, and std::par, with plans to extend to other offloading targets and portability solutions in the near future.
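To give a feel for the kind of per-track loop such a mini app exposes, here is a deliberately simplified sketch. It is not the P2R code: it assumes zero magnetic field, so each track moves along a straight line in the transverse plane until it reaches radius r, and it ignores track uncertainties entirely. The actual mini app presumably handles curved trajectories and the rest of the Kalman Filter machinery. The Tracks struct and propagate_to_r function are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays track container (transverse plane only).
struct Tracks {
    std::vector<double> x, y;    // position
    std::vector<double> px, py;  // direction (transverse momentum)
};

// Move every track along a straight line until it reaches radius r.
// Assumes each track starts inside the cylinder (x^2 + y^2 <= r^2)
// and has nonzero transverse momentum.
inline void propagate_to_r(Tracks& t, double r) {
    const std::size_t n = t.x.size();
    assert(t.y.size() == n && t.px.size() == n && t.py.size() == n);

    // Each iteration touches only track i, so the loop is a natural
    // candidate for offloading or vectorization.
    for (std::size_t i = 0; i < n; ++i) {
        // Solve |pos + s * dir|^2 = r^2 for the forward step s >= 0.
        const double a = t.px[i] * t.px[i] + t.py[i] * t.py[i];
        const double b = 2.0 * (t.x[i] * t.px[i] + t.y[i] * t.py[i]);
        const double c = t.x[i] * t.x[i] + t.y[i] * t.y[i] - r * r;
        const double s = (-b + std::sqrt(b * b - 4.0 * a * c)) / (2.0 * a);
        t.x[i] += s * t.px[i];
        t.y[i] += s * t.py[i];
    }
}
```

Because the iterations are independent, the same loop body could in principle be handed to a CUDA kernel, an OpenACC directive, or a C++ parallel algorithm. The structure-of-arrays layout is a design choice made here for illustration, since it tends to suit both vector units and GPUs.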
RNGs: Following the FastCaloSim rewrite using the SYCL programming model, it was possible to target multiple vendors’ platforms using the DPC++ and hipSYCL compiler toolchains. This motivated extending the oneMKL open-source interfaces library, used for generating random numbers in the SYCL-based implementation of FastCaloSim, to support cuRAND and hipRAND through interoperability for execution on NVIDIA and AMD devices, respectively.