Abstract: Large computing facilities have become a critical resource across many research areas such as high energy physics, climate science, and artificial intelligence. It is therefore essential that our facilities are well designed, configured, and managed for productivity and energy efficiency to meet research and sustainability goals. As with other physical systems and processes, modeling and simulation (modsim) has been instrumental in studying these facilities and the integrated research infrastructure (IRI) in which they operate, allowing us to conduct investigations that would be impractical as real-world studies. However, modsim of computing facilities and IRIs currently lacks the maturity to predict the behavior of these infrastructures at scale with the required fidelity.
This talk explores challenges in modeling and simulating IRI at scale, with a specific focus on facilities like ALCF and ESnet. It will discuss limitations in applying current methodologies, such as discrete event and agent-based modeling, to simulate large-scale environments comprising heterogeneous components. Finally, it will discuss enhancing modsim methodologies to leverage hybrid modeling strategies and novel simulation infrastructure for more scalable and accurate simulations of IRI operations. These enhancements move us towards more actionable models of these important facilities to support design and operational decision-making.