The conference, organized by the American Institute of Aeronautics and Astronautics, covers all aspects of computational fluid dynamics particularly relevant to aerospace applications, with topics ranging from basic research and development to applied and advanced technology.
Fischer’s presentation, titled “Scaling Limits for PDE-Based Simulation,” addressed several questions. What is the best possible scaling behavior one can expect for a given problem size? Does the performance depend on whether the processors are traditional CPUs, multicore nodes, or accelerators? And can such performance be sustained on an exascale architecture?
“Parallel computing can deliver a multiplicative increase in performance with an increase in the number of processors,” said Fischer. “In fact, in this era where clock rates are no longer increasing, this multiplicative effect, called strong scaling, is the only mechanism we have for increasing the speed of calculations by factors of thousands.”
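The strong-scaling effect Fischer describes can be illustrated with a simple cost model (not from the talk; all constants here are hypothetical): total runtime is local work, which shrinks as processors are added to a fixed-size problem, plus a communication term, which does not.

```python
import math

# Illustrative strong-scaling model. The per-operation and per-message
# constants t_work and t_comm are hypothetical, chosen only to show the trend.

def runtime(n, p, t_work=1e-6, t_comm=1e-4):
    """Model time to solve a problem of size n on p processors:
    the work is divided p ways; communication grows with log2(p)."""
    return (n / p) * t_work + math.log2(p) * t_comm

def strong_scaling_efficiency(n, p):
    """Speedup over 1 processor, divided by p (1.0 = perfect scaling)."""
    return runtime(n, 1) / (p * runtime(n, p))

n = 10**8  # fixed problem size: this is what "strong" scaling means
for p in [1, 64, 4096, 262144]:
    print(p, round(strong_scaling_efficiency(n, p), 3))
```

As `p` grows, the per-processor work shrinks while the communication term does not, so efficiency falls; this is the multiplicative-but-bounded behavior the quote refers to.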
Despite this fact, most users at large high-performance computing centers do not use the entire machine, or even a significant fraction of it. The reason is a loss of efficiency: as each processor's share of the work shrinks, communication and synchronization overheads consume a growing fraction of the runtime.
Using the Nek5000 spectral-element code applied to a thermal-hydraulics flow problem, Fischer described results that he and his colleagues achieved on Mira, the IBM Blue Gene/Q system at Argonne. Specifically, as the number of cores increased from approximately 131,000 to 1 million, the efficiency dropped from unity to 0.6. To determine whether this was the best one could expect, the researchers analyzed several algorithms. Their analysis showed that indeed this result was nearly optimal for Nek5000 on the Blue Gene/Q. Fischer and his colleagues then turned to simulations on graphics processing units (GPUs), which are at the heart of several current and proposed high-performance computing systems. Testing another spectral-element code, NekCEM, on the Cray XK7 using one GPU per node, the researchers again found a strong-scale limit.
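A back-of-the-envelope check shows what those Mira figures imply. Only the core counts (rounded to powers of two) and the 0.6 efficiency come from the results above; the baseline run time below is hypothetical.

```python
# Parallel efficiency relative to a baseline run: speedup divided by the
# increase in core count. The 80-second baseline is a made-up placeholder.

def parallel_efficiency(t_base, p_base, t, p):
    """Strong-scaling efficiency of a run (t, p) against (t_base, p_base)."""
    speedup = t_base / t
    return speedup / (p / p_base)

p_base, p = 131_072, 1_048_576        # ~131K -> ~1M cores, an 8x increase
t_base = 80.0                         # hypothetical baseline seconds
t = t_base / (0.6 * (p / p_base))     # run time implied by 0.6 efficiency

print(parallel_efficiency(t_base, p_base, t, p))  # recovers the 0.6 figure
print(t_base / t)                                 # speedup is still ~4.8x
```

Note that even at 0.6 efficiency, using 8x the cores still cuts the time to solution by nearly a factor of five, which is why pushing toward the strong-scale limit can remain worthwhile.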
“Solution strategies for a computational fluid dynamics simulation at exascale would require about 10 trillion gridpoints to make effective use of the entire machine,” said Fischer. “But that may be overkill for many applications. What we need to do, then, is focus on reducing the problem size per node if we want to reduce the time to solution.”
In his presentation, Fischer mentioned several potential strategies to mitigate the performance drop-off in turbulent flow simulations on future-generation computing systems. These include aggregating shorter messages into a single longer message, selecting a discontinuous (rather than a continuous) Galerkin formulation, and providing hardware support for parallel prefix operations so that more sophisticated solvers can be implemented at speed.
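The first of those strategies, message aggregation, can be sketched with the standard latency-bandwidth (alpha-beta) cost model, in which sending m bytes costs alpha + beta*m. The constants below are hypothetical, chosen to be latency-dominated for small messages, as is typical near the strong-scale limit.

```python
# Why aggregating short messages helps, under the alpha-beta cost model.
# ALPHA and BETA are illustrative values, not measurements from any machine.

ALPHA = 2e-6   # per-message latency (seconds), hypothetical
BETA = 5e-10   # per-byte transfer time (seconds), hypothetical

def send_cost(nbytes):
    """Modeled cost of one message of nbytes."""
    return ALPHA + BETA * nbytes

k, m = 1000, 64                  # a thousand small 64-byte messages

separate = k * send_cost(m)      # pays the latency alpha k times
aggregated = send_cost(k * m)    # one latency for the same total payload

print(separate / aggregated)     # latency is amortized: a large speedup
```

The payload moved is identical in both cases; aggregation wins because the fixed per-message latency is paid once instead of a thousand times.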
For further information about the conference, see the website http://arc.aiaa.org/doi/book/10.2514/MCFD15.
The paper, “Scaling Limits for PDE-Based Simulation,” by P. F. Fischer, K. Heisey, and M. Min (AIAA), is available at http://www.mcs.anl.gov/papers/P5347-0515.pdf.