IWOMP, held virtually Sept. 14–16, 2021, is the premier forum for presenting and discussing issues, trends, recent research ideas and results related to parallel programming with OpenMP.
In their award-winning paper titled “A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation,” the researchers present a methodology for using LLVM tools to tune an application on the new ARM A64FX processor.
“Applications developers increasingly must tune or even restructure their codes in order to exploit new hardware devices and to port applications to new platforms,” said Johannes Doerfert, an assistant computer scientist in Argonne’s Mathematics and Computer Science division and one of the authors of the award-winning paper. “Our goal in this study was to determine which application loops are not being vectorized and suggest ways to improve the application’s performance.”
The researchers studied a cutting-edge quantum Monte Carlo application called the DCA++ (dynamical cluster approximation) algorithm, which is used to solve quantum many-body problems in condensed matter physics. Specifically, they focused on hot spots where loops could benefit from further optimization.
In some cases, a simple change such as a different vectorization flag was sufficient; in other cases, the researchers transformed the code or applied an OpenMP directive. They also ensured that the correct libraries were used to achieve optimal performance. With these code changes, the application achieved a 1.98× speedup on the A64FX processor.
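To give a sense of what applying an OpenMP directive to a loop can look like, here is a minimal C++ sketch. It is not taken from DCA++ or from the paper: the kernels `axpy_simd` and `dot_simd` are hypothetical stand-ins for a hot-spot loop, and the `#pragma omp simd` annotations only illustrate the general technique of asking the compiler to vectorize a loop it might otherwise leave scalar.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel (not from DCA++): y[i] += a * x[i].
// The simd directive tells the compiler the iterations are safe to vectorize.
void axpy_simd(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const double* xp = x.data();
    double* yp = y.data();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] += a * xp[i];
    }
}

// Hypothetical kernel (not from DCA++): dot product of x and y.
// The reduction clause lets each vector lane keep a partial sum,
// which the compiler combines after the loop.
double dot_simd(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const double* xp = x.data();
    const double* yp = y.data();
    double sum = 0.0;
    #pragma omp simd reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += xp[i] * yp[i];
    }
    return sum;
}
```

With GCC or Clang, the directives take effect when the code is built with `-fopenmp` or `-fopenmp-simd`; without those flags the pragmas are ignored and the loops still compute the same results. Clang's optimization remarks, for example `-Rpass=loop-vectorize` and `-Rpass-missed=loop-vectorize`, are one way to check which loops were vectorized. Whether the researchers used these exact flags is an assumption here, not something the article states.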
The team is now developing OpenMP Advisor as part of the LLVM framework. The aim is to automate part of the optimization process, thus enhancing user productivity.
The workshop paper, “A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation,” by Joseph Huber, Weile Wei, Giorgis Georgakoudis, Johannes Doerfert and Oscar Hernandez, is available at https://arxiv.org/abs/2106.14332. The slides presented at the workshop are also available.