Numerical Methods on GPUs: Analysis, Challenges, Fine-Tuning
Abstract: When using general-purpose graphics processing units (GPUs) to accelerate a numerical method, we often need to re-engineer both the algorithm and the implementation strategy. In my talk, I will show two approaches to improving performance.
I will first discuss Krylov subspace linear system solvers and the optimization of preconditioners through the development of an algorithm that takes advantage of the multilevel, threaded, single-instruction multiple-data parallelism of the GPU.
In the second part of the talk, I will discuss code transformation-based optimization for high-order finite-element methods, including optimizing the gradient volume kernel for two-dimensional hexagonal elements and fine-tuning the BP1.0, BP3.5, and BP3.0 benchmark problems (CEED benchmarks). I will introduce empirical roofline models and show a detailed performance analysis of the tuning of the benchmark implementations.
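For readers unfamiliar with roofline models, the basic idea can be sketched in a few lines of Python. This is only an illustration of the standard roofline bound, not the empirical model presented in the talk. The function names and all hardware numbers below are hypothetical placeholders. An empirical variant would replace the theoretical bandwidth with a value measured on the device.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Floating-point operations performed per byte of memory traffic."""
    return flops / bytes_moved


def roofline_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s: the lesser of the compute and memory ceilings."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Hypothetical device and kernel figures, chosen only for illustration.
peak_gflops = 7000.0    # assumed peak compute rate in GFLOP/s
bandwidth_gbs = 550.0   # assumed (ideally measured) memory bandwidth in GB/s

# A kernel performing 2e9 flops while moving 4e9 bytes.
intensity = arithmetic_intensity(flops=2.0e9, bytes_moved=4.0e9)
bound = roofline_gflops(intensity, peak_gflops, bandwidth_gbs)

print(f"arithmetic intensity: {intensity} flop/byte")
print(f"roofline bound: {bound} GFLOP/s")
```

With these placeholder numbers the kernel is memory bound: its ceiling is set by bandwidth times intensity rather than by the peak compute rate, so tuning effort is best judged against that lower ceiling.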
Acknowledgment: The research on Krylov subspace solvers was carried out as part of the AFOSR BRI project. The research on performance optimization for high-order finite-element methods was carried out under the DOE ECP-funded CEED project.