Experiences Manually Tuning for Parallelism and Data Locality in a Combustion Benchmark
Tuning a scientific application to take advantage of the parallelism available on today's many-core machines while preserving data locality is a major challenge. One approach to balancing the parallelism/locality tradeoff is to try various execution schedules within the parallel portions of the code. The challenges associated with manually exploring this space with a scientific application include rewriting significant portions of the code: loop structures and indices, data storage mapping, and others.
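As a minimal sketch of what "trying a different execution schedule" can involve, consider a toy 1D two-loop computation (not the combustion code itself; the function names, the array names `in`, `flux`, and `out`, and the averaging/differencing operations are all assumptions made for illustration). The first schedule runs the loops in series and keeps a full temporary array. The second fuses them, which changes the loop structure, the index expressions, and the storage mapping of the temporary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy example: in[] holds n + 1 values, out[] receives n - 1. */

/* Schedule A: two loops in series. Each loop is fully parallel, but the
   temporary flux[] (n values) is written in full and then read back,
   which costs memory traffic when n is large. */
void schedule_series(const double *in, double *flux, double *out, size_t n)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)        /* n averaged "face" values */
        flux[i] = 0.5 * (in[i] + in[i + 1]);
    for (size_t i = 0; i + 1 < n; i++)    /* n - 1 differences */
        out[i] = flux[i + 1] - flux[i];
}

/* Schedule B: the two loops fused into one. The temporary array shrinks to
   two scalars (better locality), but prev is now carried from one iteration
   to the next, so this loop can no longer be run in parallel as written. */
void schedule_fused(const double *in, double *out, size_t n)
{
    assert(n >= 2);
    double prev = 0.5 * (in[0] + in[1]);  /* plays the role of flux[0] */
    for (size_t i = 0; i + 1 < n; i++) {
        double next = 0.5 * (in[i + 1] + in[i + 2]);  /* flux[i + 1] */
        out[i] = next - prev;
        prev = next;
    }
}
```

Both functions compute the same values; only the order of operations and the storage for the intermediate result differ. Even in this small case, moving between schedules means touching loop bounds, indices, and the declaration of the temporary, which is the kind of rewriting that grows costly in a full 3D application.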
In this talk I will present our experiences doing this with a fourth-order 3D combustion code using polyhedral code generation tools. We hope to use the lessons learned during this exercise to enable automation of this process in the future. We plan to use loop chains to facilitate this work.