Flash-X: A New Approach to Performance Portability

Gail Pieper

September 13, 2023

Researchers from Argonne National Laboratory, the University of Chicago, and the RIKEN Center for Computational Science in Japan have developed a tool that expresses code through abstractions that minimize code duplication.

Fig. 1: Highly simplified example with embedded macros to generate different codes for CPU and GPU through the use of alternative definitions.

The growing heterogeneity of high-performance computing (HPC) platforms, coupled with the growing complexity of computer codes, has made performance portability an increasingly challenging task.

The researchers’ findings have been published in a special issue of the journal Future Generation Computer Systems.

The papers selected for this special issue are extended versions of highly rated papers presented at the 14th International Conference on Parallel Processing and Applied Mathematics, held in Gdansk, Poland, in September 2022.

Why abstractions?

Computational scientists like abstractions because they avoid the need for separate software implementations for each computing platform. Most approaches to date, however, have relied on C++, leaving non-C++ codes in the lurch.

“We’ve developed a language-agnostic mechanism that uses macros for abstracting multiple alternative definitions specialized for different targets,” said Anshu Dubey, a senior computational scientist in the Mathematics and Computer Science division at Argonne. “With this new mechanism, software developers avoid having to substantially rewrite their code to accommodate differences as new hardware platforms are introduced” (see Fig. 1).

Macros have been widely used in software development. Traditional macros, however, lack features needed to use the same expression for different targets.

“The key is to enable the macros to have multiple alternative definitions specialized for these targets – and to provide a mechanism to decide which definition to use where,” Dubey said.

A strong motivator

The specific motivator for this research was the rearchitecting of the FLASH astrophysical code. Written largely in Fortran, FLASH comprises multiple components, each having two or three alternative implementations for different fidelity or different scales. But the increasing heterogeneity of HPC platforms and the advent of accelerators made it clear that many more alternative implementations would be needed just for FLASH to deal with hardware differences.

Rewriting all of FLASH in a domain-specific language or in a new HPC language such as Julia would have involved an enormous effort. Instead, the research team chose a different path for the new Flash-X code: a customized macroprocessor that permits multiple alternative definitions of macros and an arbitration mechanism that selects the best definition for expanding the macros. The approach unifies the implementation variants into a single maintained codebase.
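The idea of alternative macro definitions plus arbitration can be sketched in a few lines of Python. This is only an illustration, not the Flash-X macroprocessor: the `@M name` invocation syntax, the macro names, the target labels, and the "target-specific first, otherwise default" rule are all assumptions made for the example.

```python
import re

# Hypothetical macro table: each macro name maps to alternative definitions
# keyed by target. "default" is used when no specialized definition exists.
MACRO_DEFS = {
    "loop_begin": {
        "cpu": "do i = 1, n",
        "gpu": "!$acc parallel loop\ndo i = 1, n",
    },
    "loop_end": {
        "default": "end do",
    },
}

# Assumed invocation syntax for this sketch: "@M macro_name".
MACRO_PATTERN = re.compile(r"@M[ \t]+(\w+)")


def select_definition(name, target):
    """Toy arbitration: prefer the target-specific definition, else the default."""
    alternatives = MACRO_DEFS[name]
    if target in alternatives:
        return alternatives[target]
    if "default" in alternatives:
        return alternatives["default"]
    raise KeyError(f"no definition of macro {name!r} for target {target!r}")


def expand(source, target):
    """Replace every macro invocation with the definition chosen for the target."""
    return MACRO_PATTERN.sub(
        lambda match: select_definition(match.group(1), target), source
    )


# One maintained source; the Fortran-like body is written only once.
SOURCE = """@M loop_begin
  y(i) = a * x(i) + y(i)
@M loop_end"""

cpu_code = expand(SOURCE, "cpu")
gpu_code = expand(SOURCE, "gpu")

print(cpu_code)
print(gpu_code)
```

The two outputs differ only in the lines produced by `loop_begin`; the arithmetic line is shared, which is the sense in which the variants collapse into one maintained source.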

Another feature of the new approach is the decomposition of the code into code “snippets.” The snippets separate the arithmetic of the computation from the logic of the control flow, so that the code components become building blocks that can be combined in many different ways.
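A rough analogy in Python shows why this separation makes components recombinable. Flash-X works on source text through macros, whereas this sketch uses ordinary functions; the names `axpy_point`, `sweep_all`, and `sweep_tiled` are hypothetical and chosen only for illustration.

```python
# Arithmetic snippet: updates a single element and knows nothing about
# how the index space is traversed.
def axpy_point(a, x, y, i):
    y[i] = a * x[i] + y[i]


# Control-flow snippet 1: a plain sweep over all indices.
def sweep_all(n, body):
    for i in range(n):
        body(i)


# Control-flow snippet 2: the same index space visited tile by tile,
# as one might do to suit a different memory hierarchy.
def sweep_tiled(n, body, tile=2):
    for start in range(0, n, tile):
        for i in range(start, min(start + tile, n)):
            body(i)


def run(sweep):
    """Combine the arithmetic snippet with whichever sweep is supplied."""
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [10.0] * len(x)
    sweep(len(x), lambda i: axpy_point(2.0, x, y, i))
    return y


result_flat = run(sweep_all)
result_tiled = run(sweep_tiled)
print(result_flat == result_tiled)
```

Because the arithmetic is written once and never mentions the loop structure, swapping the traversal changes how the work is scheduled without touching the computation itself.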

But is it less work than a rewrite?

“Someone might well ask whether preparing these snippets and writing different definitions for different macros is really simpler than a complete rewrite,” Dubey said. “It is.”

She noted that any code can be converted incrementally, because developers have complete control over which code snippets to replace with macros and how many. Moreover, the macroprocessor can be applied to any code component as a stand-alone tool. The file it generates is in the code’s own target language and can therefore be debugged more easily than code written with C++-based abstractions.

For details about the research, see A. Dubey, Y. Lee, T. Klosterman, and E. Vatai, “A tool and a methodology to use macros for abstracting variations in code for different computational demands,” in Future Generation Computer Systems, R. Wyrzykowski and E. Deciman (Eds.), July 2023.