Abstract: MPI and OpenMP are the two dominant parallel runtimes in HPC. MPI uses a distributed-process environment that scales across multiple nodes, but its fixed parallel environment makes it inflexible to program; OpenMP can easily create dynamic parallel regions, but it is limited to on-node programming. MPI uses private address spaces, which are immune to data races and false sharing; OpenMP uses shared memory, which is convenient but fraught with pitfalls. MPI provides rich synchronization APIs that let users optimize data movement; OpenMP relies on parallel region boundaries for bulk synchronization.
Can we combine MPI and OpenMP, picking the good parts of each runtime so that they complement each other? Unfortunately, MPI+Threads fails to do that. In MPI+Threads, MPI and OpenMP remain separate: MPI cannot access OpenMP’s parallel regions, and OpenMP’s parallel regions cannot access MPI’s message-passing APIs.
Instead of MPI+Threads, we propose MPI×Threads (MPI multiplied by Threads), where OpenMP’s parallel regions are used to expand MPI’s parallel environment. This is made possible by a new feature in MPICH that lets users create a thread communicator inside OpenMP’s parallel regions, in which each thread is assigned a unique rank. Threads across parallel regions on multiple nodes can then express parallel algorithms explicitly using MPI’s message-passing APIs. In this talk, we will explore the use of thread communicators, their performance, and their potential for MPI/OpenMP unification.
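To make the "multiply" idea concrete, the following C sketch shows one way unique ranks could be laid out when every process contributes the same number of threads: the communicator size is the product of the process count and the per-process thread count. The contiguous layout (all threads of process 0 first, then process 1, and so on), the equal thread count per process, and the names `sketch_size`, `sketch_rank`, `sketch_locate`, and `thread_location` are assumptions made for illustration only; they are not MPICH APIs, and the rank assignment MPICH actually uses may differ.

```c
#include <assert.h>

/* Hypothetical illustration only: none of these names are MPICH APIs. */

/* Where a thread-communicator rank lives: which process, which thread. */
typedef struct {
    int process_rank; /* rank of the owning MPI process */
    int thread_id;    /* thread index inside that process's parallel region */
} thread_location;

/* Size of the expanded environment: processes multiplied by threads. */
int sketch_size(int num_processes, int threads_per_process) {
    assert(num_processes > 0 && threads_per_process > 0);
    return num_processes * threads_per_process;
}

/* Assumed contiguous layout: threads of one process get consecutive ranks. */
int sketch_rank(int process_rank, int thread_id, int threads_per_process) {
    assert(process_rank >= 0);
    assert(thread_id >= 0 && thread_id < threads_per_process);
    return process_rank * threads_per_process + thread_id;
}

/* Inverse mapping: recover the owning process and thread from a rank. */
thread_location sketch_locate(int rank, int threads_per_process) {
    assert(rank >= 0 && threads_per_process > 0);
    thread_location loc;
    loc.process_rank = rank / threads_per_process;
    loc.thread_id = rank % threads_per_process;
    return loc;
}
```

Under these assumptions, 4 processes with 8 threads each yield 32 ranks, and thread 3 of process 2 holds rank 19. A message addressed to rank 19 would therefore cross a node boundary whenever the sender lives in a different process on another node, and stay within shared memory otherwise.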