Characterizing Lock Contention in a Thread Compliant MPI Implementation
Hybrid MPI-threading programming has emerged as an alternative to the MPI-everywhere model for handling the increasing core density of cluster nodes. To support such a model, an MPI implementation must be thread-safe and must ensure that a blocking MPI call blocks only the calling thread. However, little attention has been given to the performance implications of a thread-compliant MPI implementation.
In this work, we analyze the contention that arises in MPI critical sections. To ensure mutual exclusion, we implemented lock-free algorithms based on atomic polling. This method also lets us use "wasted polls" as a low-overhead profiling metric to assess the degree of contention. We used this MPI implementation with a series of benchmarks to characterize lock contention. We found that even one-sided communication with active targets, which involves only a simple critical section, exhibits contention with noticeable performance losses.
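The idea of counting wasted polls can be illustrated with a minimal sketch. It is not the implementation described in the talk: it assumes a plain test-and-set spin lock built on C11 atomics and POSIX threads, and it assumes that a "wasted poll" is a poll of the lock that fails to acquire it. All names (`polling_lock_t`, `run_contention_demo`, and so on) are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical stand-in for the lock guarding an MPI critical section. */
typedef struct {
    atomic_flag  held;          /* set while some thread is inside the section */
    atomic_ulong wasted_polls;  /* polls that found the lock already held */
} polling_lock_t;

typedef struct {
    polling_lock_t *lock;
    unsigned long  *counter;    /* shared data protected by the lock */
    unsigned long   iters;
} worker_arg_t;

static void polling_lock_init(polling_lock_t *l) {
    atomic_flag_clear(&l->held);
    atomic_init(&l->wasted_polls, 0);
}

static void polling_lock_acquire(polling_lock_t *l) {
    unsigned long wasted = 0;
    /* Poll atomically until the flag is observed clear; each failed
     * attempt is counted locally as one wasted poll (assumed definition). */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        wasted++;
    /* Publish the local count once, so profiling adds little extra traffic. */
    if (wasted > 0)
        atomic_fetch_add_explicit(&l->wasted_polls, wasted, memory_order_relaxed);
}

static void polling_lock_release(polling_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static void *worker(void *p) {
    worker_arg_t *a = p;
    for (unsigned long i = 0; i < a->iters; i++) {
        polling_lock_acquire(a->lock);
        (*a->counter)++;        /* the "critical section" */
        polling_lock_release(a->lock);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter value; the total
 * number of wasted polls is stored in *wasted_out. */
unsigned long run_contention_demo(int nthreads, unsigned long iters,
                                  unsigned long *wasted_out) {
    pthread_t tids[MAX_THREADS];
    polling_lock_t lock;
    unsigned long counter = 0;
    worker_arg_t arg = { &lock, &counter, iters };
    int created = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    polling_lock_init(&lock);
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&tids[created], NULL, worker, &arg) != 0)
            break;              /* stop creating threads on failure */
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tids[t], NULL);
    *wasted_out = atomic_load(&lock.wasted_polls);
    return counter;
}
```

Calling `run_contention_demo(4, 100000, &wasted)` should return 400000 if mutual exclusion holds, while `wasted` varies from run to run and with the number of threads. Comparing it across thread counts gives a rough, low-cost indication of how contended the section is.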
Halim Amer is a PhD student in the Department of Mathematical and Computing Sciences at Tokyo Institute of Technology, Tokyo, Japan, under the supervision of Prof. Satoshi Matsuoka. His research focuses on threading models and runtimes in High Performance Computing (HPC), including their use for intra-node parallelism and their coupling with a communication runtime such as MPI. Since September, he has been working with Dr. Pavan Balaji on characterizing lock contention in thread-compliant MPI implementations.