I/O performance is increasingly important in large-scale high-performance computing (HPC) systems. Evaluating and improving the I/O performance of applications is challenging, however, because applications vary widely in data volume, I/O strategy, and access method.
To address this challenge, researchers at Argonne National Laboratory developed Darshan, a scalable I/O characterization tool that collects I/O access pattern information from production HPC applications. Initially deployed on the IBM Blue Gene systems at the Argonne Leadership Computing Facility, Darshan was recently adapted by Argonne researchers, together with colleagues from Lawrence Berkeley National Laboratory, for the Cray XE6 “Hopper” system at the National Energy Research Scientific Computing Center. The team presented its experiences adapting Darshan to this environment at the CUG 2013 meeting, held May 6–9 in Napa, Calif.
Darshan was designed for portability, but the team faced several challenges in adapting it to the Cray programming environment on the 1.2-petaflops system. The researchers integrated Darshan into the Cray compilation tools so that it transparently supports five different compilers. They modified Darshan to avoid system call overhead on newly opened files. They also tuned MPI-IO, the mechanism Darshan uses to write its logs to disk, adding “hints” that adjust the collective I/O step and can be specified either at compile time or at run time.
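For readers unfamiliar with MPI-IO hints, the following sketch shows the run-time path on a Cray MPICH system, using the MPICH_MPIIO_HINTS environment variable; the specific hint values here are illustrative assumptions, not the settings the team used.

```shell
# Run-time MPI-IO hints on Cray MPICH (illustrative values, not the
# team's actual settings). The pattern before the colon selects which
# files the hints apply to; "*" matches all files opened via MPI-IO.
export MPICH_MPIIO_HINTS="*:romio_cb_write=enable,cb_nodes=4"

# Ask the MPI library to print the hints it actually applied at
# MPI_File_open time, so the settings can be verified in the job output.
export MPICH_MPIIO_HINTS_DISPLAY=1
```

Setting hints this way requires no recompilation, which matters for a tool like Darshan that is linked transparently into user applications; the compile-time alternative is to pass an MPI_Info object to MPI_File_open in the source.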
The team then used Darshan logs in a case study to show how the data can help identify underperforming applications. Results from three metrics—I/O traffic, metadata overhead, and shared-file write patterns—clearly indicated ways in which certain applications could benefit from additional I/O tuning.
For a full account of this work, see P. Carns, Y. Yao, K. Harms, R. Latham, R. Ross, and K. Antypas, “Production I/O Characterization on the Cray XE6,” Proceedings of the Cray User Group (CUG) meeting, May 2013.