CALCioM: a Holistic, Machine-Wide Approach to I/O Management
The mismatch between computation and storage performance in new HPC systems has led to a plethora of I/O optimizations, ranging from application-side collective I/O to network- and disk-level request scheduling on the file system side. As machines grow ever larger, the interference produced by multiple applications concurrently accessing a shared parallel file system becomes a major problem. This interference often breaks single-application I/O optimizations, which dramatically degrades the applications' I/O performance and, as a result, machine-wide efficiency.
In this talk, we propose to overcome the impact of I/O interference through the CALCioM approach. This approach allows several applications running on a supercomputer to communicate and coordinate their I/O strategies in order to avoid interfering with each other. Using synthetic benchmarks on ANL's BG/P Surveyor machine and on several clusters of the French Grid'5000, we show how CALCioM can be used to efficiently and transparently improve the scheduling strategy between two interfering applications, given a specified metric of machine-wide efficiency.
Matthieu Dorier is a third-year PhD student at the Ecole Normale Supérieure de Rennes (France), working in the KerData team at INRIA Rennes under the supervision of Gabriel Antoniu and Luc Bougé. His interests are high-performance computing, storage systems, I/O, and large-scale visualization.
Matthieu is an active participant in the INRIA/UIUC/ANL Joint Lab for Petascale Computing and is completing an internship under the supervision of Rob Ross in the context of the INRIA/ANL Data@Exascale associate team.