Argonne National Laboratory

Distributed I/O with ParaMEDIC: Experiences with a Worldwide Supercomputer

Title: Distributed I/O with ParaMEDIC: Experiences with a Worldwide Supercomputer
Publication Type: Conference Paper
Year of Publication: 2008
Authors: Balaji, P, Feng, W, Lin, H, Archuleta, J, Matsuoka, S, Warren, A, Setubal, J, Lusk, EL, Thakur, R, Foster, IT, Katz, DS, Jha, S, Shinpaugh, K, Coghlan, SM, Reed, D
Conference Name: International Supercomputing Conference, Outstanding Paper Award
Date Published: 06/2008
Conference Location: Dresden, Germany
Other Numbers: ANL/MCS-P1509-0608

Achieving high performance for distributed I/O on a wide-area network continues to be an elusive holy grail. Despite enhancements in network hardware as well as software stacks, achieving high performance remains a challenge. In this paper, our worldwide team took a completely new and non-traditional approach to distributed I/O, called ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing, which uses an application-specific transformation of data into metadata that is orders of magnitude smaller before performing the actual I/O. Specifically, this paper details our experiences in deploying a large-scale system to facilitate the discovery of missing genes and the construction of a genome similarity tree by encapsulating the mpiBLAST sequence-search algorithm into ParaMEDIC. The overall project involved nine different computational sites spread across the U.S. generating more than a petabyte of data, which was “teleported” to a large-scale facility in Tokyo for storage.
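
The abstract does not spell out how the data-to-metadata transformation works, so the following is only a toy sketch of the general idea, not the ParaMEDIC or mpiBLAST implementation. It assumes (hypothetically) that the storage site holds its own copy of the sequence database, so that only the identifiers of matching sequences need to cross the wide-area network and the full output can be rebuilt on arrival. All names and data (`DATABASE`, `search`, `to_metadata`, `from_metadata`, `run_demo`) are invented for illustration.

```python
# Toy illustration only: every name and value here is hypothetical
# and is not taken from ParaMEDIC or mpiBLAST.

DATABASE = {
    "seq1": "ACGTACGTGGA",
    "seq2": "TTGACCGTAAC",
    "seq3": "ACGTTTGACGT",
}


def search(query, database):
    """Stand-in for an expensive search: return every sequence containing the query."""
    return {name: seq for name, seq in database.items() if query in seq}


def to_metadata(results):
    """Application-specific reduction: keep only the identifiers of the hits."""
    return sorted(results)


def from_metadata(metadata, database):
    """Rebuild the full output from identifiers, using a local copy of the database."""
    return {name: database[name] for name in metadata}


def run_demo(query="ACGT"):
    full_output = search(query, DATABASE)        # produced at the compute site
    metadata = to_metadata(full_output)          # the only part sent over the network
    rebuilt = from_metadata(metadata, DATABASE)  # regenerated at the storage site
    return full_output, metadata, rebuilt
```

In this sketch the transferred identifiers are smaller than the output they stand for, and the rebuilt output matches the original; the cost is extra work at the destination to regenerate it.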