Distributed I/O with ParaMEDIC: Experiences with a Worldwide Supercomputer
| Title | Distributed I/O with ParaMEDIC: Experiences with a Worldwide Supercomputer |
| Publication Type | Conference Paper |
| Year of Publication | 2008 |
| Authors | Balaji, P, Feng, W, Lin, H, Archuleta, J, Matsuoka, S, Warren, A, Setubal, J, Lusk, EL, Thakur, R, Foster, IT, Katz, DS, Jha, S, Shinpaugh, K, Coghlan, SM, Reed, D |
| Conference Name | International Supercomputing Conference, Outstanding Paper Award |
| Conference Location | Dresden, Germany |
Achieving high performance for distributed I/O over a wide-area network remains elusive: despite improvements in both network hardware and software stacks, it continues to be a challenge. In this paper, our worldwide team took a new, non-traditional approach to distributed I/O, called ParaMEDIC (Parallel Metadata Environment for Distributed I/O and Computing), which applies an application-specific transformation that reduces the data to metadata orders of magnitude smaller before performing the actual I/O. Specifically, this paper details our experiences deploying a large-scale system to facilitate the discovery of missing genes and the construction of a genome similarity tree by encapsulating the mpiBLAST sequence-search algorithm within ParaMEDIC. The overall project involved nine computational sites spread across the U.S. that generated more than a petabyte of data, which was “teleported” to a large-scale facility in Tokyo for storage.
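The abstract does not spell out how the compact metadata is turned back into full output at the receiving site. The toy Python sketch below illustrates the general idea behind the “teleporting” metaphor under stated assumptions; it is not the authors' implementation. The names `full_search`, `to_metadata`, and `regenerate`, the position-matching similarity score, and the assumption that the storage site holds its own copy of the sequence database are all hypothetical. The compute site runs an expensive search and reduces its verbose report to the list of hit IDs; only that list crosses the wide-area network, and the storage site rebuilds the full report by re-running the search on just the listed entries.

```python
"""Toy sketch of shipping compact metadata instead of full output.

Hypothetical stand-ins (not ParaMEDIC's or mpiBLAST's real API):
- full_search: an expensive search over a sequence database
- to_metadata: the application-specific reduction to hit IDs
- regenerate:  rebuilds the full report at the storage site, assuming
  that site holds its own copy of the database
"""
import json


def score(query: str, subject: str) -> int:
    # Toy similarity: number of aligned positions where the strings agree.
    return sum(a == b for a, b in zip(query, subject))


def full_search(query, database, threshold):
    # Compute site: scan every database entry and emit a verbose record
    # (score, full subject sequence, match line) for each hit.
    report = []
    for seq_id, seq in database.items():
        s = score(query, seq)
        if s >= threshold:
            match_line = "".join("|" if a == b else " " for a, b in zip(query, seq))
            report.append({"id": seq_id, "score": s, "subject": seq, "match": match_line})
    # Sort by ID so the output does not depend on database iteration order.
    return sorted(report, key=lambda hit: hit["id"])


def to_metadata(report) -> bytes:
    # Application-specific reduction: keep only the IDs of the hits.
    return json.dumps([hit["id"] for hit in report]).encode()


def regenerate(query, database, metadata: bytes, threshold):
    # Storage site: re-run the same search, but only over the entries named
    # in the metadata, which is far cheaper than searching the full database.
    ids = json.loads(metadata)
    subset = {seq_id: database[seq_id] for seq_id in ids}
    return full_search(query, subset, threshold)


if __name__ == "__main__":
    db = {"seqA": "ACGTACGT", "seqB": "TTTTTTTT", "seqC": "ACGTTCGT"}
    query = "ACGTACGT"
    report = full_search(query, db, threshold=6)
    meta = to_metadata(report)  # this is all that is sent over the WAN
    rebuilt = regenerate(query, db, meta, threshold=6)
    print("full output bytes:", len(json.dumps(report).encode()))
    print("metadata bytes:", len(meta))
    print("regenerated output matches:", rebuilt == report)
```

The design trades network bandwidth for recomputation: the storage site repeats a small amount of work (searching only the matched entries) so that only a short ID list has to travel between sites. This only works if both sides use the same search parameters, here the same `threshold`, so that the regenerated report is identical to the original.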