Abstract: Distributed services play a key role in HPC architectures by decoupling functionality from application codes. They not only preserve and share state across application executions but also encourage resource and software disaggregation, so that applications can focus on the computational tasks at hand. Parallel file systems are the most prominent and successful examples of distributed HPC services. However, they face extraordinary demands in terms of scale, performance, availability, and generality. Those demands have led them toward increasingly sophisticated but monolithic, one-size-fits-all designs, backed by decades of engineering, that cannot be readily customized to the diverse needs of today’s HPC applications.
The MCS Mochi project is rethinking this state of the practice in HPC data services by making it possible to rapidly construct specialized, composable data services that augment existing platform capabilities. This talk will explore why this approach is crucial to unlocking the potential of data-intensive scientific computing. It will also examine the technical hurdles that must be overcome to balance agility, robustness, and performance without compromising the production characteristics that users have come to expect from conventional data services.
This seminar is the first of a two-part series. Next week’s CS Seminar, presented by Matthieu Dorier, will take a deeper dive into Mochi by focusing on some successful services built over the years, including a storage service tailored to high energy physics applications, and a data staging service enabling elastic in-transit analysis.