Effective data handling is critical to many scientific applications in the cloud. Of particular importance is the issue of storage. To investigate this issue, a team of researchers from Argonne’s Mathematics and Computer Science Division, the Argonne/University of Chicago Computation Institute, and the University of British Columbia benchmarked the performance of a number of storage systems using basic read and write patterns commonly observed in scientific applications. Their work, presented recently at the 5th Workshop on Scientific Cloud Computing, won the best paper award.
“We used the Swift parallel scripting framework to set up our experiments in the cloud,” said Ketan Maheshwari, a postdoctoral appointee in the MCS Division and lead author of the award-winning paper. “Depending on the application requirements, Swift proved able to handle both implicit and explicit data motions in the cloud.”
The team characterized both commercial and emerging research-based storage systems, testing them with three real-world applications: a power flow simulation of the Illinois grid, a parallel BLAST application for protein searches, and an energy analysis and thermal load simulation program. Different cloud configurations – local, single zone, and global – helped provide insights into the effectiveness of the storage systems.
“Globally implemented clouds rely heavily on the internet backbone, resulting in non-uniform and variable network characteristics. Our experiments show that storage solutions can mitigate these variabilities by techniques such as caching, replication, and prediction,” said Maheshwari.
The research team also found that applications with small to medium immediate storage requirements can be run effectively with storage solutions that aggregate the node-local space of the cloud instances (see figure).
“These solutions almost always perform better than the dedicated object store and thus offer an exciting alternative to solutions provided by clouds such as Amazon’s S3,” said Maheshwari. But he noted that solutions such as S3 are still important for large-scale data handling and archiving.
The research was presented at ScienceCloud, a workshop collocated with the 23rd International ACM Symposium on High Performance Distributed Computing, held in Vancouver, Canada, on June 23, 2014. For further information, see the website at http://datasys.cs.iit.edu/events/ScienceCloud2014/program.html.
Ketan Maheshwari, Justin M. Wozniak, Hao Yang, Daniel S. Katz, Matei Ripeanu, Victor Zavala, and Michael Wilde, “Evaluating Storage Systems for Scientific Data in the Cloud,” in Proc. ScienceCloud, HPDC 2014.