Computation Institute to bulk up data analysis capability with $1.5 million grant
By Angela Hardin • August 5, 2008
The Computation Institute, a joint effort of the University of Chicago and the Department of Energy's Argonne National Laboratory, has received a grant for a computer system that will enable researchers to store, access and analyze massive datasets.
The system is funded by a $1.5 million grant from the National Science Foundation that includes cost-sharing support from the University of Chicago. Called the Petascale Active Data Store (PADS), the new system has been optimized for rapid data transactions, both on campus and around the globe.
Petascale computing involves the manipulation of petabytes of data. A petabyte is the equivalent of data contained on 1.5 million CD-ROMs.
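The CD-ROM comparison can be checked with simple division: one petabyte divided by the capacity of a single disc. The short Python sketch below assumes a decimal petabyte of 10^15 bytes and a CD-ROM capacity of either 650 MB or 700 MB. Neither figure comes from the article, and using binary units (2^50 bytes) would give a somewhat larger count.

```python
# Back-of-envelope check of "1 petabyte is about 1.5 million CD-ROMs".
# Assumptions (not from the article): a decimal petabyte of 10**15 bytes
# and a CD-ROM capacity of 650 MB (650 * 10**6 bytes) by default.
PETABYTE_BYTES = 10**15
CD_ROM_BYTES = 650 * 10**6


def cd_roms_per_petabyte(cd_bytes: int = CD_ROM_BYTES) -> float:
    """Return how many CD-ROMs of the given capacity are needed to hold one petabyte."""
    return PETABYTE_BYTES / cd_bytes


if __name__ == "__main__":
    # Compare the two common disc capacities.
    print(f"650 MB discs: {cd_roms_per_petabyte():,.0f}")
    print(f"700 MB discs: {cd_roms_per_petabyte(700 * 10**6):,.0f}")
```

With 650 MB discs the count comes to roughly 1.54 million, and with 700 MB discs roughly 1.43 million, both consistent with the article's figure of about 1.5 million.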
The PADS design results from a study of the storage and analysis requirements of groups in astronomy and astrophysics, computer science, economics, evolutionary and organismal biology, geosciences, high-energy physics, linguistics, materials science, neuroscience, psychology and sociology.
According to the PADS team, the system gives these groups a significant opportunity to look at their data in new ways, enabling new scientific insights and new collaborations across disciplines. PADS will also serve as a vehicle for computer science research into active data storage systems and will provide rich data with which to investigate new techniques.
Software resulting from this work will be made available as open source, so interested users can freely download it and adapt it for other purposes.
“PADS will bring a significant analysis resource to the University of Chicago campus and provide a testbed for research on high-performance analysis, a likely bottleneck in the scientific pipeline of the future,” said Michael Papka, Deputy Associate Laboratory Director for Computing, Environment, and Life Sciences at Argonne. Papka led the interdisciplinary team of University of Chicago researchers who developed the PADS proposal.
Several nVidia Tesla graphics processing units (GPUs) will be integrated with traditional CPUs in the PADS system. For certain operations, these GPUs can compute many times faster than general-purpose personal computers.
“The Tesla nodes will allow us to experiment with algorithms that combine traditional CPUs and special-purpose GPUs to extract results from data faster than in the past,” said Ian Foster, Director of the Computation Institute and the Arthur Holly Compton Distinguished Service Professor in Computer Science at the University of Chicago. “For example, in neuroscience, we will be using the system to accelerate Magnetic Resonance Imaging algorithms to diagnose traumatic brain injury.”
PADS will be a hybrid system with many layers of storage. These layers range from a large, tape-based system at Argonne to individual computers on campus and elsewhere. The intermediate layer is a rack of computer disks at Argonne holding duplicate copies of datasets as insurance against hard-drive failure.
To University of Chicago scientists, PADS represents a dramatic improvement over current practice, which requires them to quickly analyze data and then remove it from the system to make room for new datasets. With the storage that PADS provides, groups will be able to keep data active for longer periods of analysis.
“PADS will allow us to share unique datasets with a larger community of researchers, enabling analysis of the data in different ways without the necessity to quickly remove the data because we need the space,” said Don Lamb, Director of the Center for Astrophysical Thermonuclear Flashes and the Louis Block Professor in Astronomy & Astrophysics at the University of Chicago.
The Computation Institute was founded in 2000 as a joint effort between Argonne and the university. Its mission is to address the most challenging problems arising in the use of strategic computation and communications.