Data Intensive Software and Tools for Climate Science
"Big data" —a colloquial term describing the enormous data sets resulting from ever-expanding computer capabilities and observational efforts—poses challenges to researchers and computer scientists alike, from storing and managing the volumes to accessing and analyzing the information. In climate research these challenges are beginning to be met by the Earth System Grid Federation (ESGF), which now manages more than 1.8 petabytes (PB, 1 petabyte =1015 bytes) of data stored in a federated archive.
The data currently handled using ESGF software have mainly been produced by climate models following protocols established by the internationally coordinated Coupled Model Intercomparison Project (CMIP). Over the last 20 years, CMIP has enabled research that has informed the periodic assessments of climate science carried out by the Intergovernmental Panel on Climate Change (IPCC). Without an advanced software infrastructure to support the huge, distributed archive of CMIP climate model output, its scientific impact would surely have been reduced.
Today, ESGF comprises a system of geographically distributed peer nodes that host climate data at sites around the world. These nodes are independently administered but are united by common protocols and interfaces. With this peer-to-peer infrastructure, scientists, resource managers, policymakers, and a host of other users can all obtain, through a common interface, climate data distributed across servers worldwide.
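The federated arrangement described above can be illustrated with a small in-memory sketch. It is not ESGF code and does not use any ESGF API: the class names (`Dataset`, `PeerNode`, `Federation`), the facet fields (`variable`, `model`), and the node names are all hypothetical, chosen only to show how independently administered nodes that share one query interface can be searched as if they were a single archive.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Dataset:
    """A catalog record; the fields are illustrative, not ESGF's schema."""
    dataset_id: str
    variable: str
    model: str
    host: str  # name of the node that stores the data


@dataclass
class PeerNode:
    """One independently administered node with its own local catalog."""
    name: str
    catalog: List[Dataset] = field(default_factory=list)

    def publish(self, dataset_id: str, variable: str, model: str) -> None:
        self.catalog.append(Dataset(dataset_id, variable, model, self.name))

    def search(self, **facets: str) -> List[Dataset]:
        # The shared interface: every node answers the same facet query.
        return [
            d for d in self.catalog
            if all(getattr(d, key, None) == value for key, value in facets.items())
        ]


class Federation:
    """Fans one query out to every peer and merges the answers."""

    def __init__(self, nodes: List[PeerNode]) -> None:
        self.nodes = list(nodes)

    def search(self, **facets: str) -> List[Dataset]:
        results: List[Dataset] = []
        for node in self.nodes:
            results.extend(node.search(**facets))
        # Sort so the merged result does not depend on node order.
        return sorted(results, key=lambda d: d.dataset_id)


def build_demo_federation() -> Federation:
    """Two hypothetical nodes with made-up holdings."""
    node_a = PeerNode("node-a")
    node_b = PeerNode("node-b")
    node_a.publish("tas.model-x.run1", "tas", "model-x")
    node_b.publish("tas.model-y.run1", "tas", "model-y")
    node_b.publish("pr.model-y.run1", "pr", "model-y")
    return Federation([node_a, node_b])


if __name__ == "__main__":
    for hit in build_demo_federation().search(variable="tas"):
        print(hit.dataset_id, "hosted on", hit.host)
```

In this sketch the user issues a single query and receives records from both nodes, each tagged with the host that actually stores the data. A real federation must also handle authentication, replication, and unreachable nodes, all of which are omitted here.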
To facilitate analysis of “big data”, two advanced analysis strategies are also being developed:
- i) quick-look visualizations that support model intercomparison directly in the user’s Web browser; and
- ii) powerful climate analysis products that are made available at the desktop using a newly available software framework, the Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT).
The UV-CDAT framework has been designed to enable high-end simulation, parallel/distributed data analysis, and advanced displays of complex data sets—building on ESGF data holdings and data distribution capabilities.