Data-Intensive Science Software

Computational scientists at Argonne’s Mathematics and Computer Science Division are developing software that extracts knowledge and insights from large data sets, or helps manipulate them, in order to enable new scientific discoveries.
Software for big data includes:
  • CODES is a detailed, accurate, and highly parallel simulation toolkit for exascale storage.
  • Darshan is a scalable HPC I/O characterization tool, designed to capture an accurate picture of application I/O behavior, including properties such as patterns of access within files, with minimum overhead.
  • Globus Online is a file transfer and synchronization service that is specifically geared to the big data needs of the research community. In particular, it provides reliable, high-performance, secure file transfer.
  • Globus Connect is the de facto standard for projects requiring secure, robust, high-speed bulk data transport.
  • IOFSL is an I/O forwarding scalability layer providing function shipping at the file system interface level.
  • Mercury is an interface enabling remote procedure calls in high-performance computing.
  • Parallel netCDF provides high-performance I/O while still maintaining file format compatibility with Unidata’s netCDF.
  • PVFS (Parallel Virtual File System) brings state-of-the-art parallel I/O concepts to production parallel systems. PVFS is designed to scale to petabytes of storage and provide access rates at hundreds of gigabytes per second. It also continues to be used as a platform for active research in the parallel I/O field.
  • ROMIO is a high-performance, portable implementation of MPI-IO. ROMIO includes almost everything defined in the MPI-2 I/O chapter and is optimized for noncontiguous access patterns, which are common in parallel applications. It also provides an optimized implementation of collective I/O, a key technique for parallel I/O performance.
  • Swift is a parallel scripting language designed for composing application programs into parallel applications that can be executed on multicore processors, clusters, grids, clouds, and supercomputers.
  • TAO (Toolkit for Advanced Optimization) focuses on the design and implementation of component-based optimization software for the solution of large-scale optimization applications.
  • Triton is a novel, scalable, self-repairing, highly available distributed object storage system, designed to enable concurrent access to petabytes of data.
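To illustrate why collective I/O matters for the noncontiguous access patterns mentioned in the ROMIO entry, the following is a minimal toy model in plain Python, not ROMIO's code or API. It assumes a simplified two-phase scheme: ranks first hand their small, interleaved pieces to a few aggregators, each of which owns one contiguous region of the file and then writes it in large runs. The names `independent_writes`, `two_phase_write`, and `rank_requests` are hypothetical and exist only for this sketch; a real implementation exchanges data over MPI and works with MPI datatypes and file views.

```python
from collections import defaultdict


def independent_writes(rank_requests):
    """Count the I/O operations if every rank writes each piece itself."""
    return sum(len(pieces) for pieces in rank_requests.values())


def two_phase_write(rank_requests, file_size, num_aggregators):
    """Simulate a two-phase collective write (toy model, no real I/O).

    rank_requests maps rank -> list of (offset, bytes) pieces.
    Returns (file_contents, number_of_contiguous_writes).
    """
    # Each aggregator owns one contiguous file domain (ceiling division).
    domain = -(-file_size // num_aggregators)

    # Phase 1 (exchange): route every byte to the aggregator owning it.
    buffers = defaultdict(dict)
    for pieces in rank_requests.values():
        for offset, data in pieces:
            for i, byte in enumerate(data):
                pos = offset + i
                buffers[pos // domain][pos] = byte

    # Phase 2 (I/O): each aggregator writes its bytes as contiguous runs.
    out = bytearray(file_size)
    writes = 0
    for agg in sorted(buffers):
        positions = sorted(buffers[agg])
        run_start = prev = positions[0]
        for pos in positions[1:] + [None]:
            if pos is None or pos != prev + 1:
                run = range(run_start, prev + 1)
                out[run_start:prev + 1] = bytes(buffers[agg][p] for p in run)
                writes += 1
                run_start = pos
            prev = pos
    return bytes(out), writes


# Four ranks, each writing four 2-byte pieces interleaved with the others.
rank_requests = {
    rank: [(rank * 2 + k * 8, bytes([65 + rank]) * 2) for k in range(4)]
    for rank in range(4)
}
small_ops = independent_writes(rank_requests)
contents, large_ops = two_phase_write(rank_requests, file_size=32, num_aggregators=2)
```

In this example the independent approach issues 16 small writes, while the two-phase model issues 2 large contiguous writes that produce the same 32-byte file. The trade-off, which the sketch ignores, is the cost of exchanging data between ranks before the write.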