Research Highlight | Mathematics and Computer Science

Argonne data compression research highlighted at SC23

The ever-increasing volume of scientific data necessitates new approaches to compressing data enough to meet the constraints of simulation and experiment users in terms of storage space, I/O speed, and memory size.

One solution that has become widespread is SZ, an open-source lossy data compressor for scientific data.

“SZ excels in compression ratio and throughput while preserving the scientific information contained in the data, with applications in simulations, instruments and artificial intelligence,” said Franck Cappello, a senior computer scientist in the Mathematics and Computer Science (MCS) division at Argonne National Laboratory and lead developer of SZ.

“But we are always looking for enhancements, and SZ’s modular design makes it easy for researchers to expand SZ’s capabilities, whether it is for accelerators, for more efficient I/O, or for higher compression accuracy,” said Sheng Di, a computer scientist in the MCS division and core developer of SZ.

“Scientific experiments in cutting-edge instruments require specialized streaming compression schemes capable of handling extreme data velocities from detector systems, which only hardware implementations can enable,” added Kazutomo Yoshii, a principal specialist in software engineering in the MCS division and chair of the Data Compression session.

Highlighted below is recent work that will be presented at the upcoming SC23 conference by the SZ compression team, led by Argonne National Laboratory and including collaborators from several universities. (Names in boldface are Argonne staff members.)

Data Compression Session

Especially noteworthy is that the entire data compression session at SC23 will be related to SZ and its variants.

Experiments on an NVIDIA A100 GPU with six representative scientific datasets demonstrate that cuSZp can achieve ultra-fast end-to-end throughput (~100 GB/s) along with a high compression ratio and high reconstructed data quality. See Fig. 1.

This paper presents a novel compression approach showing how adaptive mesh refinement and error-bounded lossy compression can function together. See Fig. 2.

Fig. 2: AMRIC high-compression framework. Left: original data; right: AMRIC SZ_L/R, compression ratio 53.2.

Based on the Gaussian distribution of quantization factors, the authors design an adaptive data transcoding (ADT) scheme to map quantization factors to codes for better compressibility and then use finite state entropy (FSE) to compress the codes.
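The core idea can be illustrated with a minimal sketch. The variable names and the zigzag mapping below are illustrative assumptions, not the paper's actual ADT scheme: because quantization factors cluster around zero in a roughly Gaussian shape, remapping them to a compact non-negative alphabet (small codes for the frequent near-zero values) is what makes a table-based entropy coder such as FSE effective.

```python
import math
import random
from collections import Counter

def zigzag(q):
    # Map a signed quantization factor to a non-negative code:
    # 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
    # Values near zero (the Gaussian peak) get the smallest codes,
    # yielding a dense alphabet suited to table-based entropy coding.
    return 2 * q if q >= 0 else -2 * q - 1

def entropy_bits(symbols):
    # Shannon entropy in bits/symbol: the bound that an entropy
    # coder such as FSE approaches on this code stream.
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Synthetic quantization factors with a Gaussian distribution,
# mimicking the output of an error-bounded predictor stage.
random.seed(0)
factors = [round(random.gauss(0, 2)) for _ in range(10_000)]
codes = [zigzag(q) for q in factors]

assert all(c >= 0 for c in codes)
print(f"entropy bound: {entropy_bits(codes):.2f} bits/symbol")
```

The mapping is a bijection, so it does not change the entropy; its benefit is purely representational, turning a signed, sparse symbol set into the dense non-negative alphabet that finite state entropy coders consume.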

Data Workshop

Among the numerous workshops associated with SC23 is Data Analysis and Reduction for Big Scientific Data (DRBSD-9). Members of the SZ compression team will present three papers at this prestigious workshop:

In this work the authors present a scalable library for applications using predictions of compression performance.

This paper evaluates five lossless and two lossy state-of-the-art compressors, as well as two preprocessing techniques, to reduce optical coherence tomography data.

This paper examines how data compression influences, and introduces novel challenges to, the visualization of adaptive mesh refinement data.

Also presented at DRBSD-9 will be another paper from the Argonne compression effort. Although this work is not directly related to SZ, the motivation is to bridge the gap between software and hardware experts through a generator framework for designing, verifying and estimating resources in streaming hardware compressor architectures.

The framework presented is designed to assist users in exploring different compressor architectures with different compressor building blocks, evaluating their characteristics, and generating RTL code for integrating them into custom accelerator designs.

Community Outreach

Tutorials are an excellent way to learn from the experts. SZ researchers will present a half-day tutorial reviewing the motivations, principles, techniques and error analysis methods for lossy compression of scientific datasets.
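The central principle such tutorials cover can be shown in a few lines. This is a minimal sketch of uniform error-bounded quantization, not SZ's actual predictor-based pipeline: every reconstructed value is guaranteed to differ from the original by no more than a user-chosen absolute error bound.

```python
def quantize(data, eb):
    """Uniform error-bounded quantization with absolute bound eb.

    Each value is binned into an integer code; the reconstruction
    q * 2 * eb is guaranteed to lie within eb of the original.
    """
    return [round(x / (2 * eb)) for x in data]

def reconstruct(codes, eb):
    # Invert the quantization: integer code -> bin-center value.
    return [q * 2 * eb for q in codes]

data = [0.137, -2.7, 3.14159, 100.0, -0.001]
eb = 0.01  # user-specified absolute error bound

codes = quantize(data, eb)
recon = reconstruct(codes, eb)

# The defining property of error-bounded lossy compression
# (small slack added for floating-point rounding):
assert all(abs(x - r) <= eb + 1e-9 for x, r in zip(data, recon))
```

The integer codes are far more compressible than the original floating-point values, which is why error-bounded compressors pair a quantization stage like this with an entropy or dictionary coder.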

The tutorial will include a presentation of SZ team products – SZ, Z-checker, Libpressio, OptiZconfig, and SDRBench – as well as other state-of-the-art compressors.

Compressing Data, Accelerating Science

“Numerous challenges remain in this exciting field,” Robert Underwood said. “The demand from science disciplines, applications and use-cases is so broad and the requirements in terms of compression ratio, speed and accuracy so high that only customized compressors can satisfy users’ needs. How can we provide a tool for these users to build their own customized lossy compressors, based on SZ? That’s a question that we are exploring in the new FZ compression framework project.”

One group, from China, has already built its own customized version of SZx, which the researchers call SWSZx. Their work, in which they simulated the Turkey earthquakes of 2023 and the Ridgecrest earthquake of 2019, will be presented in the Extreme-Scale Applications session of the conference.

“These nine contributions at SC23 – eight papers and a tutorial – represent a remarkable showing of the potential of SZ in tackling issues raised by scientific data,” Cappello said. “Our hope is that SC attendees will benefit from the experiences as presented in these technical papers, tutorials, and workshops.”