
One solution that has become widespread is SZ, an open-source lossy data compressor for scientific data.
“SZ excels in compression ratio and throughput while preserving the scientific information contained in the data, with applications in simulations, instruments and artificial intelligence,” said Franck Cappello, a senior computer scientist in the Mathematics and Computer Science (MCS) division at Argonne National Laboratory and lead developer of SZ.
“But we are always looking for enhancements, and SZ’s modular design makes it easy for researchers to expand SZ’s capabilities, whether it is for accelerators or for more efficient I/O or for higher compression accuracy,” said Sheng Di, a computer scientist in the MCS division and core developer of SZ.
“Scientific experiments in cutting-edge instruments require specialized streaming compression schemes capable of handling extreme data velocities from detector systems, which only hardware implementations can enable,” added Kazutomo Yoshii, a principal specialist in software engineering in the MCS division and chair of the Data Compression session.
Highlighted below is recent work that will be presented at the upcoming SC23 conference by the SZ compression team, led by Argonne National Laboratory and including collaborators from several universities. (Names in boldface are Argonne staff members.)
Data Compression Session
Notably, every paper in the SC23 Data Compression session is related to SZ or one of its variants.
- cuSZp: An Ultra-Fast GPU Error-Bounded Lossy Compression Framework with Optimized End-to-End Performance. Yafan Huang, Sheng Di, Xiaodong Yu, Guanpeng Li, Franck Cappello.
Experiments on an NVIDIA A100 GPU with six representative scientific datasets show that cuSZp achieves ultra-fast end-to-end throughput (about 100 GB/s) while maintaining a high compression ratio and high reconstructed data quality. See Fig. 1.
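To make the idea of error-bounded compression concrete, the Python sketch below walks through a simplified, CPU-only version of the kind of block-based pipeline that fast GPU compressors such as cuSZp build on: pre-quantization against an absolute error bound, 1D Lorenzo prediction within each block, and fixed-length encoding with a single bit width per block. It is an illustration under stated assumptions, not cuSZp's implementation. The function names, the 32-value block size and the 8-bit width header are hypothetical, and the compressed size is only estimated rather than packed into a real bitstream.

```python
import math


def compress(data, eb, block_size=32):
    """Error-bounded block compression sketch: quantize, predict, fixed-length encode.

    Returns the per-block (bit_width, deltas) pairs and an estimated size in bits.
    """
    # Pre-quantization: q = round(x / (2*eb)) guarantees |x - 2*eb*q| <= eb.
    q = [round(x / (2 * eb)) for x in data]
    blocks, total_bits = [], 0
    for start in range(0, len(q), block_size):
        chunk = q[start:start + block_size]
        # 1D Lorenzo prediction inside the block: store differences to the left neighbor.
        deltas = [chunk[0]] + [chunk[i] - chunk[i - 1] for i in range(1, len(chunk))]
        # Fixed-length encoding: one bit width per block, set by the largest magnitude.
        width = max(abs(d) for d in deltas).bit_length()
        blocks.append((width, deltas))
        # Hypothetical layout: 8-bit width header, plus (sign bit + width bits)
        # per value unless the whole block is zero.
        total_bits += 8 + (len(deltas) * (1 + width) if width else 0)
    return blocks, total_bits


def decompress(blocks, eb):
    """Undo the prediction with a running sum per block, then dequantize."""
    out = []
    for _, deltas in blocks:
        acc = 0
        for d in deltas:
            acc += d
            out.append(acc * 2 * eb)
    return out


data = [math.sin(i / 10) for i in range(256)]
blocks, bits = compress(data, eb=1e-3)
recon = decompress(blocks, eb=1e-3)
print("max error:", max(abs(a - b) for a, b in zip(data, recon)))
print("ratio vs float32:", 32 * len(data) / bits)
```

Because each block is processed independently, a GPU version can map blocks to threads in parallel. The fixed-length layout gives up some compression ratio compared with variable-length schemes such as Huffman coding in exchange for much simpler and faster encoding and decoding.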
- AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications. Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, Sian Jin, Houjun Tang, Jean Sexton, Sheng Di, Kai Zhao, Bo Fang, Zarija Lukić, Franck Cappello, James Ahrens, Dingwen Tao.
The paper presents an in situ compression approach that shows how adaptive mesh refinement (AMR) and error-bounded lossy compression can work together to make I/O more efficient. See Fig. 2.
- ADT-FSE: A New Encoder for SZ. Tao Lu, Yu Zhong, Zibin Sun, Xian Chen, You Zhou, Fei Wu, Ying Yang, Yunximn Huang, Yafei Yang.
Building on the Gaussian distribution of quantization factors, the authors design an adaptive data transcoding (ADT) scheme that maps quantization factors to more compressible codes, which are then compressed with finite state entropy (FSE) coding.
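The article does not spell out ADT's mapping, so the sketch below uses a different, well-known transcoding in the same spirit to show why a mapping step can help. Gaussian-like quantization codes are zigzag-mapped to non-negative integers and then split into a small symbol (the value's bit length) plus raw extra bits, similar to the length codes in DEFLATE and Zstandard. A small symbol alphabet is what table-based entropy coders such as FSE handle well. The functions are hypothetical, and the Shannon entropy of the symbol stream stands in for an actual FSE encoder.

```python
import math
import random
from collections import Counter


def zigzag(q):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * q if q >= 0 else -2 * q - 1


def transcode(quant_codes):
    """Split each code into a small symbol (its bit length) and raw extra bits.

    Returns the symbol stream and the number of extra bits stored verbatim.
    """
    symbols, extra_bits = [], 0
    for q in quant_codes:
        n = zigzag(q).bit_length()  # the alphabet stays tiny even for large codes
        symbols.append(n)
        extra_bits += max(n - 1, 0)  # the leading 1 bit is implied by n
    return symbols, extra_bits


def entropy_bits(symbols):
    """Shannon bound for the symbol stream; a tANS/FSE coder gets close to it."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c * math.log2(c / total) for c in counts.values())


random.seed(0)
codes = [round(random.gauss(0, 20)) for _ in range(10_000)]  # Gaussian-like codes
symbols, extra = transcode(codes)
print("distinct raw codes:", len(set(codes)), "-> distinct symbols:", len(set(symbols)))
print("estimated bits/value:", (entropy_bits(symbols) + extra) / len(codes))
```

Given the symbol n and its n - 1 extra bits, the decoder rebuilds the zigzag value as `(1 << (n - 1)) | extra` (or 0 when n is 0), so the transcoding is lossless.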
Data Workshop
Among the numerous workshops associated with SC23 is Data Analysis and Reduction for Big Scientific Data (DRBSD-9). Members of the SZ compression team will present three papers at this prestigious workshop:
- LibPressio-Predict: Flexible and Fast Infrastructure for Inferring Compression Performance. Robert R. Underwood, Sheng Di, Sian Jin, Md Hasanur Rahman, Arhan Khan, Franck Cappello.
The authors present a flexible, scalable library that lets applications quickly predict compression performance instead of measuring it by running each compressor in full.
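One simple way to predict compression performance without running a compressor in full is to analyze a sample of the data. The sketch below is a hypothetical heuristic, not LibPressio-Predict's API or method: it quantizes every fourth 64-value block against the error bound, applies 1D Lorenzo prediction and treats the entropy of the residuals as the expected bits per value.

```python
import math
import random
from collections import Counter


def predict_ratio(data, eb, block=64, every=4):
    """Estimate an error-bounded compressor's ratio from a sample of the data.

    Hypothetical heuristic: quantize every `every`-th block of `block` values,
    apply 1D Lorenzo prediction, and use the entropy of the residuals as the
    expected bits per value (ignoring headers and other encoder overheads).
    """
    residuals = []
    for start in range(0, len(data), block * every):
        q = [round(x / (2 * eb)) for x in data[start:start + block]]
        residuals += [q[0]] + [q[i] - q[i - 1] for i in range(1, len(q))]
    counts = Counter(residuals)
    n = len(residuals)
    bits_per_value = -sum(c / n * math.log2(c / n) for c in counts.values())
    # Clamp to avoid dividing by zero on constant data; 32 = float32 input width.
    return 32 / max(bits_per_value, 0.1)


random.seed(0)
smooth = [math.sin(i / 50) for i in range(65_536)]
noisy = [random.uniform(-1, 1) for _ in range(65_536)]
print("smooth:", predict_ratio(smooth, 1e-3))
print("noisy:", predict_ratio(noisy, 1e-3))
```

Smooth data produces small, repetitive residuals and therefore a higher predicted ratio than noisy data. A practical predictor would also have to account for encoder overheads and be calibrated against measured results.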
- Lossy and Lossless Compression for BioFilm Optical Coherence Tomography. Max Faykus III, Jon Calhoun, Melissa Smith.
The paper evaluates five lossless and two lossy state-of-the-art compressors, as well as two preprocessing techniques, for reducing optical coherence tomography (OCT) data.
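The basic measurement behind an evaluation like this, compression ratio as original size divided by compressed size, can be sketched with Python's standard-library lossless compressors. Here zlib, bz2 and lzma are stand-ins, not the compressors evaluated in the paper, and the synthetic signal is not OCT data. The example also applies error-bounded quantization before lossless compression, a simple lossy step assumed here for illustration, to show how much it can raise the ratio.

```python
import array
import bz2
import lzma
import math
import random
import zlib

COMPRESSORS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def ratios(payload: bytes, original_size: int) -> dict:
    """Compression ratio of each stdlib lossless compressor on `payload`."""
    return {name: original_size / len(fn(payload)) for name, fn in COMPRESSORS.items()}


random.seed(1)
# Synthetic stand-in for a noisy measurement: smooth signal plus small noise.
signal = [math.sin(i / 200) + random.gauss(0, 1e-3) for i in range(50_000)]
raw = array.array("d", signal).tobytes()  # float64 samples, 8 bytes each

# Lossless only: compress the float64 bytes directly.
lossless = ratios(raw, len(raw))

# Lossy step: error-bounded quantization (|x - 2*eb*q| <= eb) to C ints.
eb = 1e-2
codes = array.array("i", (round(x / (2 * eb)) for x in signal))
lossy = ratios(codes.tobytes(), len(raw))

for name in COMPRESSORS:
    print(f"{name:5s} lossless {lossless[name]:6.2f}x   quantized {lossy[name]:8.2f}x")
```

The noisy low-order mantissa bits of floating-point data leave lossless compressors little redundancy to exploit, which is why lossy, error-bounded methods are attractive for scientific data.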
- Analyzing Impact of Data Reduction Techniques on Visualization for AMR Applications Using AMReX Framework. Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, James Ahrens, Dingwen Tao.
The paper examines how data compression affects the visualization of adaptive mesh refinement (AMR) data and the new challenges it introduces.
Another paper from the Argonne compression effort will also be presented at DRBSD-9. Although this work is not directly related to SZ, it aims to bridge the gap between software and hardware experts through a generator framework for designing, verifying and estimating the resources of streaming hardware compressor architectures.
- Streaming Hardware Compressor Generator Framework. Kazutomo Yoshii, Tomohiro Ueno, Kentaro Sano, Antonino Miceli, Franck Cappello.
The framework helps users explore compressor architectures assembled from different compressor building blocks, evaluate their characteristics and generate RTL code for integrating them into custom accelerator designs.
Community Outreach
Tutorials are an excellent way to learn from the experts. SZ researchers will present a half-day tutorial reviewing the motivations, principles, techniques and error analysis methods for lossy compression of scientific datasets.
- Compression for Scientific Data. Franck Cappello, Peter Lindstrom, Sheng Di, Robert R. Underwood.
The tutorial will present the SZ team’s products (SZ, Z-checker, LibPressio, OptiZconfig and SDRBench) as well as other state-of-the-art compressors.
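Error analysis of lossy compression usually starts from a few standard metrics that compare the original and reconstructed data. The sketch below computes the maximum absolute error, the root-mean-square error (RMSE) and the peak signal-to-noise ratio (PSNR), using the value-range convention for PSNR that is common in the lossy scientific compression literature. It is an illustration only, not Z-checker's interface, and the `error_report` function is hypothetical.

```python
import math


def error_report(original, reconstructed):
    """Basic error statistics often used to assess lossy compression quality."""
    diffs = [a - b for a, b in zip(original, reconstructed)]
    max_abs = max(abs(d) for d in diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    value_range = max(original) - min(original)
    # PSNR relative to the data's value range; infinite for a lossless result.
    psnr = math.inf if rmse == 0 else 20 * math.log10(value_range / rmse)
    return {"max_abs_error": max_abs, "rmse": rmse, "psnr_db": psnr}


original = [math.cos(i / 25) for i in range(1000)]
eb = 1e-3
# Simulate an error-bounded lossy round trip with uniform quantization.
reconstructed = [2 * eb * round(x / (2 * eb)) for x in original]
report = error_report(original, reconstructed)
print(report)
```

The maximum absolute error checks whether a user-set error bound was respected, while RMSE and PSNR summarize the overall distortion.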
Compressing Data, Accelerating Science
“Numerous challenges remain in this exciting field,” Robert Underwood said. “The demand from science disciplines, applications and use-cases is so broad and the requirements in terms of compression ratio, speed and accuracy so high that only customized compressors can satisfy users’ needs. How can we provide a tool for these users to build their own customized lossy compressors, based on SZ? That’s a question that we are exploring in the new FZ compression framework project.”
One group from China has already built its own customized version of SZx, an ultra-fast variant of SZ, which the researchers call SWSZx. Their work, which simulates the 2023 Turkey earthquakes and the 2019 Ridgecrest earthquake, will be presented in the Extreme-Scale Applications session of the conference.
- 69.7-PFlops Extreme Scale Earthquake Simulation with Crossing Multi-Faults and Topography on Sunway. Wubing Wan et al.
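The idea of assembling a customized compressor from interchangeable parts can be sketched as a small pipeline in which the predictor and the lossless backend are swappable functions. The stage names and the `build_compressor` helper are hypothetical and do not reflect the FZ project's actual design or API; the sketch only shows how swapping one module changes the result.

```python
import lzma
import math
import struct
import zlib
from typing import Callable, List

# Hypothetical stage types for a composable compressor (not FZ's actual API).
Predictor = Callable[[List[int]], List[int]]
Backend = Callable[[bytes], bytes]


def no_prediction(q: List[int]) -> List[int]:
    return list(q)


def lorenzo_1d(q: List[int]) -> List[int]:
    # Replace each code with its difference from the previous one.
    return [q[0]] + [q[i] - q[i - 1] for i in range(1, len(q))] if q else []


def build_compressor(predictor: Predictor, backend: Backend, eb: float):
    """Assemble quantizer -> predictor -> lossless backend into one function."""
    def compress(data: List[float]) -> bytes:
        q = [round(x / (2 * eb)) for x in data]  # error-bounded quantization
        residuals = predictor(q)
        return backend(struct.pack(f"<{len(residuals)}i", *residuals))
    return compress


data = [math.sin(i / 100) for i in range(20_000)]
original_bytes = 4 * len(data)  # size of the input stored as float32
for p_name, predictor in [("none", no_prediction), ("lorenzo", lorenzo_1d)]:
    for b_name, backend in [("zlib", zlib.compress), ("lzma", lzma.compress)]:
        size = len(build_compressor(predictor, backend, eb=1e-4)(data))
        print(f"{p_name:8s} + {b_name:4s}: ratio {original_bytes / size:.1f}")
```

On smooth data, Lorenzo prediction turns slowly varying quantization codes into small, repetitive residuals, which the lossless backends compress far better than the raw codes.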
“These nine contributions at SC23 (eight papers and a tutorial) represent a remarkable showing of the potential of SZ in tackling issues raised by scientific data,” Cappello said. “Our hope is that SC attendees will benefit from the experiences as presented in these technical papers, tutorials, and workshops.”