Advances in detector technologies and experimental techniques have enabled the generation of three-dimensional data at rates of terabytes per day. The usefulness of this data is severely limited, however, by the hours required to analyze the resulting large datasets. Yet quasi-instant feedback is critical in enabling scientists to check results, identify optimal experimental parameters, and accelerate the end-to-end scientific workflow.
To address this problem, researchers at Argonne National Laboratory, The Ohio State University, and The University of Chicago have developed new parallel techniques for reconstructing tomographic images rapidly and at large scale.
Typically, tomographic reconstruction algorithms proceed iteratively, with rays simulated according to reconstructed data from the previous iteration. Since rays in rows corresponding to different projections do not intersect, the reconstruction of individual rows – also referred to as slices – can proceed in parallel. Although per-slice parallel reconstruction algorithms have been proposed, they are usually not suitable for large, high-resolution datasets such as those from the Advanced Photon Source (APS) at Argonne.
“The reason is that the per-slice technique can use only as much parallelism as there are slices in a dataset,” explained Tekin Bicer, a postdoctoral appointee at Argonne. “For example, a dataset with 2,048 slices cannot be reconstructed with more than 2,048 parallel units and hence can take days to finish.”
The solution devised by the research team involves a novel in-slice technique that performs parallel reconstruction for each ray. By replicating the output slices for each thread, the technique provides significantly finer-grained parallelism.
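The idea of replicating the output slice can be illustrated with a small sketch. The Python code below is not the team's implementation; it is a toy example under stated assumptions: a ray is modeled as a list of pixel indices plus a measured value, the update is a plain unweighted backprojection, and the function names are hypothetical. Each worker accumulates its share of the rays into a private copy of the slice, and the copies are summed at the end, so no two workers ever write to the same memory.

```python
from concurrent.futures import ThreadPoolExecutor


def backproject_rays(rays, n_pixels):
    """Accumulate ray contributions into a private replica of the slice.

    Each ray is (pixel_indices, measured_value); the value is spread evenly
    over the pixels the ray crosses (a simplifying assumption).
    """
    replica = [0.0] * n_pixels
    for pixels, value in rays:
        share = value / len(pixels)
        for p in pixels:
            replica[p] += share
    return replica


def in_slice_backproject(rays, n_pixels, n_workers):
    """Split the rays of ONE slice across workers, then reduce the replicas."""
    # Round-robin partition of the rays: parallelism now scales with the
    # number of rays rather than the number of slices.
    chunks = [rays[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        replicas = list(pool.map(lambda c: backproject_rays(c, n_pixels), chunks))
    # Reduction step: sum the per-worker replicas pixel by pixel.
    return [sum(column) for column in zip(*replicas)]


# Toy 2x2 slice (pixels 0..3) crossed by two horizontal and two vertical rays.
toy_rays = [([0, 1], 3.0), ([2, 3], 7.0), ([0, 2], 4.0), ([1, 3], 6.0)]
print(in_slice_backproject(toy_rays, n_pixels=4, n_workers=2))
# prints [3.5, 4.5, 5.5, 6.5]
```

Because of Python's global interpreter lock, the threads here show the structure of the approach rather than a real speedup. The cost of replication is extra memory and a final reduction; the benefit is that workers need no locking while they process rays.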
The next step was to improve programmability. For this purpose, the researchers implemented both the per-slice and in-slice techniques in a MapReduce-like middleware, which was further optimized to make it easy to implement different reconstruction operations.
The middleware was then evaluated with four reconstruction algorithms and two real-world datasets from different APS beamlines. Experimental results on the IBM Blue Gene/Q “Mira” at Argonne show close to perfect speedups on up to 8,192 cores and reductions in execution times by more than 95% on 32,768 cores compared with 1,024-core configurations. Moreover, the average reconstruction times are dramatically improved – from approximately 2 hours for 256 cores to approximately 1 minute for 32,768 cores.
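To put these figures in perspective, the short Python calculation below converts them into speedup and parallel efficiency. It is only a back-of-the-envelope sketch: it treats the approximate timings quoted above as exact, and the helper names are chosen here for illustration.

```python
def speedup(t_base_s, t_scaled_s):
    """Ratio of baseline run time to scaled-up run time."""
    return t_base_s / t_scaled_s


def parallel_efficiency(t_base_s, cores_base, t_scaled_s, cores_scaled):
    """Achieved speedup divided by the ideal speedup (the core-count ratio)."""
    return speedup(t_base_s, t_scaled_s) / (cores_scaled / cores_base)


def min_speedup_from_reduction(reduction_fraction):
    """Speedup implied by cutting execution time by the given fraction."""
    return 1.0 / (1.0 - reduction_fraction)


# About 2 hours on 256 cores versus about 1 minute on 32,768 cores.
print(speedup(2 * 3600, 60), parallel_efficiency(2 * 3600, 256, 60, 32768))
# prints 120.0 0.9375
```

Under these assumptions, a 128-fold increase in cores yields roughly a 120-fold reduction in time, or about 94% of ideal scaling. Likewise, `min_speedup_from_reduction(0.95)` shows that a reduction of more than 95% corresponds to a speedup of more than roughly 20 times when moving from 1,024 to 32,768 cores, a 32-fold increase.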
“To the best of our knowledge, this is the first study that examines the parallelization of tomographic reconstruction algorithms at this scale. And the results offer the promise of enabling near-real-time reconstruction of the large datasets generated by x-ray light sources such as the APS,” said Bicer.
The results were presented at Euro-Par 2015 and were published in Lecture Notes in Computer Science, vol. 9233, pp. 289-302, 2015.