Practical limits of event-based algorithms for vectorizing Monte Carlo codes

December 8, 2017

Advanced computer systems such as the Intel Xeon Phi and the NVIDIA Tesla GPU offer high levels of data-level parallelism. Unfortunately, this form of parallelism is not handled well by Monte Carlo neutron transport simulations.

Image of a reactor prismatic assembly geometry.

One possible solution is to restructure the Monte Carlo code to use an event-based algorithm, where an event may be an action such as a collision with a target nuclide. Code restructuring, however, is nontrivial: the data structures and loop organization must be modified, and architecture-specific optimization is often necessary in order to achieve optimal performance.

Rather than implementing a specific event-based algorithm, therefore, researchers in Argonne’s Mathematics and Computer Science (MCS) Division have undertaken a theoretical study to quantify practical limits on the efficiency of event-based algorithms.

For their study the researchers created two models: one in which they assumed that all events of a given type execute in the same amount of time, and the other in which they allowed each event type to have a distribution of event times. Data collected from simulations of four reactor problems using the Monte Carlo code OpenMC was then used in conjunction with the two models to calculate the vector speedup as a function of the particle bank and the vector width.

The results show that when all events are assumed to have a constant execution time (model 1), the overall vector efficiency is solely a function of the particle bank size. The researchers concluded that with a bank size at least 20 times greater than the vector size, a vector efficiency greater than 90% is possible. But when the execution times are allowed to vary (model 2), the average speedup at each event iteration depends not only on the bank size but also on the differences in event times. Thus, an infinitely large particle bank would result in a vector speedup of no more than about 4. For some problems, such as those characterized by a high number of collisions per particle, vector efficiencies over 50% may not be attainable.

“We view this work as a first step toward understanding the performance of an event-based algorithm,” said Paul Romano, a computational scientist in the MCS Division. “Although our assumptions may not hold for a realistic simulation, they enabled us to identify how efficiency might change if an event is not perfectly vectorizable and to establish an upper bound on the efficiency of the algorithm.”

Arguably, exploiting vectorization is not the only way to achieve performance on advanced computers. Other optimizations may be crucial, especially for codes such as Monte Carlo neutron transport that are characterized by limited memory latency and bandwidth.

“In practice, the use of event-based algorithms – with their better use of cache – could improve performance,” Romano said. But he acknowledged that “capturing such an effect in a theoretical model is notoriously difficult.”

For further information see the paper by Paul K. Romano and Andrew R. Siegel, “Limits on the efficiency of event-based algorithms for Monte Carlo neutron transport,” Nuclear Engineering and Technology 49(6): 1165-1171, Sept. 2017