Multilevel Toeplitz linear systems appear in a wide range of scientific and engineering applications. While several fast direct solvers exist for the basic 1-level Toeplitz matrices, in the multilevel case an iterative solver provides the most general and practical solution. Furthermore, iterative methods are asymptotically faster than many stable direct methods even for the 1-level case. This paper proposes several parallelization techniques that enable an efficient implementation of the conjugate gradient algorithm for solving multilevel Toeplitz systems on distributed-memory machines. The two major differences between this implementation and that for a general sparse linear solver are (1) a communication-efficient approach to handle data expansion and truncation and data transpose simultaneously; (2) the interleaving of matrix-vector multiplications and vector inner product calculations to reduce synchronization cost and latency. Similar ideas can be applied to the implementation of other iterative methods for Toeplitz systems that are not necessarily symmetric positive definite. Scaling results are shown to demonstrate the usefulness of the proposed techniques.

%B Int. Conf. on Computational Science (ICCS 2013) %C Barcelona, Spain %8 09/2012 %G eng %1 http://www.mcs.anl.gov/papers/P3038-0912.pdf