A Massively Parallel Library for Matrix and Tensor Algorithms
Abstract: We present a framework of communication-avoiding parallel algorithms and a distributed-memory library of primitive routines for sparse and dense tensors. Computationally, a tensor is a multidimensional array of data, while mathematically a tensor represents a multilinear map; such maps operate and compose by means of contraction. We study the communication complexity of parallel algorithms for contractions of tensors with sparsity and symmetry. The proposed algorithms are implemented as part of the Cyclops Tensor Framework. Cyclops supports contractions of tensors with user-defined element types and elementwise functions. We show performance results of application codes achieving nearly 1 petaflop/s using Cyclops, as well as case studies using symmetry, sparsity, and custom element types.
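The idea of contraction as the composition mechanism for multilinear maps can be sketched with NumPy's `einsum`. This is only an illustrative analogy, not the Cyclops interface itself; the index-string notation and the tensor shapes below are arbitrary examples chosen for this sketch.

```python
import numpy as np

# Matrix multiplication viewed as a tensor contraction over the shared
# index k: C_{ij} = sum_k A_{ik} B_{kj}. Shapes are illustrative.
A = np.random.rand(4, 5)
B = np.random.rand(5, 6)
C = np.einsum('ik,kj->ij', A, B)
assert np.allclose(C, A @ B)

# A higher-order contraction: a 3-index tensor contracted with a matrix,
# U_{ijl} = sum_k T_{ijk} M_{kl}. The contracted index k disappears from
# the output, while free indices i, j, l remain.
T = np.random.rand(4, 5, 6)
M = np.random.rand(6, 3)
U = np.einsum('ijk,kl->ijl', T, M)
assert U.shape == (4, 5, 3)
```

In this index notation, repeated indices absent from the output are summed over, which is the sense in which contraction both applies and composes multilinear maps.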