Seminar | Mathematics and Computer Science Division

Numerical Behavior of GPU Matrix Multiply-Accumulate Hardware

MCS Seminar

Abstract: Tensor cores and matrix engines are hardware units on the latest GPUs that perform dot-product or matrix multiply-accumulate (MMA) operations. 127 of the TOP500 supercomputers contain these units, and many numerical libraries are beginning to use them in scientific computing algorithms. Because tensor cores and similar arithmetic units target low-precision machine learning workloads, they are not necessarily compliant with the IEEE 754 standard. Features such as rounding, normalization, order of operations, and subnormal number support can differ from those of a standard software implementation of matrix multiplication.
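To illustrate why order of operations and rounding matter, the following minimal sketch emulates a dot product in which every intermediate result is rounded to IEEE 754 binary32. It is a software model written for this illustration only, not a description of how any tensor core accumulates; the helper names `fp32` and `dot_fp32` are hypothetical.

```python
import struct

def fp32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot_fp32(a, b):
    """Dot product accumulated left to right, rounding to binary32 after each operation."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = fp32(acc + fp32(x * y))
    return acc

# One large term followed by four small ones. Near 2**25 the spacing
# between adjacent binary32 numbers is 4, so adding 1.0 has no effect.
a = [2.0 ** 25, 1.0, 1.0, 1.0, 1.0]
b = [1.0] * 5

large_first = dot_fp32(a, b)              # small terms are rounded away one by one
small_first = dot_fp32(a[::-1], b[::-1])  # small terms are summed before the large one
exact = sum(x * y for x, y in zip(a, b))  # exact in binary64 for these inputs

print(large_first, small_first, exact)
```

The two orderings differ by 4 on the same inputs, which is why the accumulation order inside an MMA unit is a numerical feature worth determining.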

In this talk I will discuss our recent work on determining various numerical features of MMAs, using NVIDIA tensor cores as an example test case. Using carefully constructed numerical test cases, we determined the features of three generations of tensor cores on the NVIDIA V100, T4, and A100 GPUs, and explored the effects those features have on applications.
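As a rough idea of what a constructed numerical test case can look like, the sketch below probes whether a multiplier keeps subnormal results in binary16. It is an assumption-laden software illustration, not one of the test cases from the talk: `mul_ieee` and `mul_flush` are hypothetical models of two possible hardware behaviors, and `keeps_subnormals` is a hypothetical probe that treats the multiplier as a black box.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]

FP16_MIN_NORMAL = 2.0 ** -14  # smallest positive normal binary16 number

def mul_ieee(x, y):
    """Hypothetical fp16 multiplier with gradual underflow (subnormals kept)."""
    return to_fp16(x * y)

def mul_flush(x, y):
    """Hypothetical fp16 multiplier that flushes subnormal results to zero."""
    r = to_fp16(x * y)
    return 0.0 if abs(r) < FP16_MIN_NORMAL else r

def keeps_subnormals(mul):
    """Probe: half of the smallest normal number is subnormal in binary16.

    A nonzero product means the multiplier supports subnormal results.
    """
    return mul(FP16_MIN_NORMAL, 0.5) != 0.0

print(keeps_subnormals(mul_ieee))   # model with gradual underflow
print(keeps_subnormals(mul_flush))  # model with flush-to-zero
```

The probe only inspects inputs and outputs, so in principle the same input pattern could be fed to a real MMA unit in place of the two software models.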