Abstract: Applications that use machine learning (ML) and deep learning (DL) are increasingly being run on heterogeneous architectures, such as GPUs, TPUs, and FPGAs. There is also a growing number of hardware chips designed specifically for ML and DL workloads, as these systems offer substantial hardware acceleration for model training. Characterizing a DL workload in terms of its compute, memory, and I/O behavior, and gaining detailed insights from that characterization, is crucial yet challenging.
In this talk, I will present an overview of a few tools that help to understand how deep learning applications run on Nvidia GPUs. The insights gained from these tools can be used to further optimize the models.