Seminar | Mathematics and Computer Science Division

Scalable Data Mining via Constrained Low Rank Approximation

Abstract: Matrix and tensor approximation methods are recognized as foundational tools of modern data analytics. Their strength lies in a long history of rigorous and principled theory, judicious formulations via various constraints, and the availability of fast software implementations. Constrained Low Rank Approximation (CLRA) algorithms have been popular for a variety of tasks such as clustering, outlier detection, and dimensionality reduction. The recent push towards more explainable models, especially in the unsupervised setting, has led to renewed interest in these linear models.

In this talk, we introduce a new method for simultaneously clustering nodes and detecting anomalous subgraphs in attributed networks. This method serves as a case study of the computational bottlenecks that arise in many CLRA algorithms. We describe how to handle these bottlenecks in the distributed-memory setting via our software package PLANC. Finally, we describe ways to extend PLANC into a robust and scalable data analytics pipeline.