Seminar #1: T-FSM: A Task-Based System for Massively Parallel Frequent Subgraph Pattern Mining from a Big Graph
Speaker: Lyuheng Yuan - Ph.D. student of Computer Science at the University of Alabama at Birmingham, and a Visiting Student in the MCS division at ANL for Summer 2023
Abstract: Finding frequent subgraph patterns in a big graph is an important problem with many applications such as classifying chemical compounds and building indexes to speed up graph queries. Since this problem is NP-hard, some recent parallel systems have been developed to accelerate the mining. However, they often have a huge memory cost, very long running time, suboptimal load balancing, and possibly inaccurate results.
To address these problems, I will describe an efficient system called T-FSM for parallel mining of frequent subgraph patterns in a big graph. T-FSM adopts a novel task-based execution engine design to ensure high parallelism, bounded memory consumption, and effective load balancing. It also supports a new anti-monotonic frequentness measure called Fraction-Score, which is more accurate than the widely used MNI measure. Experiments show that T-FSM is orders of magnitude faster than SOTA systems for frequent subgraph pattern mining.
Seminar #2: A GPU-accelerated Data Transformation Framework Rooted in Pushdown Transducers
Speaker: Tri Nguyen - PhD student of Computer Architecture at North Carolina State University, and a Visiting Student in the MCS division at ANL for Summer 2023
Abstract: With the rise of machine learning and data analytics, the ability to process large and diverse sets of data efficiently has become crucial. Research has shown that data transformation is a key performance bottleneck for applications across a variety of domains, from data analytics to scientific computing. Custom hardware accelerators and GPU implementations targeting specific data transformation tasks can alleviate the problem but suffer from narrow applicability and lack of generality.
To tackle this problem, we propose a GPU-accelerated data transformation engine grounded on pushdown transducers. We define an extended pushdown transducer abstraction (effPDT) that allows expressing a wide range of data transformations in a memory-efficient fashion and is thus amenable for GPU deployment. The effPDT execution engine utilizes a data streaming model that reduces the application’s memory requirements significantly, facilitating deployment on high- and low-end systems. We showcase our GPU-accelerated engine on a diverse set of transformation tasks covering data encoding/decoding, parsing and querying of structured data, and matrix transformation, and we evaluate it against publicly available CPU and GPU library implementations of the considered data transformation tasks. To understand the benefits of the effPDT abstraction, we extend our data transformation engine to also support finite state transducers (FSTs), we map the considered data transformation tasks on FSTs, and we compare the performance and resource requirements of the FST-based and the effPDT-based implementations.