Seminar #1: Automating Flood Extent Mapping on Earth Imagery Using Elevation-Guided AI Technology
Speaker: Saugat Adhikari, second-year Ph.D. student in Computer Science at the University of Alabama at Birmingham and Visiting Student in the MCS Division at Argonne for Summer 2023
Abstract: High-resolution optical imagery is becoming increasingly available with the wide deployment of satellites and drones, and accurate, timely mapping of flood extent from this imagery plays a crucial role in disaster management tasks such as damage assessment and relief activities. However, the problem is non-trivial due to the lack of ground-truth flood maps, significant imagery noise and obstacles, complex spatial dependencies on 3D terrain, spatial non-stationarity, and high computational cost. Existing machine learning approaches are mostly terrain-unaware and prone to producing spurious results due to imagery noise and obstacles, which requires significant post-processing effort. To overcome these problems, we have developed an elevation-guided framework for accurate ground-truth annotation, flood map prediction, and flood map refinement.
In this talk, I will describe the three steps of our AI pipeline: (1) efficient annotation of disaster-time satellite images using 3D elevation-guided technology; (2) elevation-guided flood map segmentation, using a model trained on the annotated images; and (3) elevation-guided flood map refinement using a graphical model called a hidden Markov tree. Finally, I will discuss our ongoing work on active learning to speed up the annotation process and on parallelizing our post-processing algorithm to reduce computation time.
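The abstract does not spell out how elevation guides the refinement step. As a rough illustration only, the sketch below assumes a constraint commonly used in elevation-guided flood mapping: within one hydrologically connected region, water fills the terrain from the lowest point upward, so a physically consistent flood map is monotone in elevation. Given noisy per-pixel flood probabilities (for example, from a segmentation model), it picks the single water level that maximizes the likelihood. The function name `refine_by_elevation` and the single-region, single-water-level model are hypothetical simplifications; the hidden Markov tree described in the talk models many regions and spatial dependencies jointly.

```python
import math
from typing import List, Tuple


def refine_by_elevation(
    elevation: List[float], flood_prob: List[float], eps: float = 1e-6
) -> Tuple[List[int], float]:
    """Return 0/1 flood labels that are monotone in elevation and maximize likelihood.

    Hypothetical simplification: all pixels lie in one connected region, so the
    flooded set is always "the k lowest pixels" for some k. Ties in elevation are
    broken by pixel index.
    """
    n = len(elevation)
    if n == 0 or n != len(flood_prob):
        raise ValueError("elevation and flood_prob must be non-empty and equal length")
    order = sorted(range(n), key=lambda i: elevation[i])  # lowest pixel first
    # Clamp probabilities so log() never sees 0 or 1.
    p = [min(max(flood_prob[i], eps), 1.0 - eps) for i in order]

    # k = 0: nothing flooded, every pixel contributes log(1 - p).
    score = sum(math.log(1.0 - q) for q in p)
    best_score, best_k = score, 0
    # Raising the water level by one pixel moves it from "dry" to "flooded".
    for k in range(1, n + 1):
        q = p[k - 1]
        score += math.log(q) - math.log(1.0 - q)
        if score > best_score:
            best_score, best_k = score, k

    labels = [0] * n
    for rank in range(best_k):
        labels[order[rank]] = 1
    water_level = elevation[order[best_k - 1]] if best_k > 0 else float("-inf")
    return labels, water_level


if __name__ == "__main__":
    # The pixel at elevation 2.0 has a low probability (e.g., obscured by noise),
    # but higher pixels are confidently flooded, so it is relabeled as flooded.
    labels, level = refine_by_elevation([1.0, 2.0, 3.0, 4.0, 5.0],
                                        [0.9, 0.4, 0.8, 0.1, 0.3])
    print(labels, level)  # [1, 1, 1, 0, 0] 3.0
```

The cumulative-sum scan makes the search linear after sorting, which hints at why exploiting elevation order can keep refinement tractable; handling many regions and spatial dependencies, as a hidden Markov tree does, is considerably more involved.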
Seminar #2: Improving Data Loading and Communication Performance for Large-Scale Distributed Training
Speaker: Baixi Sun, Ph.D. student at the Indiana University Luddy School of Informatics, Computing, and Engineering and Research Aide in the MCS Division at Argonne for Summer 2023
Abstract: Large-scale distributed training of deep neural network (DNN) models exposes performance bottlenecks on high-performance computing (HPC) clusters. On the one hand, the effectiveness of DNN models depends heavily on large training datasets (e.g., terabyte-scale), which makes data loading challenging in today’s distributed training. On the other hand, second-order optimizers offer improved convergence and generalization in DNN training but incur extra communication overhead compared with stochastic gradient descent (SGD) optimizers. Reducing communication cost is therefore crucial to the performance of second-order optimizers.
To address these problems, I will discuss two system-level optimizations: SOLAR and SSO. SOLAR uses offline and online scheduling strategies to reduce the cost of loading data from parallel file systems into device memory (e.g., graphics processing unit (GPU) memory). SSO avoids latency-bound communication and integrates lossy compression algorithms to reduce communication message size while preserving the benefits of second-order optimizers, such as faster convergence than SGD-based optimizers. Specifically, I will describe the challenges of data loading and communication in large-scale distributed training, share our insights on performance improvements, and explain how SOLAR and SSO address these challenges.
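The abstract does not say which lossy compressor SSO integrates. To illustrate how lossy compression can shrink communication messages under a guaranteed error bound, the sketch below uses plain linear quantization, the step that error-bounded compressors such as SZ build on (real compressors add prediction and entropy coding). All function names are hypothetical, and the gradient values are made up for the demo; this is not SSO's implementation.

```python
import struct
from typing import List, Tuple


def quantize(values: List[float], error_bound: float) -> Tuple[List[int], float]:
    """Linear quantization: |v - code * step| <= error_bound (up to float rounding)."""
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    step = 2.0 * error_bound  # each quantization bin is 2 * error_bound wide
    return [round(v / step) for v in values], step


def dequantize(codes: List[int], step: float) -> List[float]:
    """Reconstruct approximate values from integer codes on the receiving side."""
    return [c * step for c in codes]


def packed_size_bytes(codes: List[int]) -> int:
    """Bytes needed if all codes share the smallest signed integer width that fits."""
    if not codes:
        return 0
    lo, hi = min(codes), max(codes)
    for width in (1, 2, 4, 8):
        limit = 1 << (8 * width - 1)
        if -limit <= lo and hi < limit:
            return width * len(codes)
    raise ValueError("codes do not fit in 64-bit integers")


if __name__ == "__main__":
    grads = [0.0123, -0.0456, 0.0789, -0.0011, 0.0500]  # made-up example values
    codes, step = quantize(grads, error_bound=1e-3)
    restored = dequantize(codes, step)
    raw = struct.calcsize("f") * len(grads)  # size as a float32 message
    print(raw, packed_size_bytes(codes))  # 20 5
    print(max(abs(a - b) for a, b in zip(grads, restored)) <= 1e-3)  # True
```

In this toy case the message drops from 4 bytes to 1 byte per value while every reconstructed entry stays within the chosen bound; whether such an error is acceptable for second-order optimizer state is exactly the kind of trade-off the talk addresses.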