Seminar | Mathematics and Computer Science

Generalizing the Scaling Laws for Dense and Sparse Large Language Models

CS Seminar

Abstract: In recent years, the size of large language models (LLMs) has increased exponentially, and LLM pretraining is very time-consuming. Understanding the scaling behavior of LLMs is critical for training efficiency and responsible resource allocation. Existing scaling laws have demonstrated that proportionally increasing model capacity along with data size and compute budget decreases training loss and improves model performance.

In this work, we revisit the existing empirical scaling laws for dense and sparse LLMs and aim to generalize them across architectures, using a single convenient representation that covers both dense and sparse models.

Bio: Md Arafat Hossain is a Ph.D. student in computer science at Iowa State University. He is currently serving as a research aide-technical at Argonne National Lab, hosted by Dr. Xingfu Wu.

See upcoming and previous presentations at CS Seminar Series.