ESE Seminar: Hongyi Wang
Tuesday, March 26, 2024 11 AM to 12 PM
About this Event
135 N Skinker Blvd, St. Louis, MO 63112, USA
#Seminar
A Step Further Toward Scalable and Automatic Distributed Large Language Model Pre-training
Abstract: Large Language Models (LLMs) are at the forefront of advances in AI. Training these LLMs, however, is computationally daunting and necessitates distributed training, which in turn suffers from bottlenecks such as heavy communication costs and the need for extensive performance tuning. Distributed training with hybrid parallelism, which combines data and model parallelism, is essential for LLM pre-training, yet designing effective hybrid parallelism strategies requires substantial tuning effort and specialized expertise. In this talk, I will first discuss how to automatically and efficiently design high-throughput hybrid-parallelism strategies using system cost models. I will then demonstrate the use of these automatically designed strategies to train state-of-the-art LLMs from scratch. Finally, I will introduce a low-rank training framework that enhances communication efficiency in data parallelism. The proposed framework achieves near-ideal scalability without sacrificing model quality by leveraging a full-rank-to-low-rank training strategy and a layer-wise adaptive rank selection mechanism.
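For readers unfamiliar with low-rank communication compression, the sketch below illustrates the general idea in plain NumPy: a layer's gradient matrix is factored into two thin matrices before being exchanged between data-parallel workers, and a simple spectral-energy rule stands in for a layer-wise adaptive rank choice. The function names and the rank-selection heuristic are illustrative assumptions, not the speaker's actual framework.

# Illustrative sketch (assumption, not the speaker's framework): low-rank
# compression of a layer gradient to reduce data-parallel communication.
import numpy as np

def adaptive_rank(grad, energy=0.9):
    """Smallest rank capturing `energy` of the spectral energy -- a simple
    stand-in for a layer-wise adaptive rank selection rule."""
    s = np.linalg.svd(grad, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, energy) + 1)

def low_rank_compress(grad, rank):
    """Factor a 2-D gradient into U (m x r) and V (r x n) via truncated SVD;
    workers would exchange U and V instead of the full gradient."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]

# A synthetic gradient with approximately low-rank structure.
grad = np.random.randn(1024, 32) @ np.random.randn(32, 1024)
grad += 0.01 * np.random.randn(1024, 1024)

r = adaptive_rank(grad)
u_r, v_r = low_rank_compress(grad, r)
print(f"rank {r}: send {u_r.size + v_r.size} floats instead of {grad.size}")

In such a scheme, workers exchange the two factors rather than the full gradient, so communication volume scales with r(m + n) instead of mn for an m-by-n layer.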