
ESE Seminar: Hongyi Wang

This is a past event.

Tuesday, March 26, 2024, 11 AM to 12 PM

135 N Skinker Blvd, St. Louis, MO 63112, USA

#Seminar

A Step Further Toward Scalable and Automatic Distributed Large Language Model Pre-training

Abstract: Large Language Models (LLMs) are at the forefront of advances in AI. Training these LLMs, however, is computationally daunting and necessitates distributed training, which in turn suffers from bottlenecks such as heavy communication costs and the need for extensive performance tuning. Distributed training with hybrid parallelism, which combines data and model parallelism, is essential for LLM pre-training, but designing effective hybrid parallelism strategies requires substantial tuning effort and specialized expertise. In this talk, I will first discuss how to automatically and efficiently design high-throughput hybrid parallelism strategies using system cost models. Then, I will demonstrate the use of these automatically designed strategies to train state-of-the-art LLMs from scratch. Finally, I will introduce a low-rank training framework that enhances communication efficiency in data parallelism. The proposed framework achieves near-ideal scalability without sacrificing model quality by leveraging a full-rank-to-low-rank training strategy and a layer-wise adaptive rank selection mechanism.
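
The low-rank idea mentioned in the abstract can be pictured with a small, generic example. The sketch below is illustrative only: it uses a hypothetical LowRankLinear module in PyTorch to show why factoring a weight matrix into two thin matrices shrinks the gradient volume that data parallelism must all-reduce. It is not the speaker's framework, and it omits the full-rank-to-low-rank switch and the layer-wise adaptive rank selection described in the talk.

# Illustrative sketch only (not the speaker's framework): a generic low-rank
# factorized linear layer, showing why low-rank parameterizations reduce the
# gradient traffic in data-parallel training. Names and sizes are hypothetical.
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Approximates a d_out x d_in weight matrix W with U @ V, where
    U is d_out x r and V is r x d_in. Per-layer gradients (and thus the
    all-reduce volume in data parallelism) shrink from d_out * d_in
    values to r * (d_out + d_in) values."""

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(d_out, rank) * 0.02)
        self.V = nn.Parameter(torch.randn(rank, d_in) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Two thin matmuls in place of one full-rank matmul.
        return x @ self.V.t() @ self.U.t()


def communication_savings(d_in: int, d_out: int, rank: int) -> float:
    """Ratio of low-rank to full-rank gradient volume for one layer."""
    return rank * (d_in + d_out) / (d_in * d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 4096, 4096, 128
    layer = LowRankLinear(d_in, d_out, rank)
    x = torch.randn(8, d_in)
    print(layer(x).shape)                            # torch.Size([8, 4096])
    print(communication_savings(d_in, d_out, rank))  # 0.0625, i.e. ~16x less traffic

Choosing the rank per layer is the key trade-off: too small a rank hurts model quality, too large a rank gives up the communication savings, which is the motivation for the adaptive, layer-wise rank selection the talk describes.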
