Breaking the Curse of Multilinguality in Language Models
Friday, April 5, 2024, 11 AM to 12 PM
About this Event
Terra Blevins
PhD Candidate
Computer Science & Engineering
University of Washington
While language models (or LMs, à la ChatGPT) have become the predominant tool in natural language processing, their performance on non-English languages increasingly lags behind their English performance. This gap is due to the curse of multilinguality, which harms per-language performance in multilingual models because languages compete for limited model capacity. In this talk, I examine how current language models do and don't capture different languages and present new methods for modeling all languages fairly.
First, I demonstrate how LMs become multilingual through their data and training dynamics. Specifically, I show how data contamination teaches ostensibly English models cross-lingual information; I then characterize when multilingual models learn (and forget) languages during training to uncover how the curse of multilinguality develops. These analyses provide key insights into developing more equitable multilingual models, and I propose Cross-Lingual Expert Language Models (X-ELM), a new language modeling approach that explicitly allocates model resources to reduce language competition.
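The abstract describes the idea only at a high level. As a rough illustration of the intuition (not the speaker's actual system), the sketch below allocates dedicated model capacity per language cluster and routes each input to its cluster's expert, so languages no longer share one set of parameters. All cluster assignments, checkpoint names, and the fallback model are hypothetical.

```python
# Minimal sketch of per-language-cluster expert routing.
# Everything here (clusters, checkpoint paths, fallback) is an illustrative assumption.

from typing import Dict

# Hypothetical mapping from language code to a related-language cluster.
LANG_TO_CLUSTER: Dict[str, str] = {
    "en": "germanic", "de": "germanic", "nl": "germanic",
    "es": "romance",  "fr": "romance",  "it": "romance",
    "fi": "uralic",   "et": "uralic",
}

# Hypothetical registry of per-cluster expert checkpoints.
CLUSTER_TO_EXPERT: Dict[str, str] = {
    "germanic": "experts/germanic-lm",
    "romance":  "experts/romance-lm",
    "uralic":   "experts/uralic-lm",
}

def route_to_expert(lang_code: str, fallback: str = "experts/multilingual-lm") -> str:
    """Return the expert checkpoint responsible for this language's cluster.

    Because each expert only models its own cluster, languages stop competing
    for the same parameters, which is the intuition behind reducing the
    curse of multilinguality.
    """
    cluster = LANG_TO_CLUSTER.get(lang_code)
    return CLUSTER_TO_EXPERT.get(cluster, fallback) if cluster else fallback

if __name__ == "__main__":
    for lang in ("fi", "fr", "sw"):  # "sw" has no cluster here, so it falls back
        print(lang, "->", route_to_expert(lang))
```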
Talk Location: Jubel 121