Hashing Algorithms for Scalable and Sustainable Machine Learning on Commodity Hardware
Monday, February 27, 2023 11:30 AM
About this Event
Zhaozhuo Xu
PhD Candidate
Computer Science Department
Rice University
In recent years, we have witnessed unprecedented achievements from machine learning (ML) models as their sizes increase. However, model sizes grow much faster than hardware and network bandwidth improve, making it challenging to train and deploy ML models on current system infrastructure. Moreover, with the global trend toward data privacy protection, it is increasingly desirable to train ML models on user devices without sharing data. This shift to on-device training further constrains the hardware resources available for ML, exacerbating the tension between the effectiveness and efficiency of ML models.
In this talk, I will present algorithmic progress that helps ML models overcome hardware constraints through sparsity. I will start with my work on data sparsity and present a hash-based sampling algorithm that adaptively selects data samples during training. I will then show how this hash-based sampling accelerates a machine teaching algorithm, achieving 425.12x speedups and 99.76% energy savings on edge devices. Next, I will turn to our recent success in model sparsity and present provably efficient hashing algorithms that adaptively select and update a subset of parameters while training ML models. Finally, I will present Zen, a hashing-based system that achieves near-optimal communication for sparse, distributed ML.
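The talk does not spell out the specific hash functions used, but the general idea of hash-based adaptive selection can be illustrated with locality-sensitive hashing. The sketch below uses signed random projections (SimHash) to bucket items (data samples or parameter rows) and then selects only the items that collide with a query vector; the class name, parameters, and overall structure are illustrative assumptions, not the speaker's actual algorithms.

```python
import numpy as np

# Illustrative sketch of hash-based adaptive selection via signed random
# projections (SimHash). Items whose hash code collides with the query's
# hash code are selected; everything else is skipped, which is the source
# of the computational savings. This is NOT the speaker's exact method.

class SimHashSampler:
    def __init__(self, dim, num_bits=10, seed=0):
        rng = np.random.default_rng(seed)
        # Random hyperplanes that define the hash function.
        self.planes = rng.standard_normal((num_bits, dim))
        self.buckets = {}  # hash code (bytes) -> list of item indices

    def _hash(self, vec):
        # Each bit is the sign of one random projection.
        bits = (self.planes @ vec) > 0
        return bits.tobytes()

    def index(self, items):
        # Pre-hash all items (rows) into buckets.
        for i, v in enumerate(items):
            self.buckets.setdefault(self._hash(v), []).append(i)

    def select(self, query):
        # Adaptively select only the items colliding with the query.
        return self.buckets.get(self._hash(query), [])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.standard_normal((10_000, 64))  # e.g. samples or weight rows
    sampler = SimHashSampler(dim=64)
    sampler.index(data)

    query = data[42] + 0.05 * rng.standard_normal(64)  # a nearby query
    selected = sampler.select(query)
    print(f"selected {len(selected)} of {len(data)} items")
```

In practice, such schemes use multiple hash tables and tuned bit widths to trade recall against the fraction of items touched per step.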
I will also demonstrate the utility of these hashing algorithms in applications including recommendation systems, personalized education, the census, and our collaboration with Shell on efficient seismic imaging for environmental protection.
Event Details
Dial-In Information
Join Zoom Meeting
https://wustl.zoom.us/j/97556809432?pwd=RHdjRXBWVzhESm96YnlrMUNUbnVzQT09