About this Event
Today’s personalized recommendation systems leverage deep learning to deliver a high-quality user experience in internet services such as search engines, social networks, online retail, and content streaming. Given the volume of personalized inferences served in datacenters and their rapid growth, we propose RecNMP, a lightweight, commodity-DRAM-compliant near-memory processing (NMP) solution that accelerates the sparse embedding operations in recommendation models. Furthermore, to demonstrate the performance potential of NMP technology on real hardware, we develop AxDIMM, an FPGA-enabled NMP platform that provides rapid prototyping in a realistic system setting with an industry-representative recommendation framework.

However, untrusted near-data processing (NDP) units introduce new threats to private and sensitive workloads, such as private database queries and private machine learning inference. Current encryption schemes do not support computation over encrypted data stored in memory or storage, hindering the adoption of NDP techniques for sensitive workloads. We propose SecNDP, a lightweight encryption and verification scheme that lets untrusted NDP devices perform computation over ciphertext while allowing the host to verify the correctness of linear operations.

Finally, given the fast evolution and rapid growth of production-grade recommendation serving, optimizing its cluster-level serving performance and efficiency in a heterogeneous datacenter is challenging from three perspectives: model diversity, cloud-scale system heterogeneity, and time-varying load patterns. We propose Hercules, a comprehensive optimization framework tailored for at-scale neural recommendation inference, which performs a two-stage optimization procedure: offline profiling and online serving.
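To make the bottleneck concrete, below is a minimal sketch (in Python/NumPy, with made-up table sizes and indices) of the sparse embedding gather-and-pool kernel that near-memory designs such as RecNMP target; it illustrates the operation itself, not RecNMP’s implementation.

```python
import numpy as np

# Illustrative sizes; production embedding tables reach millions of rows and GBs of data.
NUM_ROWS, EMB_DIM = 100_000, 64
table = np.random.default_rng(0).standard_normal((NUM_ROWS, EMB_DIM), dtype=np.float32)

def embedding_pool(table: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Gather a small bag of rows and sum-pool them into one dense vector.

    Each lookup touches a handful of effectively random rows, so the kernel is
    memory-bandwidth bound; a near-memory design exploits this by performing
    the pooling inside the memory module instead of on the CPU.
    """
    return table[indices].sum(axis=0)

# One sparse feature for one inference request: a bag of item/category IDs.
ids = np.array([12, 40_305, 77_216, 31], dtype=np.int64)
pooled = embedding_pool(table, ids)   # dense output of shape (EMB_DIM,)
print(pooled.shape)
```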
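To illustrate how an untrusted device can compute linear operations over ciphertext, here is a toy additive one-time-pad construction in the same spirit: the host pads each row before storing it, the NDP unit sums ciphertext rows, and the host removes the pads afterward. This is a simplified stand-in under assumed parameters, not the actual SecNDP scheme, and it omits the verification component entirely.

```python
import numpy as np

MOD = 2**32  # all values are fixed-point integers; arithmetic is modulo 2^32

def encrypt(rows, pads):
    """Trusted host: add a per-row pseudorandom pad to every table row before
    handing the table to untrusted memory. In a real scheme the pads would be
    a keyed PRF / counter-mode keystream; here they are random stand-ins."""
    return (rows + pads) % MOD

def ndp_pool(ciphertexts, indices):
    """Untrusted NDP unit: sum the requested ciphertext rows. Because pooling
    is linear, the sum of ciphertexts equals the encryption of the sum."""
    return ciphertexts[indices].sum(axis=0) % MOD

def host_decrypt(ct_sum, pads, indices):
    """Trusted host: subtract the sum of pads, which it could recompute from
    its key and counters without ever touching the data rows."""
    return (ct_sum - pads[indices].sum(axis=0)) % MOD

rng = np.random.default_rng(1)
plain = rng.integers(0, 1000, size=(8, 4), dtype=np.uint64)  # toy embedding table
pads = rng.integers(0, MOD, size=(8, 4), dtype=np.uint64)    # stand-in keystream
ct = encrypt(plain, pads)

idx = np.array([1, 3, 6])
recovered = host_decrypt(ndp_pool(ct, idx), pads, idx)
assert np.array_equal(recovered, plain[idx].sum(axis=0) % MOD)
```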
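Likewise, the two-stage idea behind Hercules can be sketched with a toy planner: offline, each (model, server generation) pair is profiled; online, the planner picks a feasible server type and replica count for the current load. All names, numbers, and the greedy policy below are assumptions for illustration only, not Hercules’ actual objectives or heuristics.

```python
import math
from dataclasses import dataclass

@dataclass
class Profile:
    model: str
    server_type: str
    latency_ms: float   # measured offline on that server generation
    peak_qps: float     # throughput the model sustains on one such server

# Stage 1 (offline profiling): characterize every (model, server-type) pair.
profiles = [
    Profile("ranking_v1", "cpu_gen1", 12.0, 800),
    Profile("ranking_v1", "cpu_gen2", 7.5, 1400),
    Profile("retrieval_v2", "cpu_gen1", 4.0, 2500),
    Profile("retrieval_v2", "cpu_gen2", 3.1, 3200),
]

def plan_serving(profiles, demand_qps, sla_ms):
    """Stage 2 (online serving): for each model, pick an SLA-feasible server
    type that needs the fewest replicas for the current demand."""
    plan = {}
    for model, qps in demand_qps.items():
        feasible = [p for p in profiles
                    if p.model == model and p.latency_ms <= sla_ms[model]]
        best = min(feasible, key=lambda p: qps / p.peak_qps)
        plan[model] = (best.server_type, math.ceil(qps / best.peak_qps))
    return plan

# Load shifts over the day, so the plan is recomputed as demand changes.
print(plan_serving(profiles,
                   demand_qps={"ranking_v1": 5000, "retrieval_v2": 9000},
                   sla_ms={"ranking_v1": 10, "retrieval_v2": 5}))
```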
As for the future direction of my dissertation, we will focus on tackling the system challenges of future large-scale recommendation with disaggregated systems. Recommendation models are constantly evolving and growing to several terabytes in industrial production. Disaggregation has recently emerged as a potential solution to these challenges of fast algorithmic evolution and memory capacity.