
6760 Forest Park Pkwy, St. Louis, MO 63105, USA

https://cse.washu.edu/news-events/colloquia-series.html

Yiheng Lin

In this talk, I will introduce our work on online policy optimization under time-varying dynamics and costs, with possibly unknown dynamical models. We study a setting where the online agent seeks to minimize the total cost incurred over a finite horizon by optimizing the parameters of a given policy class. We propose the Gradient-based Adaptive Policy Selection (GAPS) algorithm, which achieves the optimal policy regret and is efficient to implement. The key component of our theoretical analysis is establishing the connections between GAPS for online policy optimization and online gradient descent (OGD) for classic online optimization problems, which allow us to ‘transfer’ existing regret guarantees for OGD to GAPS. Further, I will present a meta-framework that combines an online policy optimization algorithm like GAPS with an online model estimator to address the challenge of unknown nonlinear dynamical models. Compared with many prior works that study online control in unknown linear dynamical systems, our work provides a critical insight: learning the true dynamical model globally is unnecessary. Instead, the online model estimator only needs to predict well on the actual trajectory visited by the controller, which is a tractable goal for general nonlinear dynamical systems.
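For readers unfamiliar with online gradient descent (OGD), the classic baseline that the abstract relates GAPS to, the following is a minimal sketch of projected OGD. It is not the GAPS algorithm and not the speaker's code. The function name `online_gradient_descent`, the quadratic costs, the step size, and the projection radius are all assumptions chosen for illustration.

```python
import numpy as np

def online_gradient_descent(grad_fns, theta0, step_size, radius):
    """Projected OGD sketch: play theta_t, then step along the revealed gradient.

    grad_fns is a sequence of callables, one per round, each returning the
    gradient of that round's cost at the parameter that was played.
    """
    theta = np.asarray(theta0, dtype=float)
    played = []
    for grad in grad_fns:
        played.append(theta.copy())              # commit to theta_t before seeing the cost
        theta = theta - step_size * grad(theta)  # gradient step on the revealed cost
        norm = np.linalg.norm(theta)
        if norm > radius:                        # project back onto the ball of given radius
            theta = theta * (radius / norm)
    return played

# Hypothetical time-varying costs f_t(theta) = 0.5 * ||theta - c_t||^2,
# whose gradient at theta is (theta - c_t).
rng = np.random.default_rng(0)
targets = [np.array([1.0, -0.5]) + 0.1 * rng.standard_normal(2) for _ in range(200)]
grad_fns = [lambda th, c=c: th - c for c in targets]

played = online_gradient_descent(grad_fns, theta0=np.zeros(2), step_size=0.1, radius=5.0)
print(played[-1])  # the last played parameter; it should sit near the drifting targets
```

In this toy setting each cost depends only on the current parameter. In the policy optimization setting described above, the cost at each step also depends on the state reached under earlier parameters, which is the gap the talk's analysis addresses.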