Policy Learning - Seminar on Control Theory and Nonlinear Filtering

Policy Learning

Organizer

Stephen S-T. Yau

Speaker

Yangtianze Tao

Time

Tuesday, October 11, 2022 8:00 PM - 8:30 PM

Venue

Online

Abstract

This talk presents policy-based reinforcement learning and the policy gradient. Policy learning means learning an optimal policy function, or an approximation of it such as a policy network, by solving an optimization problem. We first describe the policy network and then formulate policy learning as a maximization problem, from which the policy gradient is derived. Finally, different approximations of the policy gradient yield two methods for training policy networks: REINFORCE and Actor-Critic.
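
As a rough illustration of the REINFORCE idea named in the abstract, the following minimal sketch applies a sampled policy-gradient update to a softmax policy on a one-step, two-action problem. It is not taken from the talk: the function names (`softmax`, `reinforce_bandit`), the reward values, the noise level, the step count, and the learning rate are all assumptions chosen for illustration, and a table of action preferences stands in for a policy network.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(mean_rewards, steps=5000, lr=0.1, seed=0):
    """REINFORCE-style updates on a one-step problem (hypothetical setup).

    theta holds one preference per action; the policy is softmax(theta).
    Each step samples an action, observes a noisy reward, and moves theta
    along reward * grad(log pi(action)), a sampled estimate of the
    policy gradient. No baseline is used in this sketch.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(mean_rewards)
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = mean_rewards[action] + rng.gauss(0.0, 0.1)
        for k in range(len(theta)):
            # For a softmax policy: d/d theta_k log pi(action) = 1[k == action] - pi(k)
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * reward * grad_log  # gradient ascent step
    return softmax(theta)


# Assumed mean rewards: action 1 pays more on average than action 0.
final_probs = reinforce_bandit([0.2, 1.0])
print(final_probs)
```

Under these assumptions the learned policy should place most of its probability on the higher-reward action. An Actor-Critic variant would, roughly speaking, replace the raw sampled reward in the update with an estimate produced by a learned value function.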