Policy Learning
Organizer
Speaker
Yangtianze Tao
Time
Tuesday, October 11, 2022 8:00 PM - 8:30 PM
Venue
Online
Abstract
This talk introduces policy-based reinforcement learning and the policy gradient. Policy learning means learning an optimal policy function, or an approximation of it such as a policy network, by solving an optimization problem. We first describe the policy network and then formulate policy learning as a maximization problem. Next, we derive the policy gradient. Finally, we show how two different approximations of the policy gradient lead to two methods for training policy networks: REINFORCE and Actor-Critic.
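As a rough illustration of the REINFORCE idea named in the abstract, the sketch below trains a softmax policy by gradient ascent, using the observed return in place of the unknown action value. Everything specific here is an assumption made for illustration and is not taken from the talk: the two-armed bandit with fixed rewards (a one-step episode), the tabular softmax parameterization standing in for a policy network, the hypothetical names `softmax` and `train_reinforce`, and the hyperparameters.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_reinforce(rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style training on a toy bandit (illustrative assumption).

    rewards[a] is the fixed return for choosing action a. The policy is
    pi(a; theta) = softmax(theta)[a], with one parameter per action.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        # Observed return; REINFORCE uses it as a Monte Carlo
        # stand-in for the action value.
        ret = rewards[action]
        # For a softmax policy, d/d theta[a] of log pi(action) is
        # 1[a == action] - pi(a). Take a stochastic gradient ascent step.
        for a in range(len(theta)):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * ret * grad_log
    return softmax(theta)


# Arm 1 pays more than arm 0, so the policy should come to prefer arm 1.
final_probs = train_reinforce(rewards=[0.0, 1.0])
```

An Actor-Critic variant would replace `ret` with the output of a learned value estimate (the critic) rather than the sampled return; that part is omitted to keep the sketch short.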