Policy Gradient Methods with Baselines and Advanced Techniques for Policy Learning

Organizer

Stephen S-T. Yau

Speaker

Yangtianze Tao

Time

Tuesday, October 25, 2022 8:00 PM - 8:30 PM

Venue

Online

Abstract

Last time we derived the policy gradient and introduced two policy gradient methods: REINFORCE and Actor-Critic. Although these methods are correct in theory, they do not work well in practice. This time we introduce the baseline: policy gradient with a baseline can greatly improve the performance of policy gradient methods. With a baseline, REINFORCE becomes REINFORCE with Baseline, and Actor-Critic becomes Advantage Actor-Critic (A2C).
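
As a rough illustration of why a baseline can help, the sketch below compares the single-sample gradient estimator (r - b) * grad log pi(a) with and without a baseline. It is not taken from the talk. It assumes a one-step problem (a two-armed bandit) with a softmax policy, and the rewards, logits, and function names are hypothetical choices made only for this example. Under those assumptions the expected gradient is the same in both cases, while the variance of the estimator is smaller when the baseline is the expected reward.

```python
import math


def softmax(logits):
    """Convert logits to action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_softmax(probs, action):
    """Gradient of log pi(action) with respect to each logit:
    d/d logit_k log pi(action) = 1[k == action] - pi(k)."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def gradient_stats(logits, rewards, baseline):
    """Exact mean and total variance of the single-sample estimator
    (r(a) - baseline) * grad log pi(a), computed by enumerating all actions.

    Assumption: a one-step problem, so the return equals the immediate reward.
    """
    probs = softmax(logits)
    n = len(logits)
    samples = []
    for a in range(n):
        g = grad_log_softmax(probs, a)
        samples.append([(rewards[a] - baseline) * gk for gk in g])
    mean = [sum(probs[a] * samples[a][k] for a in range(n)) for k in range(n)]
    var = sum(
        probs[a] * sum((samples[a][k] - mean[k]) ** 2 for k in range(n))
        for a in range(n)
    )
    return mean, var


# Hypothetical setup: two actions with similar, large rewards.
logits = [0.0, 0.0]
rewards = [10.0, 11.0]

# Baseline chosen as the expected reward under the current policy
# (the role played by a state-value estimate in REINFORCE with Baseline and A2C).
value_baseline = sum(p * r for p, r in zip(softmax(logits), rewards))

mean_plain, var_plain = gradient_stats(logits, rewards, baseline=0.0)
mean_base, var_base = gradient_stats(logits, rewards, baseline=value_baseline)

print("no baseline:   mean =", mean_plain, "variance =", var_plain)
print("with baseline: mean =", mean_base, "variance =", var_base)
```

In multi-step problems the constant here would typically be replaced by a learned state-value estimate, and the quantity r - b by an advantage estimate. This sketch leaves that out.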