- Seminar on Control Theory and Nonlinear Filtering

Policy Gradient Methods with Baselines and Advanced Techniques for Policy Learning

Organizer

丘成栋

Speaker

陶飏天择

Time

October 25, 2022, 20:00 to 20:30

Venue

Online

Abstract

Last time we derived the policy gradient and introduced two policy gradient methods, REINFORCE and Actor-Critic. Although these methods are correct in theory, they do not work well in practice. This time we introduce a baseline: policy gradient with baseline can greatly improve the performance of policy gradient methods. With a baseline, REINFORCE becomes REINFORCE with Baseline, and Actor-Critic becomes Advantage Actor-Critic (A2C).
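
For readers who want the idea in symbols, the following is a brief sketch of why subtracting a baseline leaves the policy gradient unchanged. The notation is an assumption made for illustration and is not taken from the talk: a discrete action space, policy $\pi(a \mid s; \theta)$, action value $Q_\pi$, state value $V_\pi$, a learned value estimate $v(s; w)$, observed return $u_t$, reward $r_t$, and discount factor $\gamma$.

```latex
% Assumed notation (not from the talk): discrete actions, policy \pi(a|s;\theta),
% baseline b(s) that may depend on the state but not on the action.
\begin{align*}
% The baseline term has zero expectation under the policy:
\mathbb{E}_{a \sim \pi(\cdot \mid s;\theta)}
  \bigl[\, b(s)\, \nabla_\theta \ln \pi(a \mid s;\theta) \,\bigr]
  &= b(s) \sum_{a} \nabla_\theta \pi(a \mid s;\theta)
   = b(s)\, \nabla_\theta 1 = 0, \\
% so the policy gradient is unchanged in expectation:
\nabla_\theta J(\theta)
  &= \mathbb{E}\bigl[\, \bigl(Q_\pi(s,a) - b(s)\bigr)\,
     \nabla_\theta \ln \pi(a \mid s;\theta) \,\bigr].
\end{align*}
% A common choice is b(s) = V_\pi(s), which gives the advantage
% A_\pi(s,a) = Q_\pi(s,a) - V_\pi(s).
% REINFORCE with Baseline (sketch): approximate the advantage by u_t - v(s_t; w).
% A2C (sketch): approximate it by r_t + \gamma\, v(s_{t+1}; w) - v(s_t; w).
```

Under these assumptions the baseline changes only the variance of the sampled gradient, not its expectation, which is one standard explanation for the practical improvement the abstract describes.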