BIMSA Control Theory and Nonlinear Filtering Seminar
Policy Gradient Methods with Baselines and Advanced Techniques for Policy Learning
Organizer
Speaker
陶飏天择
Time
October 25, 2022, 20:00 to 20:30
Location
Online
Abstract
In the previous session we derived the policy gradient and introduced two policy gradient methods, REINFORCE and Actor-Critic. Although these methods are correct in theory, they do not work well in practice. This session introduces the baseline: policy gradient with a baseline can greatly improve the performance of policy gradient methods. With a baseline, REINFORCE becomes REINFORCE with Baseline, and Actor-Critic becomes Advantage Actor-Critic (A2C).
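The effect of a baseline can be illustrated with a minimal sketch. This is not taken from the talk: the two-armed bandit, the reward values, the unit-variance noise, and all function names (`softmax`, `grad_log_pi`, `gradient_samples`, `mean_and_variance`) are assumptions chosen for illustration. It draws score-function gradient samples of the form (r - b) * grad log pi(a) for a softmax policy, once with b = 0 and once with b set to the expected reward. Subtracting a baseline that does not depend on the action leaves the expected gradient unchanged, and in this setting it sharply reduces the variance, which is a commonly cited reason for the improvement.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences (the policy parameters) into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(probs, action):
    """Gradient of log pi(action) w.r.t. the preferences of a softmax policy:
    component k equals 1[k == action] - pi(k)."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def gradient_samples(prefs, mean_rewards, baseline, n, rng):
    """Draw n single-sample gradient estimates (r - baseline) * grad log pi(a).

    Assumed toy environment: a bandit whose reward for arm a is
    mean_rewards[a] plus standard Gaussian noise.
    """
    probs = softmax(prefs)
    samples = []
    for _ in range(n):
        a = rng.choices(range(len(probs)), weights=probs)[0]
        r = mean_rewards[a] + rng.gauss(0.0, 1.0)
        g = grad_log_pi(probs, a)
        samples.append([(r - baseline) * gk for gk in g])
    return samples


def mean_and_variance(samples, k):
    """Empirical mean and variance of component k of the gradient samples."""
    vals = [s[k] for s in samples]
    mu = sum(vals) / len(vals)
    var = sum((v - mu) ** 2 for v in vals) / len(vals)
    return mu, var


if __name__ == "__main__":
    rng = random.Random(0)
    prefs = [0.0, 0.0]            # uniform policy over two arms
    mean_rewards = [10.0, 11.0]   # hypothetical reward means
    for b in (0.0, 10.5):         # no baseline vs. expected reward as baseline
        samples = gradient_samples(prefs, mean_rewards, b, 20000, rng)
        mu, var = mean_and_variance(samples, 0)
        print(f"baseline={b}: mean={mu:.3f}, variance={var:.3f}")
```

Under these assumptions the exact expected value of the first gradient component is -0.25 for either choice of baseline, so both runs estimate the same quantity; only the spread of the samples differs. A learned state-value function plays the role of the constant `baseline` in REINFORCE with Baseline and A2C.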