- Seminar on Control Theory and Nonlinear Filtering

Policy Learning

Organizer

丘成栋

Speaker

陶飏天择

Time

October 11, 2022, 20:00 to 20:30

Venue

Online

Abstract

This talk presents policy-based reinforcement learning and the policy gradient. Policy learning means learning an optimal policy function, or an approximation of it such as a policy network, by solving an optimization problem. We first describe the policy network and then formulate policy learning as a maximization problem, from which the policy gradient is derived. Finally, approximating the policy gradient in two different ways yields two methods for training policy networks: REINFORCE and Actor-Critic.
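
For orientation, the quantities named in the abstract can be sketched as follows. The notation is an assumption chosen for illustration, not taken from the talk: $\pi(a \mid s; \theta)$ is the policy network, $Q_\pi$ and $V_\pi$ are its action-value and state-value functions, $u_t$ is the observed discounted return, and $q(s, a; w)$ is a hypothetical value network (the critic). The gradient is stated up to a constant factor.

```latex
% Assumed notation (illustrative only, not from the talk):
%   \pi(a \mid s; \theta)  policy network with parameters \theta
%   Q_\pi(s, a)            action-value function of \pi
%   V_\pi(s)               state-value function of \pi
%   q(s, a; w)             value network (critic) with parameters w
\begin{align*}
  J(\theta) &= \mathbb{E}_{S}\big[ V_\pi(S) \big],
  \qquad \max_{\theta} J(\theta), \\
  \nabla_\theta J(\theta) &\propto
    \mathbb{E}_{S}\, \mathbb{E}_{A \sim \pi(\cdot \mid S; \theta)}
    \big[ Q_\pi(S, A)\, \nabla_\theta \ln \pi(A \mid S; \theta) \big], \\
  \text{REINFORCE:}\quad Q_\pi(s_t, a_t) &\approx u_t = \sum_{k=t}^{n} \gamma^{k-t} r_k, \\
  \text{Actor-Critic:}\quad Q_\pi(s_t, a_t) &\approx q(s_t, a_t; w).
\end{align*}
```

Under these assumptions, the two methods differ only in how the unknown $Q_\pi$ is replaced: REINFORCE uses a Monte Carlo return from a completed episode, which is unbiased but typically high-variance, while Actor-Critic uses a learned estimate, which usually lowers variance at the cost of bias.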