Policy Learning
Organizer
Speaker
陶飏天择
Time
October 11, 2022, 20:00 to 20:30
Location
Online
Abstract
This talk introduces policy-based reinforcement learning and the policy gradient. Policy learning means learning an optimal policy function, or an approximation of it such as a policy network, by solving an optimization problem. We first describe the policy network and then formulate policy learning as a maximization problem, from which the policy gradient is derived. Finally, approximating the policy gradient in two different ways yields two methods for training policy networks: REINFORCE and Actor-Critic.
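
For orientation, the quantities named in the abstract are commonly written as follows. This is only a sketch in assumed notation, not necessarily the notation used in the talk: $\pi(a \mid s; \theta)$ is the policy network with parameters $\theta$, $V_\pi$ and $Q_\pi$ are the state-value and action-value functions, $\gamma$ is the discount factor, and $q(s, a; w)$ is a hypothetical value network with parameters $w$.

```latex
% Policy learning as a maximization problem (assumed notation)
J(\theta) = \mathbb{E}_{S}\big[ V_\pi(S) \big],
\qquad
\max_{\theta} \; J(\theta)

% Policy gradient (holds up to a constant factor)
\nabla_\theta J(\theta) \;\propto\;
\mathbb{E}_{S}\Big[ \mathbb{E}_{A \sim \pi(\cdot \mid S; \theta)}
\big[ Q_\pi(S, A) \, \nabla_\theta \ln \pi(A \mid S; \theta) \big] \Big]

% REINFORCE: replace Q_\pi(s_t, a_t) with the observed discounted return
u_t = \sum_{k=t}^{n} \gamma^{\,k-t} \, r_k

% Actor-Critic: replace Q_\pi(s_t, a_t) with a learned value network
Q_\pi(s_t, a_t) \approx q(s_t, a_t; w)
```

In both cases the expectation is estimated from sampled states and actions. The two methods differ only in what stands in for the unknown $Q_\pi$: a Monte Carlo return in REINFORCE, a learned critic in Actor-Critic.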