The Advanced Techniques of Value Learning
Organizer
Speaker
陶飏天择
Time
September 27, 2022, 15:00 to 15:30
Venue
Online
Tencent Meeting: 735 7908 4302
Abstract
We first introduce two ways to improve the TD algorithm so that DQN trains better, then review Experience Replay and Prioritized Experience Replay. After that, we discuss the overestimation problem of DQN and its solutions, the Target Network and Double Q-learning. Finally, two methods are introduced to improve the DQN network architecture: the Dueling Network, which decomposes the action value into a state value and an advantage, and the Noisy Net, which adds random noise to the network parameters to encourage exploration.
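For readers unfamiliar with the two ideas named above, the following standard formulations may help; the notation is ours and is not taken from the talk itself. The Dueling Network writes the action value as a state value plus an advantage, where the mean-subtraction term is one common way to keep the decomposition identifiable, and Double Q-learning builds the TD target by selecting the action with the online parameters θ while evaluating it with the target-network parameters θ⁻:

    Q(s, a; θ) = V(s; θ) + A(s, a; θ) - (1/|A|) Σ_{a'} A(s, a'; θ)

    y = r + γ · Q(s', argmax_{a'} Q(s', a'; θ); θ⁻)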