The Advanced Techniques of Value Learning
Organizer
Speaker
陶飏天择
Time
September 27, 2022, 15:00 to 15:30
Venue
Online
Tencent Meeting: 735 7908 4302
Abstract
We first introduce two ways to improve the TD algorithm so that DQN trains better, then review Experience Replay and Prioritized Experience Replay. After that, we discuss the overestimation problem of DQN and its solutions, the Target Network and Double Q-learning. Finally, two methods are introduced to improve the DQN network architecture: the Dueling Network, which decomposes the action value into a state value and an advantage, and the Noisy Net, which adds random noise to the network parameters to encourage exploration.
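For readers unfamiliar with the two ideas named above, the following standard formulations may help; the notation is ours and is not taken from the talk itself. The Dueling Network writes the action value as a state value plus an advantage, where the mean-subtraction term is one common way to keep the decomposition identifiable, and Double Q-learning builds the TD target by selecting the action with the online parameters θ while evaluating it with the target-network parameters θ⁻:

    Q(s, a; θ) = V(s; θ) + A(s, a; θ) - (1/|A|) Σ_{a'} A(s, a'; θ)

    y = r + γ · Q(s', argmax_{a'} Q(s', a'; θ); θ⁻)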