The Advanced Techniques of Value Learning
Organizer
Speaker
Yangtianze Tao
Time
Tuesday, September 27, 2022 3:00 PM - 3:30 PM
Venue
Online
Tencent 735 7908 4302
Abstract
We first introduce two ways to improve the TD algorithm so that the DQN trains better. We review Experience Replay and Prioritized Experience Replay, and then discuss the overestimation problem of DQN and its solutions: Target Network and Double Q-learning. Next, we introduce two methods that improve the DQN neural network structure: the Dueling Network, which decomposes the action value into a state value and an advantage, and the Noisy Net, which adds random noise to the network parameters to encourage exploration.
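As a rough illustration of two ideas named in the abstract, the sketch below computes a Double Q-learning TD target and a Dueling Network aggregation with NumPy. It is not taken from the talk: the function names, the `gamma` default, and the choice to subtract the mean advantage (one common variant) are assumptions made for this example.

```python
import numpy as np

def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double Q-learning TD target (hypothetical helper).

    The online network selects the next action; the target network
    evaluates it. Decoupling the two is what reduces overestimation.
    """
    best_action = int(np.argmax(q_online_next))
    bootstrap = 0.0 if done else gamma * float(q_target_next[best_action])
    return reward + bootstrap

def dueling_q_values(state_value, advantages):
    """Dueling aggregation (hypothetical helper, mean-subtraction variant).

    Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a)
    """
    advantages = np.asarray(advantages, dtype=float)
    return state_value + advantages - advantages.mean()

# Example inputs (made up): Q-values for three actions in the next state.
q_online_next = np.array([0.2, 0.9, 0.1])   # online network picks action 1
q_target_next = np.array([0.5, 0.3, 0.8])   # target network scores action 1 as 0.3
target = double_q_target(1.0, False, q_online_next, q_target_next, gamma=0.5)
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
```

Note that a plain DQN target would bootstrap from the maximum of `q_target_next` (0.8 here), whereas the Double Q-learning target uses the value at the action chosen by the online network (0.3).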