The Advanced Techniques of Value Learning

Organizer

Stephen S-T. Yau

Speaker

Yangtianze Tao

Time

Tuesday, September 27, 2022 3:00 PM - 3:30 PM

Venue

Online

Tencent 735 7908 4302

Abstract

We first introduce two ways to improve the TD algorithm so that the DQN trains better, and then review Experience Replay and Prioritized Experience Replay. After that, we discuss the overestimation problem of DQN and its solutions: Target Network and Double Q-learning. Finally, we introduce two methods that improve the DQN neural network structure: the Dueling Network, which decomposes the Action-Value into State-Value and Advantage, and the Noisy Net, which adds random noise to the network parameters to encourage exploration.
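As a rough illustration of two of the ideas named in the abstract, the sketch below shows how a Dueling Network could combine a State-Value and Advantages into Action-Values, and how a Double Q-learning TD target could be formed. It is not taken from the talk: the function names, the plain-list inputs, and the choice to subtract the mean advantage (one common variant) are assumptions made for illustration only.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a).

    Assumption: the mean advantage is subtracted so that V and A
    are identifiable; other aggregation choices exist.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_q_target(reward, gamma, next_q_online, next_q_target, done=False):
    """TD target in the Double Q-learning style.

    The online network's values select the next action; the target
    network's values evaluate it, which reduces overestimation.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical numbers, chosen only to exercise the two functions.
q = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0])
y = double_q_target(reward=1.0, gamma=0.9,
                    next_q_online=[0.2, 0.8, 0.5],
                    next_q_target=[1.0, 0.4, 2.0])
```

In this toy call the online values pick the second action, while the target values supply the estimate for it; a plain DQN target would instead take the maximum of the target values directly.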