- Title
- Elastic step DQN : a novel multi-step algorithm to alleviate overestimation in deep Q-networks
- Creator
- Ly, Adrian; Dazeley, Richard; Vamplew, Peter; Cruz, Francisco; Aryal, Sunil
- Date
- 2024
- Type
- Text; Journal article
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/201635
- Identifier
- vital:19585
- Identifier
- https://doi.org/10.1016/j.neucom.2023.127170
- Identifier
- ISSN:0925-2312 (ISSN)
- Abstract
- The Deep Q-Network (DQN) algorithm was the first reinforcement learning algorithm using a deep neural network to surpass human-level performance in a number of Atari learning environments. However, divergent and unstable behaviour has been a long-standing issue in DQNs. The unstable behaviour is often characterised by overestimation of the Q-values, commonly referred to as the overestimation bias. To address the overestimation bias and the divergent behaviour, a number of heuristic extensions have been proposed. Notably, multi-step updates have been shown to drastically reduce unstable behaviour while improving the agent's training performance. However, agents are often highly sensitive to the selection of the multi-step update horizon (n), and our empirical experiments show that a poorly chosen static value for n can in many cases lead to worse performance than single-step DQN. Inspired by the success of n-step DQN and the effects that multi-step updates have on overestimation bias, this paper proposes a new algorithm that we call ‘Elastic Step DQN’ (ES-DQN) to alleviate overestimation bias in DQNs. ES-DQN dynamically varies the step horizon in multi-step updates based on the similarity between the states visited. Our empirical evaluation shows that ES-DQN outperforms n-step DQN with a fixed n, Double DQN and Average DQN in several OpenAI Gym environments while at the same time alleviating the overestimation bias. © 2024 The Authors
- Publisher
- Elsevier B.V.
- Relation
- Neurocomputing Vol. 576, no. (2024), p.
- Rights
- All metadata describing materials held in, or linked to, the repository is freely available under a CC0 licence
- Rights
- http://creativecommons.org/licenses/by/4.0/
- Rights
- Copyright © 2024 The Authors
- Rights
- Open Access
- Subject
- 40 Engineering; 46 Information and computing sciences; 52 Psychology; DQN; Multi-step update; Neural network; Overestimation; Reinforcement learning
- Full Text
- Reviewed
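The multi-step update horizon (n) discussed in the abstract can be illustrated with a minimal sketch of a multi-step bootstrapped target. This is not the authors' implementation: the function name `n_step_target` and its parameters are hypothetical, and the sketch assumes the standard discounted n-step return, where the horizon is simply the number of rewards passed in. How ES-DQN chooses that horizon from state similarity is not specified in this record and is not modelled here.

```python
def n_step_target(rewards, bootstrap_value, gamma, done):
    """Discounted multi-step target (illustrative, hypothetical helper).

    rewards: rewards r_0 .. r_{n-1} collected over the n steps.
    bootstrap_value: estimate of max_a Q(s_n, a) at the state reached after n steps.
    gamma: discount factor in [0, 1].
    done: True if the episode ended within these n steps (no bootstrapping).
    """
    # Start from the bootstrapped tail value, or 0 if the episode terminated.
    target = 0.0 if done else bootstrap_value
    # Fold rewards in from last to first: target = r_k + gamma * target.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


# Horizon n = 3 here; a shorter or longer reward list gives a different horizon.
example = n_step_target([1.0, 1.0, 1.0], bootstrap_value=8.0, gamma=0.5, done=False)
```

With a single reward the function reduces to the usual single-step DQN target, which is why the choice of horizon is the only thing that distinguishes the single-step and multi-step cases in this sketch.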