- Title
- Discrete-to-deep reinforcement learning methods
- Creator
- Kurniawan, Budi; Vamplew, Peter; Papasimeon, Michael; Dazeley, Richard; Foale, Cameron
- Date
- 2022
- Type
- Text; Journal article
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/187273
- Identifier
- vital:17045
- Identifier
- https://doi.org/10.1007/s00521-021-06270-6
- Identifier
- ISSN: 0941-0643
- Abstract
- Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. In complex problems, a neural RL approach is often able to learn a better solution than tabular RL, but generally takes longer. This paper proposes two methods, Discrete-to-Deep Supervised Policy Learning (D2D-SPL) and Discrete-to-Deep Supervised Q-value Learning (D2D-SQL), whose objective is to acquire the generalisability of a neural network at a cost nearer to that of a tabular method. Both methods combine RL and supervised learning (SL) and are based on the idea that a fast-learning tabular method can generate off-policy data to accelerate learning in neural RL. D2D-SPL uses the data to train a classifier which is then used as a controller for the RL problem. D2D-SQL uses the data to initialise a neural network which is then allowed to continue learning using another RL method. We demonstrate the viability of our algorithms with Cartpole, Lunar Lander and an aircraft manoeuvring problem, three continuous-space environments with low-dimensional state variables. Both methods learn at least 38% faster than baseline methods and yield policies that outperform them. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
- Publisher
- Springer Science and Business Media Deutschland GmbH
- Relation
- Neural Computing and Applications Vol. 34, no. 3 (2022), p. 1713-1733
- Rights
- All metadata describing materials held in, or linked to, the repository is freely available under a CC0 licence
- Rights
- Copyright © 2021, The Author(s)
- Rights
- Open Access
- Subject
- 4602 Artificial intelligence; 4603 Computer vision and multimedia computation; 4611 Machine learning; Actor-critic; DQN; Neural network; Reinforcement learning; Supervised learning
- Full Text
- Reviewed
- Funder
- This research is supported by the Defence Science and Technology Group, Australia; the Defence Science Institute, Australia; and an Australian Government Research Training Program Fee-offset scholarship. Associate Professor Joarder Kamruzzaman of the Centre for Multimedia Computing, Communications, and Artificial Intelligence Research (MCCAIR) at Federation University contributed some of the computing resources for this project.
File | Description | Size | Format
---|---|---|---
SOURCE2 | Accepted version | 1 MB | Adobe Acrobat PDF
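
The D2D-SPL idea summarised in the abstract (a fast tabular learner produces data, and a classifier trained on that data becomes the controller) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 1-D toy task, plain Q-learning as the tabular method, the use of every greedy rollout step as training data, and a single-unit logistic regression standing in for the neural network. All function names are hypothetical.

```python
import math
import random

N_BINS, N_ACTIONS, STEP = 10, 2, 0.1  # toy problem sizes (illustrative assumptions)

def env_step(x, action):
    """Hypothetical 1-D task: action 0 moves left, 1 moves right; reward is -|x|."""
    x = max(-1.0, min(1.0, x + (STEP if action == 1 else -STEP)))
    return x, -abs(x)

def to_bin(x):
    """Discretise a continuous state in [-1, 1] into one of N_BINS table rows."""
    return min(int((x + 1.0) / 2.0 * N_BINS), N_BINS - 1)

def greedy(q_row):
    """Index of the highest-valued action in one table row."""
    return max(range(N_ACTIONS), key=lambda a: q_row[a])

def train_tabular(rng, episodes=2000, horizon=20, alpha=0.1, gamma=0.9, eps=0.2):
    """Plain epsilon-greedy tabular Q-learning on the discretised state."""
    q = [[0.0] * N_ACTIONS for _ in range(N_BINS)]
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            s = to_bin(x)
            a = rng.randrange(N_ACTIONS) if rng.random() < eps else greedy(q[s])
            x, r = env_step(x, a)
            target = r + gamma * max(q[to_bin(x)])
            q[s][a] += alpha * (target - q[s][a])
    return q

def collect_data(q, rng, episodes=100, horizon=20):
    """Roll out the greedy tabular policy and log (continuous state, action) pairs."""
    data = []
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            a = greedy(q[to_bin(x)])
            data.append((x, a))
            x, _reward = env_step(x, a)
    return data

def fit_classifier(data, epochs=300, lr=0.5):
    """Logistic regression for P(action=1 | x), fitted by full-batch gradient descent."""
    w = b = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b

def controller(params, x):
    """Use the trained classifier as the policy on the raw continuous state."""
    w, b = params
    return 1 if w * x + b > 0.0 else 0

rng = random.Random(0)  # fixed seed so the sketch is repeatable
q_table = train_tabular(rng)
params = fit_classifier(collect_data(q_table, rng))
```

Calling `controller(params, x)` then selects an action directly from the continuous state, without the discretisation the table needed. In this toy task the best action always moves toward the origin, so the classifier should choose left for positive `x` and right for negative `x`.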