Discrete-to-deep reinforcement learning methods

Kurniawan, Budi; Vamplew, Peter; Papasimeon, Michael; Dazeley, Richard; Foale, Cameron

Title: Discrete-to-deep reinforcement learning methods
Creator: Kurniawan, Budi; Vamplew, Peter; Papasimeon, Michael; Dazeley, Richard; Foale, Cameron
Date: 2022
Type: Text; Journal article
Identifier: http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/187273
Identifier: vital:17045
Identifier: https://doi.org/10.1007/s00521-021-06270-6
Identifier: ISSN: 0941-0643
Abstract: Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. In complex problems, a neural RL approach is often able to learn a better solution than tabular RL, but generally takes longer. This paper proposes two methods, Discrete-to-Deep Supervised Policy Learning (D2D-SPL) and Discrete-to-Deep Supervised Q-value Learning (D2D-SQL), whose objective is to acquire the generalisability of a neural network at a cost nearer to that of a tabular method. Both methods combine RL and supervised learning (SL) and are based on the idea that a fast-learning tabular method can generate off-policy data to accelerate learning in neural RL. D2D-SPL uses the data to train a classifier which is then used as a controller for the RL problem. D2D-SQL uses the data to initialise a neural network which is then allowed to continue learning using another RL method. We demonstrate the viability of our algorithms with Cartpole, Lunar Lander and an aircraft manoeuvring problem, three continuous-space environments with low-dimensional state variables. Both methods learn at least 38% faster than baseline methods and yield policies that outperform them. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
Publisher: Springer Science and Business Media Deutschland GmbH
Relation: Neural Computing and Applications Vol. 34, no. 3 (2022), p. 1713-1733
Rights: All metadata describing materials held in, or linked to, the repository is freely available under a CC0 licence
Rights: Open Access
Subject: 4602 Artificial intelligence; 4603 Computer vision and multimedia computation; 4611 Machine learning; Actor-critic; DQN; Neural network; Reinforcement learning; Supervised learning
Full Text
Reviewed
Funder: This research is supported by the Defence Science and Technology Group, Australia; the Defence Science Institute, Australia; and an Australian Government Research Training Program Fee-offset scholarship. Associate Professor Joarder Kamruzzaman of the Centre for Multimedia Computing, Communications, and Artificial Intelligence Research (MCCAIR) at Federation University contributed some of the computing resources for this project.


File: SOURCE2; Description: Accepted version; Size: 1 MB; Format: Adobe Acrobat PDF
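Illustration: The abstract describes D2D-SPL as using data from a fast-learning tabular method to train a classifier, which then acts as the controller. The sketch below is only a toy illustration of that data flow and is not the authors' implementation. Everything concrete in it is an assumption made for the example: the one-dimensional environment, the bin count, the hyperparameters, the choice to label each visited state with the table's greedy action, and the use of logistic regression as a stand-in for the neural network classifier. The function names are hypothetical.

```python
import math
import random

N_BINS = 10           # resolution of the tabular learner (assumed)
MOVES = (-0.1, 0.1)   # action 0 moves left, action 1 moves right

def discretise(x):
    """Map a continuous state in [-1, 1] to a table row."""
    return min(N_BINS - 1, int((x + 1.0) / 2.0 * N_BINS))

def step(x, action):
    """Hypothetical toy dynamics: the reward is highest when x is near 0."""
    x_next = max(-1.0, min(1.0, x + MOVES[action]))
    return x_next, -abs(x_next)

def greedy(q, s):
    """Greedy action for table row s (ties go to action 0)."""
    return 0 if q[s][0] >= q[s][1] else 1

def tabular_q_learning(rng, episodes=2000, horizon=5,
                       alpha=0.1, gamma=0.9, eps=0.2):
    """Epsilon-greedy Q-learning on the discretised state.

    Also logs every continuous state visited, for later supervised learning.
    """
    q = [[0.0, 0.0] for _ in range(N_BINS)]
    visited = []
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            s = discretise(x)
            a = rng.randrange(2) if rng.random() < eps else greedy(q, s)
            x_next, r = step(x, a)
            target = r + gamma * max(q[discretise(x_next)])
            q[s][a] += alpha * (target - q[s][a])
            visited.append(x)
            x = x_next
    return q, visited

def build_dataset(q, visited):
    """Label each visited continuous state with the table's greedy action.

    This labelling rule is an assumption of the sketch.
    """
    return [(x, greedy(q, discretise(x))) for x in visited]

def train_classifier(rng, data, epochs=5, lr=0.1):
    """Logistic regression trained by SGD (stand-in for a neural classifier)."""
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

def policy(params, x):
    """Controller: acts on the continuous state directly, with no table lookup."""
    w, b = params
    return 1 if w * x + b > 0.0 else 0

rng = random.Random(0)
q_table, states = tabular_q_learning(rng)
params = train_classifier(rng, build_dataset(q_table, states))
print(policy(params, -0.8), policy(params, 0.8))
```

The reinforcement learning happens entirely in the small table, and the classifier is fitted afterwards with ordinary supervised learning, which is the division of labour the abstract describes. D2D-SQL, which according to the abstract uses the data to initialise a network that then continues learning with another RL method, is not covered by this sketch.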