- Title
- A prioritized objective actor-critic method for deep reinforcement learning
- Creator
- Nguyen, Ngoc; Nguyen, Thanh; Vamplew, Peter; Dazeley, Richard; Nahavandi, Saeid
- Date
- 2021
- Type
- Text; Journal article
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/186104
- Identifier
- vital:16825
- Identifier
- https://doi.org/10.1007/s00521-021-05795-0
- Identifier
- ISSN: 0941-0643
- Abstract
- An increasing number of complex problems have naturally posed significant challenges in decision-making theory and reinforcement learning practices. These problems often involve multiple conflicting reward signals that inherently cause agents’ poor exploration in seeking a specific goal. In extreme cases, the agent gets stuck in a sub-optimal solution and starts behaving harmfully. To overcome such obstacles, we introduce two actor-critic deep reinforcement learning methods, namely Multi-Critic Single Policy (MCSP) and Single Critic Multi-Policy (SCMP), which can adjust agent behaviors to efficiently achieve a designated goal by adopting a weighted-sum scalarization of different objective functions. In particular, MCSP creates a human-centric policy that corresponds to a predefined priority weight of different objectives. In contrast, SCMP is capable of generating a mixed policy based on a set of priority weights, i.e., the generated policy uses the knowledge of different policies (each policy corresponds to a priority weight) to dynamically prioritize objectives in real time. We examine our methods by using the Asynchronous Advantage Actor-Critic (A3C) algorithm to utilize the multithreading mechanism for dynamically balancing the training intensity of different policies within a single network. Finally, simulation results show that MCSP and SCMP significantly outperform A3C with respect to the mean of total rewards in two complex problems: Food Collector and Seaquest. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd. part of Springer Nature.
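The weighted-sum scalarization mentioned in the abstract can be sketched as follows. This is a minimal illustration of combining multiple conflicting objective rewards under predefined priority weights, not the authors' implementation; the example rewards and weights are hypothetical.

```python
import numpy as np

def scalarize(rewards, weights):
    """Collapse a multi-objective reward vector into a single scalar
    via a weighted sum, given one priority weight per objective."""
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(weights, dtype=float)
    assert rewards.shape == weights.shape, "one weight per objective"
    return float(np.dot(weights, rewards))

# Hypothetical example: two conflicting objectives, e.g. task progress
# vs. a safety penalty, with priority weights summing to 1.
r = [1.0, -0.5]
w = [0.7, 0.3]
print(scalarize(r, w))  # 0.7 * 1.0 + 0.3 * (-0.5) = 0.55
```

The scalarized value can then drive a standard single-objective critic update; the choice of weights encodes how the designer prioritizes the objectives.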
- Publisher
- Springer Science and Business Media Deutschland GmbH
- Relation
- Neural Computing and Applications Vol. 33, no. 16 (2021), p. 10335-10349
- Rights
- All metadata describing materials held in, or linked to, the repository is freely available under a CC0 licence
- Rights
- Copyright @ The Author(s)
- Subject
- 4602 Artificial intelligence; 4603 Computer vision and multimedia computation; 4611 Machine learning; Actor-critic architecture; Deep learning; Learning systems; Multi-objective optimization; Reinforcement learning
- Reviewed