- Title
- The impact of environmental stochasticity on value-based multiobjective reinforcement learning
- Creator
- Vamplew, Peter; Foale, Cameron; Dazeley, Richard
- Date
- 2022
- Type
- Text; Journal article
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/184745
- Identifier
- vital:16585
- Identifier
- https://doi.org/10.1007/s00521-021-05859-1
- Identifier
- ISSN: 0941-0643
- Abstract
- A common approach to addressing multiobjective problems using reinforcement learning methods is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism that is often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multiobjective Q-learning. It shows that the nature of the optimal solution depends on these environmental characteristics, and also on whether we desire to maximise the Expected Scalarised Return (ESR) or the Scalarised Expected Return (SER). We also identify a novel aim which may arise in some applications: maximising SER subject to satisfying constraints on the variation in return, and show that this may require different solutions from either ESR or conventional SER. The analysis of the interaction between environmental stochasticity and multiobjective Q-learning is supported by empirical evaluations on several simple multiobjective Markov Decision Processes with varying characteristics. This includes a demonstration of a novel approach to learning deterministic SER-optimal policies for environments with stochastic rewards. In addition, we report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. Having highlighted the limitations of value-based model-free MORL methods, we discuss several alternative methods that may be more suitable for maximising SER in MOMDPs with stochastic transitions. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
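
For context, the ESR/SER distinction referenced in the abstract is standard in the multiobjective RL literature: both criteria apply a scalarisation function to a vector-valued return, but differ in where the expectation over environmental stochasticity is taken. A minimal statement of the two criteria, with the scalarisation function f, policy π, discount γ, and vector reward **r**ₜ as assumed notation (not taken from this record):

```latex
% ESR: scalarise each episode's vector return first, then take the expectation.
\max_{\pi} \; \mathbb{E}\!\left[\, f\!\left( \textstyle\sum_{t=0}^{\infty} \gamma^{t}\,\mathbf{r}_{t} \right) \,\middle|\, \pi \right] \quad \text{(ESR)}

% SER: take the expectation of the vector return first, then scalarise it.
\max_{\pi} \; f\!\left( \mathbb{E}\!\left[ \textstyle\sum_{t=0}^{\infty} \gamma^{t}\,\mathbf{r}_{t} \,\middle|\, \pi \right] \right) \quad \text{(SER)}
```

When f is nonlinear, the two optima can differ whenever rewards or state transitions are stochastic, which is the interaction the paper examines.
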
- Publisher
- Springer Science and Business Media Deutschland GmbH
- Relation
- Neural Computing and Applications Vol. 34, no. 3 (2022), p. 1783-1799
- Rights
- All metadata describing materials held in, or linked to, the repository is freely available under a CC0 licence
- Rights
- Copyright © 2021, The Author(s)
- Rights
- Open Access
- Subject
- 4602 Artificial Intelligence; 4603 Computer Vision and Multimedia Computation; 4611 Machine Learning; Multiobjective MDPs; Multiobjective reinforcement learning; Stochastic MDPs
- Full Text
- Reviewed
File | Description | Size | Format
---|---|---|---
SOURCE1 | Submitted version | 956 KB | Adobe Acrobat PDF