- Title
- Reinforcement learning of Pareto-optimal multiobjective policies using steering
- Creator
- Vamplew, Peter; Issabekov, Rustam; Dazeley, Richard; Foale, Cameron
- Date
- 2015
- Type
- Text; Conference paper
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/100049
- Identifier
- vital:10468
- Identifier
- https://doi.org/10.1007/978-3-319-26350-2_53
- Identifier
- ISSN: 03029743; ISBN: 9783319263496
- Abstract
- There has been little research into multiobjective reinforcement learning (MORL) algorithms using stochastic or non-stationary policies, even though such policies may Pareto-dominate deterministic stationary policies. One approach is steering, which forms a non-stationary combination of deterministic stationary base policies. This paper presents two new steering algorithms designed for the task of learning Pareto-optimal policies. The first algorithm (w-steering) is a direct adaptation of previous approaches to steering, and therefore requires prior knowledge of recurrent states which are guaranteed to be revisited. The second algorithm (Q-steering) eliminates this requirement. Empirical results show that both algorithms perform well when given knowledge of recurrent states, but that Q-steering provides substantial performance improvements over w-steering when this knowledge is not available. © Springer International Publishing Switzerland 2015.
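A minimal sketch of the steering idea described in the abstract: a non-stationary policy switches between deterministic stationary base policies so that the long-run average vector return approaches a convex combination of the base policies' returns, a point on the Pareto front that no single base policy attains. The base-policy returns, the greedy switching rule, and all names below are illustrative assumptions, not the paper's w-steering or Q-steering algorithms.

```python
# Hypothetical two-objective example: each base policy is summarised by its
# average vector return per episode (illustrative values, not from the paper).
base_returns = {
    "A": (10.0, 0.0),  # strong on objective 1, weak on objective 2
    "B": (0.0, 10.0),  # the opposite trade-off
}

def steer(target, episodes=1000):
    """Greedy non-stationary schedule: each episode, follow whichever base
    policy moves the running average return closest to the target vector."""
    total = [0.0, 0.0]
    for n in range(1, episodes + 1):
        # Pick the base policy minimising squared distance of the new average
        # return to the target.
        best = min(
            base_returns,
            key=lambda name: sum(
                ((total[i] + base_returns[name][i]) / n - target[i]) ** 2
                for i in range(2)
            ),
        )
        total = [total[i] + base_returns[best][i] for i in range(2)]
    return [t / episodes for t in total]

# Steering towards a mixed return unreachable by either base policy alone:
avg = steer((7.0, 3.0))
```

Here the schedule converges to following policy A roughly 70% of the time and B 30%, so the average return approaches (7, 3); a deterministic stationary choice between A and B could only yield (10, 0) or (0, 10).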
- Publisher
- Springer-Verlag
- Relation
- 28th Australasian Joint Conference on Artificial Intelligence, AI 2015; Canberra, ACT; 30th November-4th December 2015 Vol. 9457, p. 596-608
- Rights
- Copyright © Springer International Publishing Switzerland 2015.
- Rights
- This metadata is freely available under a CC0 license
- Subject
- 08 Information and Computing Sciences; Multiobjective reinforcement learning; Non-stationary policies
- Reviewed