Reinforcement learning of Pareto-optimal multiobjective policies using steering

Vamplew, Peter; Issabekov, Rustam; Dazeley, Richard; Foale, Cameron

Title: Reinforcement learning of Pareto-optimal multiobjective policies using steering
Creator: Vamplew, Peter; Issabekov, Rustam; Dazeley, Richard; Foale, Cameron
Date: 2015
Type: Text; Conference paper
Identifier: http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/100049
Identifier: vital:10468
Identifier: https://doi.org/10.1007/978-3-319-26350-2_53
Identifier: ISSN 0302-9743; ISBN 9783319263496
Abstract: There has been little research into multiobjective reinforcement learning (MORL) algorithms using stochastic or non-stationary policies, even though such policies may Pareto-dominate deterministic stationary policies. One approach is steering, which forms a non-stationary combination of deterministic stationary base policies. This paper presents two new steering algorithms designed for the task of learning Pareto-optimal policies. The first algorithm (w-steering) is a direct adaptation of previous approaches to steering, and therefore requires prior knowledge of recurrent states which are guaranteed to be revisited. The second algorithm (Q-steering) eliminates this requirement. Empirical results show that both algorithms perform well when given knowledge of recurrent states, but that Q-steering provides substantial performance improvements over w-steering when this knowledge is not available. © Springer International Publishing Switzerland 2015.
Publisher: Springer Verlag
Relation: 28th Australasian Joint Conference on Artificial Intelligence, AI 2015; Canberra, ACT; 30th November-4th December 2015 Vol. 9457, p. 596-608
Rights: This metadata is freely available under a CC0 license
Subject: 08 Information and Computing Sciences; Multiobjective reinforcement learning; Non-stationary policies
Reviewed
