An empirical comparison of two common multiobjective reinforcement learning algorithms
- Authors: Issabekov, Rustam , Vamplew, Peter
- Date: 2012
- Type: Text , Conference paper
- Relation: 25th Australasian Joint Conference on Artificial Intelligence, AI 2012; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 7691 LNAI, p. 626-636
- Full Text:
- Reviewed:
- Description: In this paper we provide empirical data on the performance of the two most commonly used multiobjective reinforcement learning algorithms against a set of benchmarks. We first describe the methodology used in this paper, then detail the properties of the proposed problems and how those properties influence the behaviour of the tested algorithms. We also introduce a testing framework that will significantly improve future empirical comparisons of multiobjective reinforcement learning algorithms, and we hope this testing environment eventually becomes a central repository of test problems and algorithms. The empirical results clearly identify features of the test problems which affect the performance of each algorithm, demonstrating the utility of empirically testing algorithms on problems with known characteristics. © 2012 Springer-Verlag.
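The comparison in this paper rests on Pareto-dominance relations between the return vectors that the algorithms discover on each benchmark. As a minimal illustrative sketch (the function names and data below are invented for illustration, not taken from the paper), filtering the pooled results of two algorithms down to their non-dominated subset is the basic operation behind such comparisons:

```python
def dominates(a, b):
    """True if return vector 'a' Pareto-dominates 'b' (maximisation):
    at least as good on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated(points):
    """Filter a set of return vectors down to its non-dominated subset."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical returns found by two algorithms on a two-objective benchmark:
alg_a = [(1.0, 8.0), (4.0, 5.0), (6.0, 2.0)]
alg_b = [(1.0, 7.5), (4.0, 5.0), (5.0, 2.0)]
# Pooling both sets and filtering shows which algorithm's points survive.
print(sorted(nondominated(set(alg_a + alg_b))))
```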
Empirical evaluation methods for multiobjective reinforcement learning algorithms
- Authors: Vamplew, Peter , Dazeley, Richard , Berry, Adam , Issabekov, Rustam , Dekker, Evan
- Date: 2011
- Type: Text , Journal article
- Relation: Machine Learning Vol. 84, no. 1-2 (2011), p. 51-80
- Full Text: false
- Reviewed:
- Description: While a number of algorithms for multiobjective reinforcement learning have been proposed, and a small number of applications developed, there has been very little rigorous empirical evaluation of the performance and limitations of these algorithms. This paper proposes standard methods for such empirical evaluation, to act as a foundation for future comparative studies. Two classes of multiobjective reinforcement learning algorithms are identified, and appropriate evaluation metrics and methodologies are proposed for each class. A suite of benchmark problems with known Pareto fronts is described, and future extensions and implementations of this benchmark suite are discussed. The utility of the proposed evaluation methods is demonstrated via an empirical comparison of two example learning algorithms. © 2010 The Author(s).
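One widely used quality measure in this line of work is the hypervolume indicator: the volume of objective space dominated by a solution set, relative to a fixed reference point. Below is a minimal sketch for the two-objective maximisation case; the function name and example data are assumptions for illustration:

```python
def hypervolume_2d(front, ref):
    """Exact hypervolume of a two-objective front (both objectives maximised)
    relative to a reference point 'ref' dominated by every front point.
    Assumes 'front' contains only mutually non-dominated points."""
    # Sort by the first objective, descending; the second then increases
    # along the front, so each point adds one horizontal slab of area.
    pts = sorted(front, key=lambda p: p[0], reverse=True)
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv

# Example: the front {(3, 1), (2, 2), (1, 3)} against reference (0, 0) yields 6.0.
print(hypervolume_2d([(3, 1), (2, 2), (1, 3)], (0, 0)))
```

A larger hypervolume indicates a solution set that lies closer to, and covers more of, the true Pareto front.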
Reinforcement learning of pareto-optimal multiobjective policies using steering
- Authors: Vamplew, Peter , Issabekov, Rustam , Dazeley, Richard , Foale, Cameron
- Date: 2015
- Type: Text , Conference paper
- Relation: 28th Australasian Joint Conference on Artificial Intelligence, AI 2015; Canberra, ACT; 30th November-4th December 2015 Vol. 9457, p. 596-608
- Full Text: false
- Reviewed:
- Description: There has been little research into multiobjective reinforcement learning (MORL) algorithms using stochastic or non-stationary policies, even though such policies may Pareto-dominate deterministic stationary policies. One approach is steering, which forms a non-stationary combination of deterministic stationary base policies. This paper presents two new steering algorithms designed for the task of learning Pareto-optimal policies. The first algorithm (w-steering) is a direct adaptation of previous approaches to steering, and therefore requires prior knowledge of recurrent states which are guaranteed to be revisited. The second algorithm (Q-steering) eliminates this requirement. Empirical results show that both algorithms perform well when given knowledge of recurrent states, but that Q-steering provides substantial performance improvements over w-steering when this knowledge is not available. © Springer International Publishing Switzerland 2015.
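A minimal sketch of the steering idea described above, assuming each base policy's average return vector is known: at every episode, a greedy chooser picks the base policy that pulls the running mean return toward a target point. The selection rule and numbers here are illustrative assumptions, not the paper's exact w-steering or Q-steering updates:

```python
import numpy as np

def steer(base_returns, target, episodes=1000):
    """Non-stationary mixture of deterministic base policies: each episode,
    pick the base policy whose (assumed known) return vector moves the
    running mean return closest to the chosen target point."""
    base = [np.asarray(r, dtype=float) for r in base_returns]
    target = np.asarray(target, dtype=float)
    mean = np.zeros_like(target)
    for t in range(1, episodes + 1):
        choice = min(base, key=lambda r: np.linalg.norm((mean * (t - 1) + r) / t - target))
        mean = (mean * (t - 1) + choice) / t
    return mean

# Mixing deterministic base policies with returns (10, 0) and (0, 10)
# reaches intermediate points such as (5, 5) that neither attains alone.
print(steer([(10, 0), (0, 10)], target=(5, 5)))
```

Because the mixture is non-stationary, its long-run average return can sit at convex combinations of the base returns that no deterministic stationary policy achieves.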
Steering approaches to Pareto-optimal multiobjective reinforcement learning
- Authors: Vamplew, Peter , Issabekov, Rustam , Dazeley, Richard , Foale, Cameron , Berry, Adam , Moore, Tim , Creighton, Douglas
- Date: 2017
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 263 (2017), p. 26-38
- Full Text:
- Reviewed:
- Description: For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent’s target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system.
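The interactive use described in the abstract can be sketched under the same illustrative assumptions as above: the decision-maker's target point may move during learning, and the non-stationary mixture re-steers toward the new target. Here `target_schedule` is a hypothetical stand-in for the paper's visual target-adjustment interface:

```python
import numpy as np

def interactive_steer(base_returns, target_schedule, episodes=2000):
    """Re-steer a non-stationary mixture of base policies as the target moves.
    'target_schedule(t)' returns the decision-maker's current target point;
    all names and numbers are illustrative, not the paper's exact method."""
    base = [np.asarray(r, dtype=float) for r in base_returns]
    total = np.zeros_like(base[0])
    for t in range(1, episodes + 1):
        target = np.asarray(target_schedule(t), dtype=float)
        # Greedy choice: the base policy pulling the new mean closest to the target.
        total += min(base, key=lambda r: np.linalg.norm((total + r) / t - target))
    return total / episodes

# Hypothetical session: the target moves from (8, 2) to (2, 8) halfway through,
# so the final average return lands between the two targets.
schedule = lambda t: (8, 2) if t <= 1000 else (2, 8)
print(interactive_steer([(10, 0), (0, 10)], schedule))
```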