List of Titles

Showing items 1 - 17 of 17

Your selections:

Reinforcement learning

Incorporating expert advice into reinforcement learning using constructive neural networks

Authors: Ollington, Robert , Vamplew, Peter , Swanson, John
Date: 2009
Type: Text , Book chapter
Relation: Constructive Neural Networks Chapter p. 207-224
Full Text: false
Description: This paper presents and investigates a novel approach to using expert advice to speed up the learning performance of an agent operating within a reinforcement learning framework. This is accomplished through the use of a constructive neural network based on radial basis functions. It is demonstrated that incorporating advice from a human teacher can substantially improve the performance of a reinforcement learning agent, and that the constructive algorithm proposed is particularly effective at aiding the early performance of the agent, whilst reducing the amount of feedback required from the teacher. The use of constructive networks within a reinforcement learning context is a relatively new area of research in itself, and so this paper also provides a review of the previous work in this area, as a guide for future researchers. © 2009 Springer-Verlag Berlin Heidelberg.

On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts

Authors: Vamplew, Peter , Yearwood, John , Dazeley, Richard , Berry, Adam
Date: 2008
Type: Text , Conference paper
Relation: Paper presented at 21st Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand : 1st-5th December 2008 Vol. 5360, p. 372-378
Full Text: false
Description: Multiobjective reinforcement learning (MORL) extends RL to problems with multiple conflicting objectives. This paper argues for designing MORL systems to produce a set of solutions approximating the Pareto front, and shows that the common MORL technique of scalarisation has fundamental limitations when used to find Pareto-optimal policies. The work is supported by the presentation of three new MORL benchmarks with known Pareto fronts.
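The scalarisation limitation described above can be illustrated numerically. The sketch assumes a two-objective maximisation problem and three hypothetical deterministic policies whose returns are invented for illustration (they are not the paper's benchmarks). It shows that a Pareto-optimal return lying in a concave region of the front is never the argmax of a weighted sum, whatever non-negative weights are used.

```python
import numpy as np


def pareto_front(points):
    """Return the rows of `points` not dominated by any other row (maximisation)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q >= p) and np.any(q > p)
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(tuple(float(x) for x in p))
    return front


def linear_scalarisation_hits(points, n_weights=101):
    """Rows that win argmax(w . v) for some weight vector w = (w1, 1 - w1)."""
    hits = set()
    for w1 in np.linspace(0.0, 1.0, n_weights):
        w = np.array([w1, 1.0 - w1])
        best = int(np.argmax(points @ w))
        hits.add(tuple(float(x) for x in points[best]))
    return hits


# Hypothetical two-objective returns of three deterministic policies.
# (0.4, 0.4) is Pareto-optimal but lies in a concave region of the front.
policies = np.array([[0.0, 1.0], [0.4, 0.4], [1.0, 0.0]])
front = pareto_front(policies)
hits = linear_scalarisation_hits(policies)
missed = [p for p in front if p not in hits]
print(missed)  # [(0.4, 0.4)]
```

A finer weight grid would not change the result: for any weights, w · (0.4, 0.4) = 0.4, while the better of the other two policies scores max(w1, 1 - w1) ≥ 0.5.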

Constructing stochastic mixture policies for episodic multiobjective reinforcement learning tasks

Authors: Vamplew, Peter , Dazeley, Richard , Barker, Ewan , Kelarev, Andrei
Date: 2009
Type: Text , Book chapter
Relation: AI 2009 : Advances in Artificial Intelligence : 22nd Australasian Joint Conference, Melbourne, Australia, December 1-4, 2009. Proceedings Chapter p. 340-349
Full Text:
Description: Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages gained from applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobjective tasks from deterministic base policies found via scalarised reinforcement learning. It is shown that these approaches are an efficient means of identifying solutions which offer a better match to the user’s preferences than can be achieved by methods based strictly on deterministic policies.
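A minimal sketch of the general mixture-policy idea follows. It assumes, as is natural for episodic tasks, that a base policy is chosen once at the start of each episode and then followed to the end. The two base-policy return vectors are hypothetical, and the function names are illustrative; neither of the paper's two derivation methods is reproduced here.

```python
import random


def mixture_value(v_a, v_b, p):
    """Expected return vector of a mixture that commits to policy A for the
    whole episode with probability p, and to policy B otherwise."""
    return tuple(p * a + (1 - p) * b for a, b in zip(v_a, v_b))


def run_mixture(v_a, v_b, p, episodes=10_000, seed=0):
    """Monte Carlo check: flip one coin per episode, then follow the chosen
    deterministic base policy (whose return is v_a or v_b) to the end."""
    rng = random.Random(seed)
    totals = [0.0] * len(v_a)
    for _ in range(episodes):
        v = v_a if rng.random() < p else v_b
        totals = [t + x for t, x in zip(totals, v)]
    return tuple(t / episodes for t in totals)


# Hypothetical base policies found by scalarised RL at two extreme weightings.
v_a, v_b = (0.0, 1.0), (1.0, 0.0)
print(mixture_value(v_a, v_b, 0.5))  # (0.5, 0.5)
print(run_mixture(v_a, v_b, 0.5))    # close to (0.5, 0.5)
```

Sweeping p from 0 to 1 traces the straight line between the two base returns, so a mixture can reach expected trade-offs that neither deterministic base policy offers. This holds only in expectation: any single episode still yields one of the two base returns.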

Applying reinforcement learning in playing Robosoccer using the AIBO

Authors: Mukherjee, Subhasis
Date: 2010
Type: Text , Thesis , Masters
Full Text:
Description: "Robosoccer is a popular test bed for AI programs around the world in which AIBO entertainment robots take part in the middle-sized soccer event. These robots need a variety of skills to perform in a semi-real environment like this. The three key challenges are manoeuvrability, image recognition and decision-making skills. This research is focussed on the decision-making skills ... The work focuses on whether reinforcement learning as a form of semi supervised learning can effectively contribute to the goalkeeper's decision making when a shot is taken."
Description: Master of Computing (by research)

Empirical evaluation methods for multiobjective reinforcement learning algorithms

Authors: Vamplew, Peter , Dazeley, Richard , Berry, Adam , Issabekov, Rustam , Dekker, Evan
Date: 2011
Type: Text , Journal article
Relation: Machine Learning Vol. 84, no. 1-2 (2011), p. 51-80
Full Text: false
Reviewed:
Description: While a number of algorithms for multiobjective reinforcement learning have been proposed, and a small number of applications developed, there has been very little rigorous empirical evaluation of the performance and limitations of these algorithms. This paper proposes standard methods for such empirical evaluation, to act as a foundation for future comparative studies. Two classes of multiobjective reinforcement learning algorithms are identified, and appropriate evaluation metrics and methodologies are proposed for each class. A suite of benchmark problems with known Pareto fronts is described, and future extensions and implementations of this benchmark suite are discussed. The utility of the proposed evaluation methods is demonstrated via an empirical comparison of two example learning algorithms. © 2010 The Author(s).
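One widely used metric for comparing approximations of a Pareto front is the hypervolume: in two objectives, the area of objective space dominated by the front and bounded by a reference point. The sketch below is offered as an example of the kind of metric the abstract refers to, not as a reproduction of the paper's exact methodology; it assumes a two-objective maximisation problem, and the points and reference point are illustrative.

```python
def hypervolume_2d(points, ref):
    """Area of the region dominated by `points` and bounded below by the
    reference point `ref`, for a two-objective maximisation problem."""
    # Ignore points that do not strictly improve on the reference point.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first-objective value downwards; each point that
    # raises the best second-objective value seen so far adds a new strip.
    pts.sort(key=lambda p: p[0], reverse=True)
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


front = [(3.0, 1.0), (1.0, 3.0)]
print(hypervolume_2d(front, ref=(0.0, 0.0)))                 # 5.0
print(hypervolume_2d(front + [(2.0, 0.5)], ref=(0.0, 0.0)))  # 5.0: dominated point adds nothing
```

A larger hypervolume indicates a front that is closer to, or more widely spread along, the true Pareto front; the value depends on the chosen reference point, so comparisons must use the same one.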

Coarse Q-Learning : Addressing the convergence problem when quantizing continuous state variables

Authors: Dazeley, Richard , Vamplew, Peter , Bignold, Adam
Date: 2015
Type: Text , Conference paper
Relation: 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making
Full Text: false
Reviewed:
Description: Value-based approaches to reinforcement learning (RL) maintain a value function that measures the long-term utility of a state or state-action pair. A long-standing issue in RL is how to create a finite representation in a continuous, and therefore infinite, state environment. The common approach is to use function approximators such as tile coding, memory- or instance-based methods. These provide some balance between generalisation, resolution, and storage, but converge slowly in multidimensional state environments. Another approach, quantizing state into lookup tables, has commonly been regarded as highly problematic due to large memory requirements and poor generalisation. In particular, attempting to reduce memory requirements and increase generalisation by using coarser quantization forms a non-Markovian system that does not converge. This paper investigates the problem in using quantized lookup tables and presents an extension to the Q-Learning algorithm, referred to as Coarse Q-Learning (C-QL), which resolves these issues. The presented algorithm will be shown to drastically reduce the memory requirements and increase generalisation by simulating the Markov property. In particular, this algorithm means the size of the input space is determined by the granularity required by the policy being learnt, rather than by the inadequacies of the learning algorithm or the nature of the state-reward dynamics of the environment. Importantly, the method presented solves the problem represented by the curse of dimensionality.
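For context, the conventional quantized lookup-table approach that the abstract critiques can be sketched as plain one-step Q-learning over binned states. This is NOT the Coarse Q-Learning extension, whose mechanism is not given in the abstract. The bounds, bin counts, and hyperparameters are hypothetical, and the class and function names are illustrative.

```python
import random
from collections import defaultdict


def quantize(state, lows, highs, bins):
    """Map a continuous state vector to a tuple of per-dimension bin indices,
    clipping values that fall outside [low, high]."""
    idx = []
    for s, lo, hi, n in zip(state, lows, highs, bins):
        frac = (s - lo) / (hi - lo)
        idx.append(min(n - 1, max(0, int(frac * n))))
    return tuple(idx)


class QuantizedQLearner:
    """Plain one-step Q-learning over a lookup table keyed by quantized states."""

    def __init__(self, n_actions, lows, highs, bins,
                 alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.lows, self.highs, self.bins = lows, highs, bins
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Table rows are created lazily, so only visited cells use memory.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def key(self, state):
        return quantize(state, self.lows, self.highs, self.bins)

    def act(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[self.key(state)]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state, done):
        s, s_next = self.key(state), self.key(next_state)
        target = reward if done else reward + self.gamma * max(self.q[s_next])
        self.q[s][action] += self.alpha * (target - self.q[s][action])
```

With b bins per dimension and d state dimensions, the full table has b^d rows, which is why coarser bins are tempting. The abstract's point is that, in this naive scheme, coarsening merges states with different dynamics, so the agent no longer sees a Markov process and learning may fail to converge.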

An evaluation methodology for interactive reinforcement learning with simulated users

Authors: Bignold, Adam , Cruz, Francisco , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
Date: 2021
Type: Text , Journal article
Relation: Biomimetics Vol. 6, no. 1 (2021), p. 1-15
Full Text:
Reviewed:
Description: Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
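To make the idea of a simulated user concrete, here is a minimal sketch of a trainer whose characteristics are controlled by two parameters. The parameter names `availability` (how often the user gives advice) and `accuracy` (how often that advice is correct), the oracle function, and the class name are all hypothetical choices for illustration; the paper's own user models may differ.

```python
import random


class SimulatedUser:
    """Hypothetical simulated trainer for interactive RL.

    availability: probability that the user offers advice at a given step.
    accuracy: probability that offered advice is the oracle's action.
    """

    def __init__(self, oracle, n_actions, availability, accuracy, seed=0):
        self.oracle = oracle  # function: state -> correct action
        self.n_actions = n_actions  # must be at least 2 for wrong advice to exist
        self.availability = availability
        self.accuracy = accuracy
        self.rng = random.Random(seed)

    def advise(self, state):
        """Return a recommended action, or None when the user stays silent."""
        if self.rng.random() >= self.availability:
            return None
        correct = self.oracle(state)
        if self.rng.random() < self.accuracy:
            return correct
        # Inaccurate advice: a uniformly chosen wrong action.
        wrong = [a for a in range(self.n_actions) if a != correct]
        return self.rng.choice(wrong)


# Hypothetical oracle for a 4-action task; a user who is present 50% of the
# time and right 80% of the time when present.
user = SimulatedUser(oracle=lambda s: 2, n_actions=4, availability=0.5, accuracy=0.8)
advice = [user.advise(None) for _ in range(1000)]
```

Because each user is seeded and parameterised, the same "type" of trainer can be reproduced across repeated experiments, which addresses the cost and learning-bias problems of reusing real participants described in the abstract.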

Adaptive Dynamic Programming based Control Scheme for Uncertain Two-Wheel Robots

Authors: Van Nguyen, Thien , Le, Hai , Tran, Hoang , Nguyen, Duc , Nguyen, Minh , Nguyen, Linh
Date: 2021
Type: Text , Conference paper
Relation: 2021 IEEE International Conference on Autonomous Robot Systems and Competitions, ICARSC 2021, 28 April 2021 through 29 April 2021 p. 111-116
Full Text: false
Reviewed:
Description: The paper addresses the problem of effectively controlling a two-wheel robot given its inherent non-linearity and parameter uncertainties. To deal with the unknown and uncertain dynamics of the robot, it is proposed to employ adaptive dynamic programming, a reinforcement learning based technique, to develop an optimal control law. Notably, the proposed algorithm does not require the kinematic parameters, while finding the optimal state controller is still guaranteed. Moreover, convergence of the optimal control scheme is theoretically proved. The proposed approach was implemented on a synthetic two-wheel robot, and the obtained results demonstrate its effectiveness. © 2021 IEEE.

Language representations for generalization in reinforcement learning

Authors: Goodger, Nikolaj , Vamplew, Peter , Foale, Cameron , Dazeley, Richard
Date: 2021
Type: Text , Conference paper
Relation: 13th Asian Conference on Machine Learning, Virtual, 17-19 November 2021, Proceedings of The 13th Asian Conference on Machine Learning Vol. 157, p. 390-405
Full Text:
Reviewed:
Description: The choice of state and action representation in Reinforcement Learning (RL) has a significant effect on agent performance for the training task. However, its relationship with generalization to new tasks is under-explored. One approach to improving generalization investigated here is the use of language as a representation. We compare vector states and discrete actions to language representations. We find that agents using language representations generalize better and could solve tasks with more entities, new entities, and more complexity than seen in the training task. We attribute this to the compositionality of language.

A multi-objective deep reinforcement learning framework

Authors: Nguyen, Thanh , Nguyen, Ngoc , Vamplew, Peter , Nahavandi, Saeid , Dazeley, Richard , Lim, Chee
Date: 2020
Type: Text , Journal article
Relation: Engineering Applications of Artificial Intelligence Vol. 96 (2020)
Full Text:
Reviewed:
Description: This paper introduces a new scalable multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We develop a high-performance MODRL framework that supports both single-policy and multi-policy strategies, as well as both linear and non-linear approaches to action selection. The experimental results on two benchmark problems (two-objective deep sea treasure environment and three-objective Mountain Car problem) indicate that the proposed framework is able to find the Pareto-optimal solutions effectively. The proposed framework is generic and highly modularized, which allows the integration of different deep reinforcement learning algorithms in different complex problem domains. This therefore overcomes many disadvantages involved with standard multi-objective reinforcement learning methods in the current literature. The proposed framework acts as a testbed platform that accelerates the development of MODRL for solving increasingly complicated multi-objective problems. © 2020 Elsevier Ltd
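The abstract distinguishes linear from non-linear approaches to action selection over per-objective Q-values. As an illustration only, the sketch contrasts a weighted-sum (linear) rule with thresholded lexicographic ordering (TLO), one non-linear rule from the MORL literature; whether this framework uses TLO specifically is an assumption, and the Q-vectors, weights, and thresholds are invented.

```python
def linear_action(q_vectors, weights):
    """Pick the action maximising the weighted sum of its Q-vector."""
    return max(q_vectors,
               key=lambda a: sum(w * q for w, q in zip(weights, q_vectors[a])))


def tlo_action(q_vectors, thresholds):
    """Thresholded lexicographic ordering: each of the first len(thresholds)
    objectives is capped at its threshold, the last objective is uncapped,
    and the resulting tuples are compared lexicographically."""
    def key(a):
        q = q_vectors[a]
        capped = tuple(min(v, t) for v, t in zip(q, thresholds))
        return capped + (q[len(thresholds)],)
    return max(q_vectors, key=key)


# Hypothetical Q-vectors for three actions on two objectives.
q_vectors = {"a": (0.9, 0.1), "b": (0.6, 0.8), "c": (0.3, 1.0)}
print(linear_action(q_vectors, weights=(0.9, 0.1)))  # a
print(tlo_action(q_vectors, thresholds=(0.5,)))      # b
```

Under TLO, once an action reaches 0.5 on the first objective, any further gain there is ignored and the second objective decides, so "b" beats "a". No weighted sum expresses that "satisfice then optimise" preference in general.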

A prioritized objective actor-critic method for deep reinforcement learning

Authors: Nguyen, Ngoc , Nguyen, Thanh , Vamplew, Peter , Dazeley, Richard , Nahavandi, Saeid
Date: 2021
Type: Text , Journal article
Relation: Neural Computing and Applications Vol. 33, no. 16 (2021), p. 10335-10349
Full Text: false
Reviewed:
Description: An increasing number of complex problems have naturally posed significant challenges in decision-making theory and reinforcement learning practices. These problems often involve multiple conflicting reward signals that inherently cause poor exploration by agents seeking a specific goal. In extreme cases, the agent gets stuck in a sub-optimal solution and starts behaving harmfully. To overcome such obstacles, we introduce two actor-critic deep reinforcement learning methods, namely Multi-Critic Single Policy (MCSP) and Single Critic Multi-Policy (SCMP), which can adjust agent behaviors to efficiently achieve a designated goal by adopting a weighted-sum scalarization of different objective functions. In particular, MCSP creates a human-centric policy that corresponds to a predefined priority weight of different objectives. SCMP, in contrast, is capable of generating a mixed policy based on a set of priority weights, i.e., the generated policy uses the knowledge of different policies (each policy corresponds to a priority weight) to dynamically prioritize objectives in real time. We examine our methods by using the Asynchronous Advantage Actor-Critic (A3C) algorithm to utilize the multithreading mechanism for dynamically balancing training intensity of different policies into a single network. Finally, simulation results show that MCSP and SCMP significantly outperform A3C with respect to the mean of total rewards in two complex problems: Food Collector and Seaquest. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd. part of Springer Nature.
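Weighted-sum scalarization, as mentioned in the abstract, is linear, so scalarizing each reward vector before discounting gives the same value as discounting each objective separately and scalarizing the results. The sketch checks this on a hypothetical episode; the weights, rewards, and discount factor are invented, and it does not implement MCSP, SCMP, or A3C.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t, computed backwards from the final step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def scalarise(vec, weights):
    """Weighted-sum scalarization of a per-objective vector."""
    return sum(w * v for w, v in zip(weights, vec))


# Hypothetical 3-step episode with two objectives per step.
reward_vectors = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
weights = (0.75, 0.25)
gamma = 0.5

# Route 1: scalarise each step's reward, then discount.
scalar_first = discounted_return([scalarise(r, weights) for r in reward_vectors], gamma)
# Route 2: compute each objective's return separately, then scalarise.
per_objective = [discounted_return([r[i] for r in reward_vectors], gamma) for i in range(2)]
returns_first = scalarise(per_objective, weights)
print(scalar_first, returns_first)  # 1.25 1.25
```

This equivalence is what allows per-objective value estimates to be combined under different priority weights after learning, rather than retraining for each weighting.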

Scalar reward is not enough : a response to Silver, Singh, Precup and Sutton (2021)

Authors: Vamplew, Peter , Smith, Benjamin , Källström, Johan , Ramos, Gabriel , Rădulescu, Roxana , Roijers, Diederik , Hayes, Conor , Heintz, Fredrik , Mannion, Patrick , Libin, Pieter , Dazeley, Richard , Foale, Cameron
Date: 2022
Type: Text , Journal article
Relation: Autonomous Agents and Multi-Agent Systems Vol. 36, no. 2 (2022)
Full Text:
Reviewed:
Description: The recent paper “Reward is Enough” by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial, and provides a suitable basis for the creation of artificial general intelligence. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, this type of reward is insufficient for the development of human-aligned artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour. © 2022, The Author(s).

Discrete-to-deep reinforcement learning methods

Authors: Kurniawan, Budi , Vamplew, Peter , Papasimeon, Michael , Dazeley, Richard , Foale, Cameron
Date: 2022
Type: Text , Journal article
Relation: Neural Computing and Applications Vol. 34, no. 3 (2022), p. 1713-1733
Full Text:
Reviewed:
Description: Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. In complex problems, a neural RL approach is often able to learn a better solution than tabular RL, but generally takes longer. This paper proposes two methods, Discrete-to-Deep Supervised Policy Learning (D2D-SPL) and Discrete-to-Deep Supervised Q-value Learning (D2D-SQL), whose objective is to acquire the generalisability of a neural network at a cost nearer to that of a tabular method. Both methods combine RL and supervised learning (SL) and are based on the idea that a fast-learning tabular method can generate off-policy data to accelerate learning in neural RL. D2D-SPL uses the data to train a classifier which is then used as a controller for the RL problem. D2D-SQL uses the data to initialise a neural network which is then allowed to continue learning using another RL method. We demonstrate the viability of our algorithms with Cartpole, Lunar Lander and an aircraft manoeuvring problem, three continuous-space environments with low-dimensional state variables. Both methods learn at least 38% faster than baseline methods and yield policies that outperform them. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.

A brief guide to multi-objective reinforcement learning and planning JAAMAS track

Authors: Hayes, Conor , Bargiacchi, Eugenio , Källström, Johan , Macfarlane, Matthew , Reymond, Mathieu , Verstraeten, Timothy , Zintgraf, Luisa , Dazeley, Richard , Heintz, Fredrik , Howley, Enda , Irissappane, Aathirai , Mannion, Patrick , Nowé, Ann , Ramos, Gabriel , Restelli, Marcello , Vamplew, Peter , Roijers, Diederik
Date: 2023
Type: Text , Conference paper
Relation: 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023, London, 29 May to 2 June 2023, Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS Vol. 2023-May, p. 1988-1990
Full Text:
Reviewed:
Description: Real-world sequential decision-making tasks are usually complex, and require trade-offs between multiple, often conflicting, objectives. However, the majority of research in reinforcement learning (RL) and decision-theoretic planning assumes a single objective, or that multiple objectives can be handled via a predefined weighted sum over the objectives. Such approaches may oversimplify the underlying problem, and produce suboptimal results. This extended abstract outlines the limitations of using a semi-blind iterative process to solve multi-objective decision making problems. Our extended paper [4] serves as a guide for the application of explicitly multi-objective methods to difficult problems. © 2023 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

A nethack learning environment language wrapper for autonomous agents

Authors: Goodger, Nikolaj , Vamplew, Peter , Foale, Cameron , Dazeley, Richard
Date: 2023
Type: Text , Journal article
Relation: Journal of Open Research Software Vol. 11 (2023)
Full Text:
Reviewed:
Description: This paper describes a language wrapper for the NetHack Learning Environment (NLE) [1]. The wrapper replaces the non-language observations and actions with comparable language versions. The NLE offers a grand challenge for AI research while MiniHack [2] extends this potential to more specific and configurable tasks. By providing a language interface, we can enable further research on language agents and directly connect language models to a versatile environment. © 2023 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.

Scalar reward is not enough JAAMAS Track

Authors: Vamplew, Peter , Smith, Benjamin , Källström, Johan , Ramos, Gabriel , Rădulescu, Roxana , Roijers, Diederik , Hayes, Conor , Heintz, Fredrik , Mannion, Patrick , Libin, Pieter , Dazeley, Richard , Foale, Cameron
Date: 2023
Type: Text , Conference paper
Relation: 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023, London, 29 May to 2 June 2023, Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, Vol. 2023-May, p. 839-841
Full Text: false
Reviewed:
Description: Silver et al. [14] posit that scalar reward maximisation is sufficient to underpin all intelligence and provides a suitable basis for artificial general intelligence (AGI). This extended abstract summarises the counter-argument from our JAAMAS paper [19]. © 2023 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Elastic step DDPG : multi-step reinforcement learning for improved sample efficiency

Authors: Ly, Adrian , Dazeley, Richard , Vamplew, Peter , Cruz, Francisco , Aryal, Sunil
Date: 2023
Type: Text , Conference paper
Relation: 2023 International Joint Conference on Neural Networks, IJCNN 2023 Vol. 2023-June
Full Text: false
Reviewed:
Description: A major challenge in deep reinforcement learning is that it requires more data to converge to a policy for complex problems. One way to improve sample efficiency is to use n-step updates to reduce the number of samples required to converge to a good policy. However, n-step updates are known to be brittle and difficult to tune. Elastic Step DQN has shown that it is possible to automate the choice of n in DQN to solve problems involving discrete action spaces; however, the efficacy of the technique on more complex problems and on problems with continuous action spaces has yet to be shown. In this paper we adapt the innovations proposed by Elastic Step DQN to the DDPG algorithm and show empirically that Elastic Step DDPG achieves a much stronger final training policy and is more sample efficient than DDPG. © 2023 IEEE.
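For reference, the n-step update this work builds on replaces the one-step target with a sum of n discounted rewards plus a discounted bootstrap estimate. The sketch computes that target for a fixed n; how Elastic Step DDPG chooses n adaptively is not described in the abstract and is not reproduced here. The function and argument names are illustrative.

```python
def n_step_target(rewards, bootstrap_value, gamma, done):
    """n-step TD target for a segment of n rewards r_t .. r_{t+n-1}:

    G = r_t + gamma * r_{t+1} + ... + gamma**(n-1) * r_{t+n-1}
        + gamma**n * bootstrap_value   (bootstrap omitted if the episode ended)

    In DDPG, bootstrap_value would be the target critic's estimate
    Q'(s_{t+n}, mu'(s_{t+n})).
    """
    g = 0.0 if done else bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5, done=False))  # 3.0
print(n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5, done=True))   # 1.75
```

A larger n propagates reward information back faster, but it increases the variance of the target and, when the stored transitions come from older off-policy behaviour, its bias. This trade-off is one reason a fixed n is hard to tune.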
