List of Titles

On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts

Authors: Vamplew, Peter; Yearwood, John; Dazeley, Richard; Berry, Adam
Date: 2008
Type: Text, Conference paper
Relation: Paper presented at 21st Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand, 1st-5th December 2008, Vol. 5360, p. 372-378
Full Text: false
Description: Multiobjective reinforcement learning (MORL) extends RL to problems with multiple conflicting objectives. This paper argues for designing MORL systems to produce a set of solutions approximating the Pareto front, and shows that the common MORL technique of scalarisation has fundamental limitations when used to find Pareto-optimal policies. The work is supported by the presentation of three new MORL benchmarks with known Pareto fronts.
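One well-known limitation of linear (weighted-sum) scalarisation is that it cannot select Pareto-optimal solutions lying in a concave region of the front. The sketch below illustrates this with made-up two-objective returns; the values and the helper names `pareto_front` and `scalarised_winners` are assumptions for illustration only and are not taken from the paper or its benchmarks.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (maximisation)."""
    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and not identical
        return all(x >= y for x, y in zip(a, b)) and a != b
    return [p for p in points if not any(dominates(q, p) for q in points)]


def scalarised_winners(points, steps=101):
    """Collect the best point under w*obj1 + (1-w)*obj2 for a sweep of w in [0, 1]."""
    winners = set()
    for i in range(steps):
        w = i / (steps - 1)
        best = max(points, key=lambda p: w * p[0] + (1 - w) * p[1])
        winners.add(best)
    return winners


# Hypothetical expected returns of four policies on two objectives.
returns = [(0.0, 1.0), (0.4, 0.4), (1.0, 0.0), (0.2, 0.3)]

front = pareto_front(returns)        # three non-dominated policies
found = scalarised_winners(returns)  # only the two extreme policies

print("Pareto front:", front)
print("Found by weighted sum:", sorted(found))
```

In this toy case the compromise policy (0.4, 0.4) is Pareto-optimal, yet no weight makes it the maximiser, because it lies below the line joining the two extreme policies.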

Coarse Q-Learning: Addressing the convergence problem when quantizing continuous state variables

Authors: Dazeley, Richard; Vamplew, Peter; Bignold, Adam
Date: 2015
Type: Text, Conference paper
Relation: 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making
Full Text: false
Description: Value-based approaches to reinforcement learning (RL) maintain a value function that measures the long-term utility of a state or state-action pair. A long-standing issue in RL is how to create a finite representation in a continuous, and therefore infinite, state environment. The common approach is to use function approximators such as tile coding, memory-based or instance-based methods. These provide some balance between generalisation, resolution, and storage, but converge slowly in multidimensional state environments. Another approach, quantizing state into lookup tables, has commonly been regarded as highly problematic due to large memory requirements and poor generalisation. In particular, attempting to reduce memory requirements and increase generalisation by using coarser quantization forms a non-Markovian system that does not converge. This paper investigates the problems with using quantized lookup tables and presents an extension to the Q-Learning algorithm, referred to as Coarse Q-Learning (CQL), which resolves these issues. The presented algorithm will be shown to drastically reduce the memory requirements and increase generalisation by simulating the Markov property. In particular, this algorithm means the size of the input space is determined by the granularity required by the policy being learnt, rather than by the inadequacies of the learning algorithm or the nature of the state-reward dynamics of the environment. Importantly, the method presented solves the problem represented by the curse of dimensionality.
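To make the quantized lookup-table setting concrete, the following is a minimal sketch of ordinary tabular Q-learning over a uniformly quantized continuous state. It is not the CQL extension described in the abstract; the function names, grid bounds, bin counts, and learning parameters are all assumptions chosen for illustration.

```python
from collections import defaultdict


def quantize(state, lows, highs, bins):
    """Map a continuous state vector to a tuple of integer cell indices."""
    index = []
    for x, lo, hi, n in zip(state, lows, highs, bins):
        ratio = (x - lo) / (hi - lo)
        cell = int(ratio * n)
        index.append(min(max(cell, 0), n - 1))  # clamp to the valid range
    return tuple(index)


def q_update(q, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.99):
    """Standard one-step Q-learning update on the quantized cells."""
    best_next = max(q[(s_next, b)] for b in range(n_actions))
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


q = defaultdict(float)  # lookup table; unseen entries default to 0.0
lows, highs, bins = (-1.0, -1.0), (1.0, 1.0), (4, 4)  # hypothetical 4x4 grid

s = quantize((-0.9, 0.1), lows, highs, bins)
s_next = quantize((-0.6, 0.2), lows, highs, bins)
q_update(q, s, 0, 1.0, s_next, n_actions=2)

print(s, s_next, q[(s, 0)])
```

With this coarse grid the two distinct continuous states fall into the same cell, so the table cannot tell them apart. This kind of aliasing is the reason coarse quantization breaks the Markov property in the way the abstract describes.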

Adaptive Dynamic Programming based Control Scheme for Uncertain Two-Wheel Robots

Authors: Van Nguyen, Thien; Le, Hai; Tran, Hoang; Nguyen, Duc; Nguyen, Minh; Nguyen, Linh
Date: 2021
Type: Text, Conference paper
Relation: 2021 IEEE International Conference on Autonomous Robot Systems and Competitions, ICARSC 2021, 28 April 2021 through 29 April 2021, p. 111-116
Full Text: false
Description: The paper addresses the problem of effectively controlling a two-wheel robot given its inherent non-linearity and parameter uncertainties. To deal with the unknown and uncertain dynamics of the robot, the paper proposes using adaptive dynamic programming, a reinforcement learning based technique, to develop an optimal control law. Notably, the proposed algorithm does not require kinematic parameters, yet it is guaranteed to find the optimal state controller. Moreover, convergence of the optimal control scheme is theoretically proved. The proposed approach was implemented on a synthetic two-wheel robot, and the obtained results demonstrate its effectiveness. © 2021 IEEE.

Scalar reward is not enough JAAMAS Track

Authors: Vamplew, Peter; Smith, Benjamin; Källström, Johan; Ramos, Gabriel; Rădulescu, Roxana; Roijers, Diederik; Hayes, Conor; Heintz, Frederik; Mannion, Patrick; Libin, Pieter; Dazeley, Richard; Foale, Cameron
Date: 2023
Type: Text, Conference paper
Relation: 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023, London, 29 May to 2 June 2023, Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, Vol. 2023-May, p. 839-841
Full Text: false
Description: Silver et al. [14] posit that scalar reward maximisation is sufficient to underpin all intelligence and provides a suitable basis for artificial general intelligence (AGI). This extended abstract summarises the counter-argument from our JAAMAS paper [19]. © 2023 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Elastic step DDPG: multi-step reinforcement learning for improved sample efficiency

Authors: Ly, Adrian; Dazeley, Richard; Vamplew, Peter; Cruz, Francisco; Aryal, Sunil
Date: 2023
Type: Text, Conference paper
Relation: 2023 International Joint Conference on Neural Networks, IJCNN 2023 Vol. 2023-June
Full Text: false
Description: A major challenge in deep reinforcement learning is that it requires more data to converge to a policy for complex problems. One way to improve sample efficiency is to use n-step updates to reduce the number of samples required to converge to a good policy. However, n-step updates are known to be brittle and difficult to tune. Elastic Step DQN has shown that it is possible to automate the value of n in DQN to solve problems involving discrete action spaces; however, the efficacy of the technique when applied to more complex problems and to problems with continuous action spaces is yet to be shown. In this paper we adapt the innovations proposed by Elastic Step DQN to the DDPG algorithm and show empirically that Elastic Step DDPG is able to achieve a much stronger final training policy and is more sample efficient than DDPG. © 2023 IEEE.
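For readers unfamiliar with n-step updates, the sketch below computes a standard n-step bootstrapped return for a fixed n. It does not implement the elastic-step mechanism for choosing n, which the abstract does not detail; the function name `n_step_return` and the example numbers are assumptions for illustration.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.

    `rewards` holds the n observed rewards; `bootstrap_value` is the value
    estimate (for example from a critic) of the state reached after n steps.
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g  # fold in rewards from the last step backwards
    return g


# Hypothetical numbers: three unit rewards, critic estimate 10, discount 0.5.
one_step = n_step_return([1.0], 10.0, 0.5)
three_step = n_step_return([1.0, 1.0, 1.0], 10.0, 0.5)
print(one_step, three_step)
```

A larger n relies more on observed rewards and less on the bootstrapped estimate, which is the trade-off that makes the choice of n sensitive to tune.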
