Coarse Q-Learning: Addressing the convergence problem when quantizing continuous state variables
- Authors: Dazeley, Richard , Vamplew, Peter , Bignold, Adam
- Date: 2015
- Type: Text , Conference paper
- Relation: 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making
- Full Text: false
- Reviewed:
- Description: Value-based approaches to reinforcement learning (RL) maintain a value function that measures the long-term utility of a state or state-action pair. A long-standing issue in RL is how to create a finite representation in a continuous, and therefore infinite, state environment. The common approach is to use function approximators such as tile coding, or memory- or instance-based methods. These provide some balance between generalisation, resolution, and storage, but converge slowly in multidimensional state environments. Another approach, quantizing state into lookup tables, has been commonly regarded as highly problematic, due to large memory requirements and poor generalisation. In particular, attempting to reduce memory requirements and increase generalisation by using coarser quantization forms a non-Markovian system that does not converge. This paper investigates the problems in using quantized lookup tables and presents an extension to the Q-Learning algorithm, referred to as Coarse Q-Learning (CQL), which resolves these issues. The presented algorithm is shown to drastically reduce memory requirements and increase generalisation by simulating the Markov property. In particular, this algorithm means the size of the input space is determined by the granularity required by the policy being learnt, rather than by the inadequacies of the learning algorithm or the nature of the state-reward dynamics of the environment. Importantly, the method presented addresses the curse of dimensionality.
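The quantized-lookup approach this abstract builds on can be sketched as plain tabular Q-learning over coarsely binned continuous states. The CQL extension itself is not specified in the abstract; the binning scheme, bin counts, and learning parameters below are illustrative assumptions only:

```python
from collections import defaultdict

def quantize(state, low, high, bins):
    """Map each continuous state variable onto a coarse integer bin,
    yielding a hashable key for a tabular lookup."""
    key = []
    for s, lo, hi, n in zip(state, low, high, bins):
        frac = (s - lo) / (hi - lo)
        key.append(min(n - 1, max(0, int(frac * n))))
    return tuple(key)

def q_update(Q, key, action, reward, next_key, actions, alpha=0.1, gamma=0.99):
    """Standard Q-learning update applied to the quantized state key."""
    best_next = max(Q[(next_key, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(key, action)] += alpha * (td_target - Q[(key, action)])
    return Q[(key, action)]

# Two nearby continuous states fall into the same coarse cell,
# which is exactly where the non-Markovian convergence issue arises.
Q = defaultdict(float)
actions = [0, 1]
k = quantize([0.25, -0.5], low=[0, -1], high=[1, 1], bins=[4, 4])
k2 = quantize([0.30, -0.4], low=[0, -1], high=[1, 1], bins=[4, 4])
q_update(Q, k, 0, reward=1.0, next_key=k2, actions=actions)
```

Coarser `bins` shrink the table and improve generalisation, but aggregate distinct underlying states into one cell, which is the convergence problem the paper targets.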
Rule-based interactive assisted reinforcement learning
- Authors: Bignold, Adam
- Date: 2019
- Type: Text , Thesis , PhD
- Full Text:
- Description: Reinforcement Learning (RL) has seen increasing interest over the past few years, partially owing to breakthroughs in the digestion and application of external information. The use of external information results in improved learning speeds and solutions to more complex domains. This thesis, a collection of five key contributions, demonstrates that comparable performance gains to existing Interactive Reinforcement Learning methods can be achieved using less data, sourced during operation, and without prior verification and validation of the information's integrity. First, this thesis introduces Assisted Reinforcement Learning (ARL), a collective term referring to RL methods that utilise external information to leverage the learning process, and provides a non-exhaustive review of current ARL methods. Second, two advice delivery methods common in ARL, evaluative and informative, are compared through human trials. The comparison highlights how human engagement, accuracy of advice, agent performance, and advice utility differ between the two methods. Third, this thesis introduces simulated users as a methodology for testing and comparing ARL methods. Simulated users enable testing and comparing of ARL systems without costly and time-consuming human trials. While not a replacement for well-designed human trials, simulated users offer a cheap and robust approach to ARL design and comparison. Fourth, the concept of persistence is introduced to Interactive Reinforcement Learning. The retention and reuse of advice maximises utility and can lead to improved performance and reduced human demand. Finally, this thesis presents rule-based interactive RL, an iterative method for providing advice to an agent. Existing interactive RL methods rely on constant human supervision and evaluation, requiring a substantial commitment from the advice-giver. 
Rule-based advice can be provided proactively and be generalised over the state-space while remaining flexible enough to handle potentially inaccurate or irrelevant information. Ultimately, the thesis contributions are validated empirically and clearly show that rule-based advice significantly reduces human guidance requirements while improving agent performance.
- Description: Doctor of Philosophy
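The rule-based advice the thesis describes, given proactively and generalised over the state-space, can be sketched as predicates paired with suggested actions that the agent consults before falling back to its own policy. The rule contents and state fields below are hypothetical, not taken from the thesis:

```python
# Hypothetical rules: each pairs a predicate over the state with an
# advised action; rules generalise over every state they match.
rules = [
    (lambda s: s["distance_to_goal"] < 2, "stop"),
    (lambda s: s["obstacle_ahead"], "turn_left"),
]

def select_action(state, own_policy, rules):
    """Prefer the first matching rule's advice; otherwise fall back to
    the agent's own (learned or exploratory) policy."""
    for predicate, advised_action in rules:
        if predicate(state):
            return advised_action
    return own_policy(state)

state = {"distance_to_goal": 5, "obstacle_ahead": True}
action = select_action(state, lambda s: "forward", rules)
```

Because a rule persists and covers many states, the human need not supervise every step, which is the reduction in guidance demand the abstract reports.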
An evaluation methodology for interactive reinforcement learning with simulated users
- Authors: Bignold, Adam , Cruz, Francisco , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
- Date: 2021
- Type: Text , Journal article
- Relation: Biomimetics Vol. 6, no. 1 (2021), p. 1-15
- Full Text:
- Reviewed:
- Description: Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent. © 2021 by the authors. 
Licensee MDPI, Basel, Switzerland.
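A simulated user of the kind the paper proposes can be sketched as a trainer parameterised by the characteristics the abstract mentions, such as how often advice is given and how often it is correct. The parameter names and advice mechanism below are illustrative assumptions, not the paper's implementation:

```python
import random

class SimulatedUser:
    """A stand-in for a human trainer, parameterised by how often they
    intervene (availability) and how often their advice is correct
    (accuracy), so experiments can be repeated without human cost."""

    def __init__(self, accuracy, availability, seed=0):
        self.accuracy = accuracy
        self.availability = availability
        self.rng = random.Random(seed)  # seeded for reproducible trials

    def advise(self, optimal_action, actions):
        if self.rng.random() > self.availability:
            return None  # user chooses not to intervene this step
        if self.rng.random() < self.accuracy:
            return optimal_action
        # Inaccurate advice: a random non-optimal action
        return self.rng.choice([a for a in actions if a != optimal_action])

user = SimulatedUser(accuracy=0.8, availability=0.5)
advice = user.advise(optimal_action="left", actions=["left", "right"])
```

Sweeping `accuracy` and `availability` across runs is one way to probe how agent performance depends on the type of user assisting it, as the evaluation methodology intends.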
- Klein, Britt, Clinnick, Lisa, Chesler, Jessica, Stranieri, Andrew, Bignold, Adam, Dazeley, Richard, McLaren, Suzanne, Lauder, Sue, Balasubramanian, Venki
- Authors: Klein, Britt , Clinnick, Lisa , Chesler, Jessica , Stranieri, Andrew , Bignold, Adam , Dazeley, Richard , McLaren, Suzanne , Lauder, Sue , Balasubramanian, Venki
- Date: 2018
- Type: Text , Conference paper
- Relation: 2017 Global Telehealth Meeting, GT 201; Adelaide, Australia; 22nd-24th November 2017; published in Telehealth for our Ageing Society (part of the Studies in Health Technology and Informatics series) Vol. 246, p. 24-28
- Full Text: false
- Reviewed:
- Description: Background: This regional pilot site ‘end-user attitudes’ study explored nurses’ experiences and impressions of using the Nurses’ Behavioural Assistant (NBA) (a knowledge-based, interactive ehealth system) to assist them to better respond to behavioural and psychological symptoms of dementia (BPSD) and will be reported here. Methods: Focus groups were conducted, followed by a four-week pilot site ‘end-user attitudes’ trial of the NBA at a regional aged care residential facility (ACRF). Brief interviews were conducted with consenting nursing staff. Results: Focus group feedback (N = 10) required only minor cosmetic changes to the NBA prototype. Post pilot site end-user interview data (N = 10) indicated that the regional ACRF nurses were positive and enthusiastic about the NBA; however, several issues were also identified. Conclusions: Overall, the results supported the utility of the NBA to promote a person-centred care approach to managing BPSD. Slight modifications may be required to maximise its uptake across all ACRF nursing staff.
Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety
- Authors: Vamplew, Peter , Foale, Cameron , Dazeley, Richard , Bignold, Adam
- Date: 2021
- Type: Text , Journal article
- Relation: Engineering Applications of Artificial Intelligence Vol. 100 (2021)
- Full Text:
- Reviewed:
- Description: The concept of impact-minimisation has previously been proposed as an approach to addressing the safety concerns that can arise from utility-maximising agents. An impact-minimising agent takes into account the potential impact of its actions on the state of the environment when selecting actions, so as to avoid unacceptable side-effects. This paper proposes and empirically evaluates an implementation of impact-minimisation within the framework of multiobjective reinforcement learning. The key contributions are a novel potential-based approach to specifying a measure of impact, and an examination of a variety of non-linear action-selection operators so as to achieve an acceptable trade-off between achieving the agent's primary task and minimising environmental impact. These experiments also highlight a previously unreported issue with noisy estimates for multiobjective agents using non-linear action-selection, which has broader implications for the application of multiobjective reinforcement learning. © 2021
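The potential-based impact measure the abstract describes can be sketched as one objective of a multiobjective reward vector: the primary task reward alongside a penalty for changes to a potential function of the environment state. The particular potential function and state encoding below are hypothetical illustrations, not the paper's formulation:

```python
def impact_reward(potential, state, next_state):
    """Potential-based impact measure: penalise any change in the
    environment as seen through a potential function of the state."""
    return -abs(potential(next_state) - potential(state))

def vector_reward(primary_r, potential, state, next_state):
    # Multiobjective signal: (task objective, impact objective).
    # A non-linear action-selection operator would trade these off.
    return (primary_r, impact_reward(potential, state, next_state))

# Hypothetical potential: the number of undisturbed objects in a gridworld.
potential = lambda s: s["intact_objects"]
r = vector_reward(1.0, potential,
                  {"intact_objects": 5}, {"intact_objects": 3})
# r == (1.0, -2.0): full task reward, but a penalty for the two
# objects the agent disturbed as a side-effect
```

Keeping impact as a separate objective, rather than folding it into a scalar reward, is what lets the non-linear action-selection operators the paper examines trade task achievement against side-effects explicitly.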
A conceptual framework for externally-influenced agents: an assisted reinforcement learning review
- Authors: Bignold, Adam , Cruz, Francisco , Taylor, Matthew , Brys, Tim , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
- Date: 2023
- Type: Text , Journal article
- Relation: Journal of Ambient Intelligence and Humanized Computing Vol. 14, no. 4 (2023), p. 3621-3644
- Full Text:
- Reviewed:
- Description: A long-term goal of reinforcement learning agents is to be able to perform tasks in complex real-world scenarios. The use of external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration or interoperability between different approaches using external information. In this work, while reviewing externally-influenced methods, we propose a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering collaboration by classifying and comparing various methods that use external information in the learning process. The proposed taxonomy details the relationship between the external information source and the learner agent, highlighting the process of information decomposition, structure, retention, and how it can be used to influence agent learning. As well as reviewing state-of-the-art methods, we identify current streams of reinforcement learning that use external information in order to improve the agent’s performance and its decision-making process. These include heuristic reinforcement learning, interactive reinforcement learning, learning from demonstration, transfer learning, and learning from multiple sources, among others. These streams of reinforcement learning operate with the shared objective of scaffolding the learner agent. Lastly, we discuss further possibilities for future work in the field of assisted reinforcement learning systems. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
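The taxonomy dimensions the review highlights, the information source, its structure, its retention, and how it influences learning, can be sketched as a record type for classifying an assisted RL method. The field names and the example classification below are illustrative paraphrases, not the paper's exact taxonomy:

```python
from dataclasses import dataclass

@dataclass
class AssistedRLMethod:
    """Illustrative record for classifying a method along taxonomy
    dimensions paraphrased from the review."""
    name: str
    information_source: str  # e.g. "human", "demonstration", "heuristic"
    structure: str           # how the external information is represented
    retention: str           # e.g. "transient" vs "persistent"
    influence: str           # where it enters learning (reward, policy, ...)

lfd = AssistedRLMethod(
    name="learning from demonstration",
    information_source="human",
    structure="state-action trajectories",
    retention="persistent",
    influence="policy initialisation",
)
```

Classifying heuristic RL, interactive RL, transfer learning, and the other streams the review names into one shared schema like this is the kind of comparison the framework is meant to enable.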
Human engagement providing evaluative and informative advice for interactive reinforcement learning
- Authors: Bignold, Adam , Cruz, Francisco , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
- Date: 2023
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 35, no. 25 (2023), p. 18215-18230
- Full Text:
- Reviewed:
- Description: Interactive reinforcement learning proposes the use of externally sourced information in order to speed up the learning process. When interacting with a learner agent, humans may provide either evaluative or informative advice. Prior research has focused on the effect of human-sourced advice by including real-time feedback on the interactive reinforcement learning process, specifically aiming to improve the learning speed of the agent, while minimising the time demands on the human. This work focuses on answering which of two approaches, evaluative or informative, is the preferred instructional approach for humans. Moreover, this work presents an experimental setup for a human trial designed to compare the methods people use to deliver advice in terms of human engagement. The results obtained show that users giving informative advice to the learner agents provide more accurate advice, are willing to assist the learner agent for a longer time, and provide more advice per episode. Additionally, self-evaluation from participants using the informative approach has indicated that the agent’s ability to follow the advice is higher, and therefore, they feel their own advice to be of higher accuracy when compared to people providing evaluative advice. © 2022, The Author(s).
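The two advice channels the study compares can be sketched as follows: evaluative advice scores the action the agent just took and is folded into its reward, while informative advice suggests an action before the agent acts. The function signatures and the feedback weighting are illustrative assumptions, not the study's protocol:

```python
def apply_evaluative(reward, feedback, weight=1.0):
    """Evaluative advice: the human scores the action the agent has
    just taken; fold that score into the reward signal."""
    return reward + weight * feedback

def apply_informative(agent_action, advised_action):
    """Informative advice: the human suggests an action before the
    agent acts; follow the suggestion whenever one is given."""
    return advised_action if advised_action is not None else agent_action

r = apply_evaluative(reward=0.0, feedback=-1.0)            # human disapproved
a = apply_informative(agent_action="left", advised_action="right")
```

The study's finding that informative advisers give more accurate advice and assist for longer concerns the human side of this interaction, not the update rules themselves, which are sketched here only to make the two channels concrete.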