Portal-based sound propagation for first-person computer games
- Foale, Cameron, Vamplew, Peter
- Authors: Foale, Cameron , Vamplew, Peter
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at the Fourth Australasian Conference on Interactive Entertainment, IE2007, RMIT University, Melbourne, Victoria : 3rd-5th December 2007
- Full Text:
- Description: First-person computer games are a popular modern video game genre. A new method, the Directional Propagation Cache, is proposed that takes advantage of the very common portal spatial subdivision method to accelerate environmental acoustics simulation for first-person games by caching sound propagation information between portals.
- Description: 2003004700
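The abstract above centres on caching sound propagation information between portals so that run-time acoustics queries reduce to cheap lookups. The minimal Python sketch below illustrates that idea only; the portal graph, per-hop attenuation model and cache key are illustrative assumptions, not the paper's actual Directional Propagation Cache data structures.

```python
from collections import deque

# Minimal sketch of caching propagation information between portals.
# Graph layout, attenuation model and cache key are illustrative assumptions.
PORTAL_GRAPH = {            # portal id -> neighbouring portal ids
    "door_a": ["door_b"],
    "door_b": ["door_a", "window_c"],
    "window_c": ["door_b"],
}
ATTENUATION_PER_HOP = 0.6   # assumed loss factor per portal traversal

_cache = {}                 # (source_portal, listener_portal) -> attenuation

def propagation(src, dst):
    """Return cached attenuation between two portals, computing it on demand."""
    key = (src, dst)
    if key not in _cache:
        # Breadth-first search for the shortest portal-to-portal path.
        seen, frontier = {src}, deque([(src, 1.0)])
        _cache[key] = 0.0
        while frontier:
            portal, gain = frontier.popleft()
            if portal == dst:
                _cache[key] = gain
                break
            for nxt in PORTAL_GRAPH[portal]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, gain * ATTENUATION_PER_HOP))
    return _cache[key]

print(propagation("door_a", "window_c"))  # 0.36 on this toy graph, then cached
```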
Reinforcement learning of pareto-optimal multiobjective policies using steering
- Vamplew, Peter, Issabekov, Rustam, Dazeley, Richard, Foale, Cameron
- Authors: Vamplew, Peter , Issabekov, Rustam , Dazeley, Richard , Foale, Cameron
- Date: 2015
- Type: Text , Conference paper
- Relation: 28th Australasian Joint Conference on Artificial Intelligence, AI 2015; Canberra, ACT; 30th November-4th December 2015 Vol. 9457, p. 596-608
- Full Text: false
- Reviewed:
- Description: There has been little research into multiobjective reinforcement learning (MORL) algorithms using stochastic or non-stationary policies, even though such policies may Pareto-dominate deterministic stationary policies. One approach is steering, which forms a non-stationary combination of deterministic stationary base policies. This paper presents two new steering algorithms designed for the task of learning Pareto-optimal policies. The first algorithm (w-steering) is a direct adaptation of previous approaches to steering, and therefore requires prior knowledge of recurrent states which are guaranteed to be revisited. The second algorithm (Q-steering) eliminates this requirement. Empirical results show that both algorithms perform well when given knowledge of recurrent states, but that Q-steering provides substantial performance improvements over w-steering when this knowledge is not available. © Springer International Publishing Switzerland 2015.
Softmax exploration strategies for multiobjective reinforcement learning
- Vamplew, Peter, Dazeley, Richard, Foale, Cameron
- Authors: Vamplew, Peter , Dazeley, Richard , Foale, Cameron
- Date: 2017
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 263, no. (2017), p. 74-86
- Full Text:
- Reviewed:
- Description: Despite growing interest over recent years in applying reinforcement learning to multiobjective problems, there has been little research into the applicability and effectiveness of exploration strategies within the multiobjective context. This work considers several widely-used approaches to exploration from the single-objective reinforcement learning literature, and examines their incorporation into multiobjective Q-learning. In particular this paper proposes two novel approaches which extend the softmax operator to work with vector-valued rewards. The performance of these exploration strategies is evaluated across a set of benchmark environments. Issues arising from the multiobjective formulation of these benchmarks which impact on the performance of the exploration strategies are identified. It is shown that of the techniques considered, the combination of the novel softmax–epsilon exploration with optimistic initialisation provides the most effective trade-off between exploration and exploitation.
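The exploration strategy summarised above can be illustrated with a small sketch: vector-valued Q-estimates are scalarised (here with a simple linear weighting, assumed purely for illustration) and an action is drawn from a softmax over those values, with an epsilon chance of acting uniformly at random. This is a hedged approximation of the idea, not the paper's exact softmax–epsilon operator.

```python
import numpy as np

def softmax_epsilon(q_vectors, weights, temperature=1.0, epsilon=0.1, rng=None):
    """Pick an action from vector-valued Q-estimates.

    q_vectors: array of shape (n_actions, n_objectives)
    weights:   linear scalarisation weights, shape (n_objectives,)
    """
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:                     # uniform exploration step
        return int(rng.integers(len(q_vectors)))
    scalarised = q_vectors @ weights               # assumed linear scalarisation
    prefs = np.exp((scalarised - scalarised.max()) / temperature)
    return int(rng.choice(len(q_vectors), p=prefs / prefs.sum()))

# Toy usage: three actions, two objectives.
q = np.array([[1.0, 0.2], [0.5, 0.9], [0.1, 0.1]])
print(softmax_epsilon(q, weights=np.array([0.5, 0.5])))
```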
Steering approaches to Pareto-optimal multiobjective reinforcement learning
- Vamplew, Peter, Issabekov, Rustam, Dazeley, Richard, Foale, Cameron, Berry, Adam, Moore, Tim, Creighton, Douglas
- Authors: Vamplew, Peter , Issabekov, Rustam , Dazeley, Richard , Foale, Cameron , Berry, Adam , Moore, Tim , Creighton, Douglas
- Date: 2017
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 263, no. (2017), p. 26-38
- Full Text:
- Reviewed:
- Description: For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent’s target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system.
Human-aligned artificial intelligence is a multiobjective problem
- Vamplew, Peter, Dazeley, Richard, Foale, Cameron, Firmin, Sally, Mummery, Jane
- Authors: Vamplew, Peter , Dazeley, Richard , Foale, Cameron , Firmin, Sally , Mummery, Jane
- Date: 2018
- Type: Text , Journal article
- Relation: Ethics and Information Technology Vol. 20, no. 1 (2018), p. 27-40
- Full Text:
- Reviewed:
- Description: As the capabilities of artificial intelligence (AI) systems improve, it becomes important to constrain their actions to ensure their behaviour remains beneficial to humanity. A variety of ethical, legal and safety-based frameworks have been proposed as a basis for designing these constraints. Despite their variations, these frameworks share the common characteristic that decision-making must consider multiple potentially conflicting factors. We demonstrate that these alignment frameworks can be represented as utility functions, but that the widely used Maximum Expected Utility (MEU) paradigm provides insufficient support for such multiobjective decision-making. We show that a Multiobjective Maximum Expected Utility paradigm based on the combination of vector utilities and non-linear action–selection can overcome many of the issues which limit MEU’s effectiveness in implementing aligned AI. We examine existing approaches to multiobjective AI, and identify how these can contribute to the development of human-aligned intelligent agents. © 2017, Springer Science+Business Media B.V.
Modeling neurocognitive reaction time with gamma distribution
- Santhanagopalan, Meena, Chetty, Madhu, Foale, Cameron, Aryal, Sunil, Klein, Britt
- Authors: Santhanagopalan, Meena , Chetty, Madhu , Foale, Cameron , Aryal, Sunil , Klein, Britt
- Date: 2018
- Type: Text , Conference proceedings
- Relation: ACSW'18. Proceedings of the Australasian Computer Science Week Multiconference; Brisbane, QLD; January 2018; Article 28, p. 1-10
- Full Text: false
- Reviewed:
- Description: As part of a broader effort to build a holistic biopsychosocial health metric, reaction time data obtained from participants undertaking neurocognitive tests have been examined using Exploratory Data Analysis (EDA) to assess their distribution. Many existing methods assume that reaction time data follow a Gaussian distribution and thus commonly use statistical measures such as Analysis of Variance (ANOVA) for analysis. However, reaction time data need not follow a Gaussian distribution and in many instances can be better modeled by other representations such as the Gamma distribution. Unlike the Gaussian distribution, which is defined by its mean and variance, the Gamma distribution is defined by shape and scale parameters, which also capture higher-order moments of the data such as skewness and kurtosis. Generalized Linear Models (GLM) based on exponential-family distributions such as the Gamma distribution have been used to model reaction time in other domains, but have not been fully explored for modeling reaction time data in the psychology domain. While limited use of the Gamma distribution has been reported [5, 17, 21] for analyzing response times, its application has been somewhat ad hoc rather than systematic. For this research, we use a real-life biopsychosocial dataset generated from the 'digital health' intervention programs conducted by the Faculty of Health, Federation University, Australia. The two digital intervention programs were the 'Mindfulness' program and the 'Physical Activity' program; the neurocognitive tests were carried out as part of the 'Mindfulness' program. In this paper, we investigate the participants' reaction time distributions in neurocognitive tests such as the Psychology Experiment Building Language (PEBL) Go/No-Go test [19], which is a subset of the larger biopsychosocial data set. PEBL is an open-source software system for designing and running psychological experiments. Analysis of participants' reaction times in the PEBL Go/No-Go test shows that the data are more compatible with a Gamma distribution and can be better modeled by it.
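As a rough illustration of the modelling step described above, the sketch below fits a Gamma distribution to synthetic reaction-time data with SciPy and compares its log-likelihood against a Gaussian fit. The synthetic data and the fixed-zero location are assumptions for the example; the study uses real PEBL Go/No-Go reaction times.

```python
import numpy as np
from scipy import stats

# Synthetic reaction times (seconds); real Go/No-Go data would be loaded instead.
rng = np.random.default_rng(0)
reaction_times = rng.gamma(shape=4.0, scale=0.1, size=500)

# Fit a Gamma distribution (location fixed at zero, a common convention for RT data).
shape, loc, scale = stats.gamma.fit(reaction_times, floc=0)

# Compare against a Gaussian fit using log-likelihood as a rough adequacy check.
gamma_ll = stats.gamma.logpdf(reaction_times, shape, loc, scale).sum()
mu, sigma = stats.norm.fit(reaction_times)
normal_ll = stats.norm.logpdf(reaction_times, mu, sigma).sum()

print(f"Gamma shape={shape:.2f} scale={scale:.3f}")
print(f"log-likelihood: gamma={gamma_ll:.1f}, normal={normal_ll:.1f}")
```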
Non-functional regression : A new challenge for neural networks
- Vamplew, Peter, Dazeley, Richard, Foale, Cameron, Choudhury, Tanveer
- Authors: Vamplew, Peter , Dazeley, Richard , Foale, Cameron , Choudhury, Tanveer
- Date: 2018
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 314, no. (2018), p. 326-335
- Full Text:
- Reviewed:
- Description: This work identifies an important, previously unaddressed issue for regression based on neural networks – learning to accurately approximate problems where the output is not a function of the input (i.e. where the number of outputs required varies across input space). Such non-functional regression problems arise in a number of applications, and cannot be adequately handled by existing neural network algorithms. To demonstrate the benefits possible from directly addressing non-functional regression, this paper proposes the first neural algorithm to do so – an extension of the Resource Allocating Network (RAN) which adds additional output neurons to the network structure during training. This new algorithm, called the Resource Allocating Network with Varying Output Cardinality (RANVOC), is demonstrated to be capable of learning to perform non-functional regression, both on artificially constructed data and on the real-world task of specifying parameter settings for a plasma-spray process. Importantly, RANVOC is shown to outperform not just the original RAN algorithm, but also the best possible error rates achievable by any functional form of regression.
Relevance of frequency of heart-rate peaks as indicator of ‘Biological’ Stress level
- Santhanagopalan, Meena, Chetty, Madhu, Foale, Cameron, Aryal, Sunil, Klein, Britt
- Authors: Santhanagopalan, Meena , Chetty, Madhu , Foale, Cameron , Aryal, Sunil , Klein, Britt
- Date: 2018
- Type: Text , Conference proceedings
- Relation: ICONIP 2018 International Conference on Neural Information Processing; Siem Reap, Cambodia; 13th-16th December, 2018, p. 598-609
- Full Text: false
- Reviewed:
- Description: The biopsychosocial (BPS) model proposes that health is best understood as a combination of bio-physiological, psychological and social determinants, and thus advocates for a far more comprehensive investigation of the relationships underlying 'mind-body' health. For this holistic analysis, we need a suitable measure to indicate participants' 'biological' stress. With the advent of wearable sensor devices, health monitoring is becoming easier. In this study, we focus on bio-physiological indicators of stress derived from wearable devices' heart-rate data. The analysis of such heart-rate data presents a set of practical challenges. We review various measures currently in use for stress measurement and their relevance and significance for wearables' heart-rate data. In this paper, we propose the novel 'peak heart-rate count' metric to quantify the level of 'biological' stress. Real-life biometric data obtained from a digital health intervention program were considered for the study. Our study indicates the significance of using the frequency of heart-rate peaks ('peak heart-rate count') as a 'biological' stress measure.
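A minimal sketch of the proposed 'peak heart-rate count' idea, assuming a fixed elevated-heart-rate threshold and a synthetic trace; the study's actual threshold and preprocessing of wearable data are not reproduced here.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic per-minute heart-rate trace; a wearable export would be used in practice.
rng = np.random.default_rng(1)
heart_rate = 70 + 5 * np.sin(np.linspace(0, 20, 480)) + rng.normal(0, 4, 480)

# Count peaks that exceed an assumed elevated-heart-rate threshold.
threshold_bpm = 80                     # illustrative cut-off, not the study's value
peaks, _ = find_peaks(heart_rate, height=threshold_bpm)
peak_heart_rate_count = len(peaks)

print(f"peak heart-rate count over the recording: {peak_heart_rate_count}")
```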
Towards machine learning approach for digital-health intervention program
- Santhanagopalan, Meena, Chetty, Madhu, Foale, Cameron, Klein, Britt
- Authors: Santhanagopalan, Meena , Chetty, Madhu , Foale, Cameron , Klein, Britt
- Date: 2019
- Type: Text , Journal article
- Relation: Australian Journal of Intelligent Information Processing Systems Vol. 15, no. 4 (2019), p. 16-24
- Full Text:
- Reviewed:
- Description: Digital-health interventions (DHI) are used by health care providers to promote engagement within the community. Effective assignment of participants into DHI programs helps increase the benefit gained from the most suitable intervention. A major challenge with the roll-out and implementation of DHI is in assigning participants into different interventions. The use of the biopsychosocial model [18] for this purpose is not widespread, due to the limited number of personalized interventions built on evidence-based, data-driven models. Machine learning has changed the way data extraction and interpretation work by providing automatic sets of generic methods that have replaced traditional statistical techniques. In this paper, we investigate the relevance of machine learning for this purpose by studying different non-linear classifiers and comparing their prediction accuracy to evaluate their suitability. Further, as a novel contribution, real-life biopsychosocial features are used as input in this study. The results help in developing an appropriate predictive classification model to assign participants into the most suitable DHI. We analyze biopsychosocial data generated from a DHI program and study their feature characteristics using scatter plots. While scatter plots are unable to reveal the linear relationships in the data-set, the use of classifiers can successfully identify which features are suitable predictors of mental ill-health.
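The comparison of non-linear classifiers described above can be sketched as below with scikit-learn, using synthetic stand-in features; the real study uses biopsychosocial features from the DHI program and its own evaluation protocol.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Stand-in for biopsychosocial features and an intervention-suitability label;
# the study itself uses participant data from the DHI program.
X, y = make_classification(n_samples=300, n_features=12, n_informative=6, random_state=0)

classifiers = {
    "svm_rbf": SVC(kernel="rbf"),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}

# Compare prediction accuracy of the non-linear classifiers via cross-validation.
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```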
An evaluation methodology for interactive reinforcement learning with simulated users
- Bignold, Adam, Cruz, Francisco, Dazeley, Richard, Vamplew, Peter, Foale, Cameron
- Authors: Bignold, Adam , Cruz, Francisco , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
- Date: 2021
- Type: Text , Journal article
- Relation: Biomimetics Vol. 6, no. 1 (2021), p. 1-15
- Full Text:
- Reviewed:
- Description: Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
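A toy sketch of the simulated-user idea from the abstract above: a trainer whose availability and accuracy are tunable parameters, so experiments can be repeated without human involvement. The two-parameter model and the notion of advising the optimal action are illustrative assumptions, not the paper's full user model.

```python
import random

class SimulatedUser:
    """Toy simulated trainer with tunable availability and accuracy."""

    def __init__(self, availability=0.5, accuracy=0.8, n_actions=4, seed=0):
        self.availability = availability   # probability of offering advice at all
        self.accuracy = accuracy           # probability the advice is correct
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def advise(self, optimal_action):
        if self.rng.random() > self.availability:
            return None                    # user stays silent this step
        if self.rng.random() < self.accuracy:
            return optimal_action          # correct, informative advice
        return self.rng.randrange(self.n_actions)  # mistaken advice

# An experiment can be repeated with many user profiles without human involvement.
for profile in (SimulatedUser(0.2, 0.9), SimulatedUser(0.8, 0.6)):
    print([profile.advise(optimal_action=2) for _ in range(5)])
```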
Language representations for generalization in reinforcement learning
- Goodger, Nikolaj, Vamplew, Peter, Foale, Cameron, Dazeley, Richard
- Authors: Goodger, Nikolaj , Vamplew, Peter , Foale, Cameron , Dazeley, Richard
- Date: 2021
- Type: Text , Conference paper
- Relation: 13th Asian Conference on Machine Learning, Virtual, 17-19 November 2021, Proceedings of The 13th Asian Conference on Machine Learning Vol. 157, p. 390-405
- Full Text:
- Reviewed:
- Description: The choice of state and action representation in Reinforcement Learning (RL) has a significant effect on agent performance for the training task, but its relationship with generalization to new tasks is under-explored. One approach to improving generalization investigated here is the use of language as a representation. We compare vector states and discrete actions to language representations. We find that agents using language representations generalize better and could solve tasks with more entities, new entities, and more complexity than seen in the training task. We attribute this to the compositionality of language.
Levels of explainable artificial intelligence for human-aligned conversational explanations
- Dazeley, Richard, Vamplew, Peter, Foale, Cameron, Young, Cameron, Aryal, Sunil, Cruz, Francisco
- Authors: Dazeley, Richard , Vamplew, Peter , Foale, Cameron , Young, Cameron , Aryal, Sunil , Cruz, Francisco
- Date: 2021
- Type: Text , Journal article
- Relation: Artificial Intelligence Vol. 299, no. (2021), p.
- Full Text:
- Reviewed:
- Description: Over the last few years there has been rapid research growth into eXplainable Artificial Intelligence (XAI) and the closely aligned Interpretable Machine Learning (IML). Drivers for this growth include recent legislative changes and increased investments by industry and governments, along with increased concern from the general public. People are affected by autonomous decisions every day and the public need to understand the decision-making process to accept the outcomes. However, the vast majority of the applications of XAI/IML are focused on providing low-level ‘narrow’ explanations of how an individual decision was reached based on a particular datum. While important, these explanations rarely provide insights into an agent's: beliefs and motivations; hypotheses of other (human, animal or AI) agents' intentions; interpretation of external cultural expectations; or, processes used to generate its own explanation. Yet all of these factors, we propose, are essential to providing the explanatory depth that people require to accept and trust the AI's decision-making. This paper aims to define levels of explanation and describe how they can be integrated to create a human-aligned conversational explanation system. In so doing, this paper will survey current approaches and discuss the integration of different technologies to achieve these levels with Broad eXplainable Artificial Intelligence (Broad-XAI), and thereby move towards high-level ‘strong’ explanations. © 2021 Elsevier B.V.
Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety
- Vamplew, Peter, Foale, Cameron, Dazeley, Richard, Bignold, Adam
- Authors: Vamplew, Peter , Foale, Cameron , Dazeley, Richard , Bignold, Adam
- Date: 2021
- Type: Text , Journal article
- Relation: Engineering Applications of Artificial Intelligence Vol. 100, no. (2021), p.
- Full Text:
- Reviewed:
- Description: The concept of impact-minimisation has previously been proposed as an approach to addressing the safety concerns that can arise from utility-maximising agents. An impact-minimising agent takes into account the potential impact of its actions on the state of the environment when selecting actions, so as to avoid unacceptable side-effects. This paper proposes and empirically evaluates an implementation of impact-minimisation within the framework of multiobjective reinforcement learning. The key contributions are a novel potential-based approach to specifying a measure of impact, and an examination of a variety of non-linear action-selection operators so as to achieve an acceptable trade-off between achieving the agent's primary task and minimising environmental impact. These experiments also highlight a previously unreported issue with noisy estimates for multiobjective agents using non-linear action-selection, which has broader implications for the application of multiobjective reinforcement learning. © 2021
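A hedged sketch of the general approach outlined above: the reward is extended with an impact objective derived from a potential function over the environment state, and actions are chosen with a simple thresholded (non-linear) trade-off. The potential function, threshold rule and numbers are illustrative assumptions rather than the operators evaluated in the paper.

```python
import numpy as np

def impact_objective(potential_current, potential_initial):
    # Penalise any deviation of the state's potential from its initial value.
    return -abs(potential_current - potential_initial)

def thresholded_action_selection(q_vectors, impact_threshold=-0.5):
    """q_vectors: (n_actions, 2) array of [task_value, impact_value] estimates.

    Prefer the best task value among actions whose estimated impact is
    acceptable; otherwise fall back to the least-impactful action.
    """
    acceptable = q_vectors[:, 1] >= impact_threshold
    if acceptable.any():
        candidates = np.where(acceptable)[0]
        return int(candidates[np.argmax(q_vectors[candidates, 0])])
    return int(np.argmax(q_vectors[:, 1]))

print(impact_objective(potential_current=3.5, potential_initial=3.0))  # -0.5
q = np.array([[5.0, -2.0], [3.0, -0.1], [1.0, 0.0]])
print(thresholded_action_selection(q))  # picks action 1: good task value, low impact
```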
Discrete-to-deep reinforcement learning methods
- Kurniawan, Budi, Vamplew, Peter, Papasimeon, Michael, Dazeley, Richard, Foale, Cameron
- Authors: Kurniawan, Budi , Vamplew, Peter , Papasimeon, Michael , Dazeley, Richard , Foale, Cameron
- Date: 2022
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 34, no. 3 (2022), p. 1713-1733
- Full Text:
- Reviewed:
- Description: Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. In complex problems, a neural RL approach is often able to learn a better solution than tabular RL, but generally takes longer. This paper proposes two methods, Discrete-to-Deep Supervised Policy Learning (D2D-SPL) and Discrete-to-Deep Supervised Q-value Learning (D2D-SQL), whose objective is to acquire the generalisability of a neural network at a cost nearer to that of a tabular method. Both methods combine RL and supervised learning (SL) and are based on the idea that a fast-learning tabular method can generate off-policy data to accelerate learning in neural RL. D2D-SPL uses the data to train a classifier which is then used as a controller for the RL problem. D2D-SQL uses the data to initialise a neural network which is then allowed to continue learning using another RL method. We demonstrate the viability of our algorithms with Cartpole, Lunar Lander and an aircraft manoeuvring problem, three continuous-space environments with low-dimensional state variables. Both methods learn at least 38% faster than baseline methods and yield policies that outperform them. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
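The D2D-SPL idea summarised above — let a fast tabular learner generate data, then distil its greedy policy into a neural controller — can be sketched roughly as follows. The random stand-in Q-table, toy state features and classifier choice are assumptions for illustration; the actual method is applied to continuous-state tasks such as Cartpole and selects its training data more carefully.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

n_states, n_actions = 20, 3
rng = np.random.default_rng(0)

# Stage 1: assume a tabular Q-table has already been learned on a discretised
# version of the task (random here, standing in for Q-learning output).
q_table = rng.normal(size=(n_states, n_actions))

# Stage 2: turn the greedy tabular policy into (state-features, action) pairs.
states = np.arange(n_states)
features = np.stack([np.sin(states), np.cos(states)], axis=1)  # toy continuous features
labels = q_table.argmax(axis=1)

# Stage 3: train a neural classifier to act as the controller (the supervised step).
controller = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
controller.fit(features, labels)

agreement = (controller.predict(features) == labels).mean() * 100
print(f"controller matches the tabular policy on {agreement:.0f}% of states")
```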
Scalar reward is not enough : a response to Silver, Singh, Precup and Sutton (2021)
- Vamplew, Peter, Smith, Benjamin, Källström, Johan, Ramos, Gabriel, Rădulescu, Roxana, Roijers, Diederik, Hayes, Conor, Heintz, Fredrik, Mannion, Patrick, Libin, Pieter, Dazeley, Richard, Foale, Cameron
- Authors: Vamplew, Peter , Smith, Benjamin , Källström, Johan , Ramos, Gabriel , Rădulescu, Roxana , Roijers, Diederik , Hayes, Conor , Heintz, Fredrik , Mannion, Patrick , Libin, Pieter , Dazeley, Richard , Foale, Cameron
- Date: 2022
- Type: Text , Journal article
- Relation: Autonomous Agents and Multi-Agent Systems Vol. 36, no. 2 (2022), p.
- Full Text:
- Reviewed:
- Description: The recent paper “Reward is Enough” by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial, and provides a suitable basis for the creation of artificial general intelligence. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, this type of reward is insufficient for the development of human-aligned artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour. © 2022, The Author(s).
Statistical calibration of long-term reanalysis data for Australian fire weather conditions
- Biswas, Soubhik, Chand, Savin, Dowdy, Andrew, Wright, Wendy, Foale, Cameron, Zhao, Xiaohui, Deo, A
- Authors: Biswas, Soubhik , Chand, Savin , Dowdy, Andrew , Wright, Wendy , Foale, Cameron , Zhao, Xiaohui , Deo, A
- Date: 2022
- Type: Text , Journal article
- Relation: Journal of Applied Meteorology and Climatology Vol. 61, no. 6 (2022), p. 729-758
- Full Text: false
- Reviewed:
- Description: Reconstructed weather datasets, such as reanalyses based on model output with data assimilation, often show systematic biases in magnitude when compared with observations. Postprocessing approaches can help adjust the distribution so that the reconstructed data resemble the observed data as closely as possible. In this study, we have compared various statistical bias-correction approaches based on quantile–quantile matching to correct the data from the Twentieth Century Reanalysis, version 2c (20CRv2c), with observation-based data. Methods included in the comparison utilize a suite of different approaches: a linear model, a median-based approach, a nonparametric linear method, a spline-based method, and approaches that are based on the lognormal and Weibull distributions. These methods were applied to daily data in the Australian region for rainfall, maximum temperature, relative humidity, and wind speed. Note that these are the variables required to compute the forest fire danger index (FFDI), widely used in Australia to examine dangerous fire weather conditions. We have compared the relative errors and performances of each method across various locations in Australia and applied the approach with the lowest mean-absolute error across multiple variables to produce a reliable long-term bias-corrected FFDI dataset across Australia. The spline-based data correction was found to have some benefits relative to the other methods in better representing the mean FFDI values and the extremes from the observed records for many of the cases examined here. It is intended that this statistical bias-correction approach applied to long-term reanalysis data will help enable new insight into climatological variations in hazardous phenomena, including dangerous wildfires in Australia extending over the past century. © 2022 American Meteorological Society.
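A generic quantile–quantile mapping sketch in the spirit of the bias-correction methods compared above; it is a nonparametric empirical-quantile version, not the spline-based variant the study found performed best, and the synthetic training data are placeholders for reanalysis and observation records.

```python
import numpy as np

def quantile_map(reanalysis, observed, values):
    """Nonparametric quantile-quantile bias correction.

    Maps each value through the empirical quantiles of the reanalysis training
    data and onto the corresponding quantiles of the observations.
    """
    quantiles = np.linspace(0.01, 0.99, 99)
    src_q = np.quantile(reanalysis, quantiles)
    obs_q = np.quantile(observed, quantiles)
    return np.interp(values, src_q, obs_q)

rng = np.random.default_rng(0)
reanalysis_train = rng.gamma(2.0, 3.0, 5000)   # biased reconstructed variable
observed_train = rng.gamma(2.0, 4.0, 5000)     # observation-based target
corrected = quantile_map(reanalysis_train, observed_train, reanalysis_train[:5])
print(corrected)
```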
The impact of environmental stochasticity on value-based multiobjective reinforcement learning
- Vamplew, Peter, Foale, Cameron, Dazeley, Richard
- Authors: Vamplew, Peter , Foale, Cameron , Dazeley, Richard
- Date: 2022
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 34, no. 3 (2022), p. 1783-1799
- Full Text:
- Reviewed:
- Description: A common approach to address multiobjective problems using reinforcement learning methods is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism that is often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multi-objective Q-learning. It shows that the nature of the optimal solution depends on these environmental characteristics, and also on whether we desire to maximise the Expected Scalarised Return (ESR) or the Scalarised Expected Return (SER). We also identify a novel aim which may arise in some applications of maximising SER subject to satisfying constraints on the variation in return and show that this may require different solutions than ESR or conventional SER. The analysis of the interaction between environmental stochasticity and multi-objective Q-learning is supported by empirical evaluations on several simple multiobjective Markov Decision Processes with varying characteristics. This includes a demonstration of a novel approach to learning deterministic SER-optimal policies for environments with stochastic rewards. In addition, we report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. Having highlighted the limitations of value-based model-free MORL methods, we discuss several alternative methods that may be more suitable for maximising SER in MOMDPs with stochastic transitions. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
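A minimal sketch of the value-based multiobjective Q-learning setting analysed above: a vector-valued Q-table updated componentwise, with action selection via an assumed linear scalarisation. The sketch only fixes the notation; in stochastic environments this style of agent exhibits exactly the ESR/SER complications the paper examines.

```python
import numpy as np

n_states, n_actions, n_objectives = 5, 2, 2
alpha, gamma = 0.1, 0.95
weights = np.array([0.7, 0.3])     # assumed linear scalarisation for action choice
rng = np.random.default_rng(0)

Q = np.zeros((n_states, n_actions, n_objectives))   # vector-valued Q-table

def select_action(state, epsilon=0.1):
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state] @ weights))        # greedy w.r.t. scalarised values

def update(state, action, reward_vector, next_state):
    # TD update applied componentwise to the reward/value vector.
    greedy_next = int(np.argmax(Q[next_state] @ weights))
    target = reward_vector + gamma * Q[next_state, greedy_next]
    Q[state, action] += alpha * (target - Q[state, action])

update(0, select_action(0), np.array([1.0, -0.5]), 1)
print(Q[0])
```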
A conceptual framework for externally-influenced agents: an assisted reinforcement learning review
- Bignold, Adam, Cruz, Francisco, Taylor, Matthew, Brys, Tim, Dazeley, Richard, Vamplew, Peter, Foale, Cameron
- Authors: Bignold, Adam , Cruz, Francisco , Taylor, Matthew , Brys, Tim , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
- Date: 2023
- Type: Text , Journal article
- Relation: Journal of Ambient Intelligence and Humanized Computing Vol. 14, no. 4 (2023), p. 3621-3644
- Full Text:
- Reviewed:
- Description: A long-term goal of reinforcement learning agents is to be able to perform tasks in complex real-world scenarios. The use of external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration or interoperability between different approaches using external information. In this work, while reviewing externally-influenced methods, we propose a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering collaboration by classifying and comparing various methods that use external information in the learning process. The proposed taxonomy details the relationship between the external information source and the learner agent, highlighting the process of information decomposition, structure, retention, and how it can be used to influence agent learning. As well as reviewing state-of-the-art methods, we identify current streams of reinforcement learning that use external information in order to improve the agent’s performance and its decision-making process. These include heuristic reinforcement learning, interactive reinforcement learning, learning from demonstration, transfer learning, and learning from multiple sources, among others. These streams of reinforcement learning operate with the shared objective of scaffolding the learner agent. Lastly, we discuss further possibilities for future work in the field of assisted reinforcement learning systems. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
A nethack learning environment language wrapper for autonomous agents
- Goodger, Nikolaj, Vamplew, Peter, Foale, Cameron, Dazeley, Richard
- Authors: Goodger, Nikolaj , Vamplew, Peter , Foale, Cameron , Dazeley, Richard
- Date: 2023
- Type: Text , Journal article
- Relation: Journal of Open Research Software Vol. 11, no. (2023), p.
- Full Text:
- Reviewed:
- Description: This paper describes a language wrapper for the NetHack Learning Environment (NLE) [1]. The wrapper replaces the non-language observations and actions with comparable language versions. The NLE offers a grand challenge for AI research while MiniHack [2] extends this potential to more specific and configurable tasks. By providing a language interface, we can enable further research on language agents and directly connect language models to a versatile environment. © 2023 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.
Human engagement providing evaluative and informative advice for interactive reinforcement learning
- Bignold, Adam, Cruz, Francisco, Dazeley, Richard, Vamplew, Peter, Foale, Cameron
- Authors: Bignold, Adam , Cruz, Francisco , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
- Date: 2023
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 35, no. 25 (2023), p. 18215-18230
- Full Text:
- Reviewed:
- Description: Interactive reinforcement learning proposes the use of externally sourced information in order to speed up the learning process. When interacting with a learner agent, humans may provide either evaluative or informative advice. Prior research has focused on the effect of human-sourced advice by including real-time feedback on the interactive reinforcement learning process, specifically aiming to improve the learning speed of the agent, while minimising the time demands on the human. This work focuses on answering which of two approaches, evaluative or informative, is the preferred instructional approach for humans. Moreover, this work presents an experimental setup for a human trial designed to compare the methods people use to deliver advice in terms of human engagement. The results obtained show that users giving informative advice to the learner agents provide more accurate advice, are willing to assist the learner agent for a longer time, and provide more advice per episode. Additionally, self-evaluation from participants using the informative approach has indicated that the agent’s ability to follow the advice is higher, and therefore, they feel their own advice to be of higher accuracy when compared to people providing evaluative advice. © 2022, The Author(s).