A brief guide to multi-objective reinforcement learning and planning JAAMAS track
- Hayes, Conor, Bargiacchi, Eugenio, Källström, Johan, Macfarlane, Matthew, Reymond, Mathieu, Verstraeten, Timothy, Zintgraf, Luisa, Dazeley, Richard, Heintz, Fredrik, Howley, Enda, Irissappane, Athirai, Mannion, Patrick, Nowé, Ann, Ramos, Gabriel, Restelli, Marcello, Vamplew, Peter, Roijers, Diederik
- Authors: Hayes, Conor , Bargiacchi, Eugenio , Källström, Johan , Macfarlane, Matthew , Reymond, Mathieu , Verstraeten, Timothy , Zintgraf, Luisa , Dazeley, Richard , Heintz, Fredrik , Howley, Enda , Irissappane, Athirai , Mannion, Patrick , Nowé, Ann , Ramos, Gabriel , Restelli, Marcello , Vamplew, Peter , Roijers, Diederik
- Date: 2023
- Type: Text , Conference paper
- Relation: 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023, London, 29 May to 2 June 2023, Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS Vol. 2023-May, p. 1988-1990
- Full Text:
- Reviewed:
- Description: Real-world sequential decision-making tasks are usually complex, and require trade-offs between multiple - often conflicting - objectives. However, the majority of research in reinforcement learning (RL) and decision-theoretic planning assumes a single objective, or that multiple objectives can be handled via a predefined weighted sum over the objectives. Such approaches may oversimplify the underlying problem, and produce suboptimal results. This extended abstract outlines the limitations of using a semi-blind iterative process to solve multi-objective decision making problems. Our extended paper [4], serves as a guide for the application of explicitly multi-objective methods to difficult problems. © 2023 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
- Authors: Hayes, Conor , Bargiacchi, Eugenio , Källström, Johan , Macfarlane, Matthew , Reymond, Mathieu , Verstraeten, Timothy , Zintgraf, Luisa , Dazeley, Richard , Heintz, Fredrik , Howley, Enda , Irissappane, Athirai , Mannion, Patrick , Nowé, Ann , Ramos, Gabriel , Restelli, Marcello , Vamplew, Peter , Roijers, Diederik
- Date: 2023
- Type: Text , Conference paper
- Relation: 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023, London, 29 May to 2 June 2023, Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS Vol. 2023-May, p. 1988-1990
- Full Text:
- Reviewed:
- Description: Real-world sequential decision-making tasks are usually complex, and require trade-offs between multiple - often conflicting - objectives. However, the majority of research in reinforcement learning (RL) and decision-theoretic planning assumes a single objective, or that multiple objectives can be handled via a predefined weighted sum over the objectives. Such approaches may oversimplify the underlying problem, and produce suboptimal results. This extended abstract outlines the limitations of using a semi-blind iterative process to solve multi-objective decision making problems. Our extended paper [4], serves as a guide for the application of explicitly multi-objective methods to difficult problems. © 2023 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
A conceptual framework for externally-influenced agents: an assisted reinforcement learning review
- Bignold, Adam, Cruz, Francisco, Taylor, Matthew, Brys, Tim, Dazeley, Richard, Vamplew, Peter, Foale, Cameron
- Authors: Bignold, Adam , Cruz, Francisco , Taylor, Matthew , Brys, Tim , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
- Date: 2023
- Type: Text , Journal article
- Relation: Journal of Ambient Intelligence and Humanized Computing Vol. 14, no. 4 (2023), p. 3621-3644
- Full Text:
- Reviewed:
- Description: A long-term goal of reinforcement learning agents is to be able to perform tasks in complex real-world scenarios. The use of external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration or interoperability between different approaches using external information. In this work, while reviewing externally-influenced methods, we propose a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering collaboration by classifying and comparing various methods that use external information in the learning process. The proposed taxonomy details the relationship between the external information source and the learner agent, highlighting the process of information decomposition, structure, retention, and how it can be used to influence agent learning. As well as reviewing state-of-the-art methods, we identify current streams of reinforcement learning that use external information in order to improve the agent’s performance and its decision-making process. These include heuristic reinforcement learning, interactive reinforcement learning, learning from demonstration, transfer learning, and learning from multiple sources, among others. These streams of reinforcement learning operate with the shared objective of scaffolding the learner agent. Lastly, we discuss further possibilities for future work in the field of assisted reinforcement learning systems. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
- Authors: Bignold, Adam , Cruz, Francisco , Taylor, Matthew , Brys, Tim , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
- Date: 2023
- Type: Text , Journal article
- Relation: Journal of Ambient Intelligence and Humanized Computing Vol. 14, no. 4 (2023), p. 3621-3644
- Full Text:
- Reviewed:
- Description: A long-term goal of reinforcement learning agents is to be able to perform tasks in complex real-world scenarios. The use of external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration or interoperability between different approaches using external information. In this work, while reviewing externally-influenced methods, we propose a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering collaboration by classifying and comparing various methods that use external information in the learning process. The proposed taxonomy details the relationship between the external information source and the learner agent, highlighting the process of information decomposition, structure, retention, and how it can be used to influence agent learning. As well as reviewing state-of-the-art methods, we identify current streams of reinforcement learning that use external information in order to improve the agent’s performance and its decision-making process. These include heuristic reinforcement learning, interactive reinforcement learning, learning from demonstration, transfer learning, and learning from multiple sources, among others. These streams of reinforcement learning operate with the shared objective of scaffolding the learner agent. Lastly, we discuss further possibilities for future work in the field of assisted reinforcement learning systems. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
A multi-objective deep reinforcement learning framework
- Nguyen, Thanh, Nguyen, Ngoc, Vamplew, Peter, Nahavandi, Saeid, Dazeley, Richard, Lim, Chee
- Authors: Nguyen, Thanh , Nguyen, Ngoc , Vamplew, Peter , Nahavandi, Saeid , Dazeley, Richard , Lim, Chee
- Date: 2020
- Type: Text , Journal article
- Relation: Engineering Applications of Artificial Intelligence Vol. 96, no. (2020), p.
- Full Text:
- Reviewed:
- Description: This paper introduces a new scalable multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We develop a high-performance MODRL framework that supports both single-policy and multi-policy strategies, as well as both linear and non-linear approaches to action selection. The experimental results on two benchmark problems (two-objective deep sea treasure environment and three-objective Mountain Car problem) indicate that the proposed framework is able to find the Pareto-optimal solutions effectively. The proposed framework is generic and highly modularized, which allows the integration of different deep reinforcement learning algorithms in different complex problem domains. This therefore overcomes many disadvantages involved with standard multi-objective reinforcement learning methods in the current literature. The proposed framework acts as a testbed platform that accelerates the development of MODRL for solving increasingly complicated multi-objective problems. © 2020 Elsevier Ltd
- Authors: Nguyen, Thanh , Nguyen, Ngoc , Vamplew, Peter , Nahavandi, Saeid , Dazeley, Richard , Lim, Chee
- Date: 2020
- Type: Text , Journal article
- Relation: Engineering Applications of Artificial Intelligence Vol. 96, no. (2020), p.
- Full Text:
- Reviewed:
- Description: This paper introduces a new scalable multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We develop a high-performance MODRL framework that supports both single-policy and multi-policy strategies, as well as both linear and non-linear approaches to action selection. The experimental results on two benchmark problems (two-objective deep sea treasure environment and three-objective Mountain Car problem) indicate that the proposed framework is able to find the Pareto-optimal solutions effectively. The proposed framework is generic and highly modularized, which allows the integration of different deep reinforcement learning algorithms in different complex problem domains. This therefore overcomes many disadvantages involved with standard multi-objective reinforcement learning methods in the current literature. The proposed framework acts as a testbed platform that accelerates the development of MODRL for solving increasingly complicated multi-objective problems. © 2020 Elsevier Ltd
A nethack learning environment language wrapper for autonomous agents
- Goodger, Nikolaj, Vamplew, Peter, Foale, Cameron, Dazeley, Richard
- Authors: Goodger, Nikolaj , Vamplew, Peter , Foale, Cameron , Dazeley, Richard
- Date: 2023
- Type: Text , Journal article
- Relation: Journal of Open Research Software Vol. 11, no. (2023), p.
- Full Text:
- Reviewed:
- Description: This paper describes a language wrapper for the NetHack Learning Environment (NLE) [1]. The wrapper replaces the non-language observations and actions with comparable language versions. The NLE offers a grand challenge for AI research while MiniHack [2] extends this potential to more specific and configurable tasks. By providing a language interface, we can enable further research on language agents and directly connect language models to a versatile environment. © 2023 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.
- Authors: Goodger, Nikolaj , Vamplew, Peter , Foale, Cameron , Dazeley, Richard
- Date: 2023
- Type: Text , Journal article
- Relation: Journal of Open Research Software Vol. 11, no. (2023), p.
- Full Text:
- Reviewed:
- Description: This paper describes a language wrapper for the NetHack Learning Environment (NLE) [1]. The wrapper replaces the non-language observations and actions with comparable language versions. The NLE offers a grand challenge for AI research while MiniHack [2] extends this potential to more specific and configurable tasks. By providing a language interface, we can enable further research on language agents and directly connect language models to a versatile environment. © 2023 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.
A practical guide to multi-objective reinforcement learning and planning
- Hayes, Conor, Rădulescu, Roxana, Bargiacchi, Eugenio, Källström, Johan, Macfarlane, Matthew, Reymond, Mathieu, Verstraeten, Timothy, Zintgraf, Luisa, Dazeley, Richard, Heintz, Fredrik, Howley, Enda, Irissappane, Athirai, Mannion, Patrick, Nowé, Ann, Ramos, Gabriel, Restelli, Marcello, Vamplew, Peter, Roijers, Diederik
- Authors: Hayes, Conor , Rădulescu, Roxana , Bargiacchi, Eugenio , Källström, Johan , Macfarlane, Matthew , Reymond, Mathieu , Verstraeten, Timothy , Zintgraf, Luisa , Dazeley, Richard , Heintz, Fredrik , Howley, Enda , Irissappane, Athirai , Mannion, Patrick , Nowé, Ann , Ramos, Gabriel , Restelli, Marcello , Vamplew, Peter , Roijers, Diederik
- Date: 2022
- Type: Text , Journal article
- Relation: Autonomous Agents and Multi-Agent Systems Vol. 36, no. 1 (2022), p.
- Full Text:
- Reviewed:
- Description: Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. © 2022, The Author(s).
- Authors: Hayes, Conor , Rădulescu, Roxana , Bargiacchi, Eugenio , Källström, Johan , Macfarlane, Matthew , Reymond, Mathieu , Verstraeten, Timothy , Zintgraf, Luisa , Dazeley, Richard , Heintz, Fredrik , Howley, Enda , Irissappane, Athirai , Mannion, Patrick , Nowé, Ann , Ramos, Gabriel , Restelli, Marcello , Vamplew, Peter , Roijers, Diederik
- Date: 2022
- Type: Text , Journal article
- Relation: Autonomous Agents and Multi-Agent Systems Vol. 36, no. 1 (2022), p.
- Full Text:
- Reviewed:
- Description: Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. © 2022, The Author(s).
A prioritized objective actor-critic method for deep reinforcement learning
- Nguyen, Ngoc, Nguyen, Thanh, Vamplew, Peter, Dazeley, Richard, Nahavandi, Saeid
- Authors: Nguyen, Ngoc , Nguyen, Thanh , Vamplew, Peter , Dazeley, Richard , Nahavandi, Saeid
- Date: 2021
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 33, no. 16 (2021), p. 10335-10349
- Full Text: false
- Reviewed:
- Description: An increasing number of complex problems have naturally posed significant challenges in decision-making theory and reinforcement learning practices. These problems often involve multiple conflicting reward signals that inherently cause agents’ poor exploration in seeking a specific goal. In extreme cases, the agent gets stuck in a sub-optimal solution and starts behaving harmfully. To overcome such obstacles, we introduce two actor-critic deep reinforcement learning methods, namely Multi-Critic Single Policy (MCSP) and Single Critic Multi-Policy (SCMP), which can adjust agent behaviors to efficiently achieve a designated goal by adopting a weighted-sum scalarization of different objective functions. In particular, MCSP creates a human-centric policy that corresponds to a predefined priority weight of different objectives. Whereas, SCMP is capable of generating a mixed policy based on a set of priority weights, i.e., the generated policy uses the knowledge of different policies (each policy corresponds to a priority weight) to dynamically prioritize objectives in real time. We examine our methods by using the Asynchronous Advantage Actor-Critic (A3C) algorithm to utilize the multithreading mechanism for dynamically balancing training intensity of different policies into a single network. Finally, simulation results show that MCSP and SCMP significantly outperform A3C with respect to the mean of total rewards in two complex problems: Food Collector and Seaquest. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd. part of Springer Nature.
A survey of multi-objective sequential decision-making
- Roijers, Diederik, Vamplew, Peter, Whiteson, Shimon, Dazeley, Richard
- Authors: Roijers, Diederik , Vamplew, Peter , Whiteson, Shimon , Dazeley, Richard
- Date: 2013
- Type: Text , Journal article
- Relation: Journal of Artificial Intelligence Research Vol. 48, no. (2013), p. 67-113
- Full Text:
- Reviewed:
- Description: Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work. © 2013 AI Access Foundation.
- Description: C1
- Authors: Roijers, Diederik , Vamplew, Peter , Whiteson, Shimon , Dazeley, Richard
- Date: 2013
- Type: Text , Journal article
- Relation: Journal of Artificial Intelligence Research Vol. 48, no. (2013), p. 67-113
- Full Text:
- Reviewed:
- Description: Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work. © 2013 AI Access Foundation.
- Description: C1
AI apology : interactive multi-objective reinforcement learning for human-aligned AI
- Harland, Hadassah, Dazeley, Richard, Nakisa, Bahareh, Cruz, Francisco, Vamplew, Peter
- Authors: Harland, Hadassah , Dazeley, Richard , Nakisa, Bahareh , Cruz, Francisco , Vamplew, Peter
- Date: 2023
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 35, no. 23 (2023), p. 16917-16930
- Full Text:
- Reviewed:
- Description: For an Artificially Intelligent (AI) system to maintain alignment between human desires and its behaviour, it is important that the AI account for human preferences. This paper proposes and empirically evaluates the first approach to aligning agent behaviour to human preference via an apologetic framework. In practice, an apology may consist of an acknowledgement, an explanation and an intention for the improvement of future behaviour. We propose that such an apology, provided in response to recognition of undesirable behaviour, is one way in which an AI agent may both be transparent and trustworthy to a human user. Furthermore, that behavioural adaptation as part of apology is a viable approach to correct against undesirable behaviours. The Act-Assess-Apologise framework potentially could address both the practical and social needs of a human user, to recognise and make reparations against prior undesirable behaviour and adjust for the future. Applied to a dual-auxiliary impact minimisation problem, the apologetic agent had a near perfect determination and apology provision accuracy in several non-trivial configurations. The agent subsequently demonstrated behaviour alignment with success that included up to complete avoidance of the impacts described by these objectives in some scenarios. © 2023, The Author(s).
- Authors: Harland, Hadassah , Dazeley, Richard , Nakisa, Bahareh , Cruz, Francisco , Vamplew, Peter
- Date: 2023
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 35, no. 23 (2023), p. 16917-16930
- Full Text:
- Reviewed:
- Description: For an Artificially Intelligent (AI) system to maintain alignment between human desires and its behaviour, it is important that the AI account for human preferences. This paper proposes and empirically evaluates the first approach to aligning agent behaviour to human preference via an apologetic framework. In practice, an apology may consist of an acknowledgement, an explanation and an intention for the improvement of future behaviour. We propose that such an apology, provided in response to recognition of undesirable behaviour, is one way in which an AI agent may both be transparent and trustworthy to a human user. Furthermore, that behavioural adaptation as part of apology is a viable approach to correct against undesirable behaviours. The Act-Assess-Apologise framework potentially could address both the practical and social needs of a human user, to recognise and make reparations against prior undesirable behaviour and adjust for the future. Applied to a dual-auxiliary impact minimisation problem, the apologetic agent had a near perfect determination and apology provision accuracy in several non-trivial configurations. The agent subsequently demonstrated behaviour alignment with success that included up to complete avoidance of the impacts described by these objectives in some scenarios. © 2023, The Author(s).
An approach for generalising symbolic knowledge
- Dazeley, Richard, Kang, Byeongho
- Authors: Dazeley, Richard , Kang, Byeongho
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at 21st Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand : 1st-5th December 2008 p. 379-385
- Full Text: false
- Description: Many researchers and developers of knowledge based systems (KBS) have been incorporating the notion of context. However, they generally treat context as a static entity, neglecting many connectionists’ work in learning hidden and dynamic contexts, which aids generalization. This paper presents a method that models hidden context within a symbolic domain achieving a level of generalisation. Results indicate that the method can learn the information that experts have difficulty providing by generalising the captured knowledge.
- Description: 2003006525
An evaluation methodology for interactive reinforcement learning with simulated users
- Bignold, Adam, Cruz, Francisco, Dazeley, Richard, Vamplew, Peter, Foale, Cameron
- Authors: Bignold, Adam , Cruz, Francisco , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
- Date: 2021
- Type: Text , Journal article
- Relation: Biomimetics Vol. 6, no. 1 (2021), p. 1-15
- Full Text:
- Reviewed:
- Description: Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
- Authors: Bignold, Adam , Cruz, Francisco , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
- Date: 2021
- Type: Text , Journal article
- Relation: Biomimetics Vol. 6, no. 1 (2021), p. 1-15
- Full Text:
- Reviewed:
- Description: Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
An expert system methodology for SMEs and NPOs
- Authors: Dazeley, Richard
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at 11th Australian Conference on Knowledge Management and Intelligent Decision Support, ACKMIDS 2008, Ballarat, Victoria : 8th-10th December 2008
- Full Text:
- Description: Traditionally Expert Systems (ES) require a full analysis of the business problem by a Knowledge Engineer (KE) to develop a solution. This inherently makes ES technology very expensive and beyond the affordability of the majority of Small and Medium sized Enterprises (SMEs) and Non-Profit Organisations (NPOs). Therefore, SMEs and NPOs tend to only have access to off-the-shelf solutions to generic problems, which rarely meet the full extent of an organisation’s requirements. One existing methodological stream of research, Ripple-Down Rules (RDR) goes some of the way to being suitable to SMEs and NPOs as it removes the need for a knowledge engineer. This group of methodologies provide an environment where a company can develop large knowledge based systems themselves, specifically tailored to the company’s individual situation. These methods, however, require constant supervision by the expert during development, which is still a significant burden on the organisation. This paper discusses an extension to an RDR method, known as Rated MCRDR (RM) and a feature called prudence analysis. This enhanced methodology to ES development is particularly well suited to the development of ES in restricted environments such as SMEs and NPOs.
- Description: 2003006507
- Authors: Dazeley, Richard
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at 11th Australian Conference on Knowledge Management and Intelligent Decision Support, ACKMIDS 2008, Ballarat, Victoria : 8th-10th December 2008
- Full Text:
- Description: Traditionally Expert Systems (ES) require a full analysis of the business problem by a Knowledge Engineer (KE) to develop a solution. This inherently makes ES technology very expensive and beyond the affordability of the majority of Small and Medium sized Enterprises (SMEs) and Non-Profit Organisations (NPOs). Therefore, SMEs and NPOs tend to only have access to off-the-shelf solutions to generic problems, which rarely meet the full extent of an organisation’s requirements. One existing methodological stream of research, Ripple-Down Rules (RDR) goes some of the way to being suitable to SMEs and NPOs as it removes the need for a knowledge engineer. This group of methodologies provide an environment where a company can develop large knowledge based systems themselves, specifically tailored to the company’s individual situation. These methods, however, require constant supervision by the expert during development, which is still a significant burden on the organisation. This paper discusses an extension to an RDR method, known as Rated MCRDR (RM) and a feature called prudence analysis. This enhanced methodology to ES development is particularly well suited to the development of ES in restricted environments such as SMEs and NPOs.
- Description: 2003006507
Authorship analysis of aliases: Does topic influence accuracy?
- Layton, Robert, Watters, Paul, Dazeley, Richard
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2013
- Type: Text , Journal article
- Relation: Natural Language Engineering Vol. Online first, no. (2013), p.
- Full Text:
- Reviewed:
- Description: Aliases play an important role in online environments by facilitating anonymity, but also can be used to hide the identity of cybercriminals. Previous studies have investigated this alias matching problem in an attempt to identify whether two aliases are shared by an author, which can assist with identifying users. Those studies create their training data by randomly splitting the documents associated with an alias into two sub-aliases. Models have been built that can regularly achieve over 90% accuracy for recovering the linkage between these ‘random sub-aliases’. In this paper, random sub-alias generation is shown to enable these high accuracies, and thus does not adequately model the real-world problem. In contrast, creating sub-aliases using topic-based splitting drastically reduces the accuracy of all authorship methods tested. We then present a methodology that can be performed on non-topic controlled datasets, to produce topic-based sub-aliases that are more difficult to match. Finally, we present an experimental comparison between many authorship methods to see which methods better match aliases under these conditions, finding that local n-gram methods perform better than others.
Authorship attribution for Twitter in 140 characters or less
- Layton, Robert, Watters, Paul, Dazeley, Richard
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at - 2nd Cybercrime and Trustworthy Computing Workshop, CTC 2010 p. 1-8
- Full Text:
- Reviewed:
- Description: Authorship attribution is a growing field, moving from beginnings in linguistics to recent advances in text mining. Through this change came an increase in the capability of authorship attribution methods both in their accuracy and the ability to consider more difficult problems. Research into authorship attribution in the 19th century considered it difficult to determine the authorship of a document of fewer than 1000 words. By the 1990s this values had decreased to less than 500 words and in the early 21 st century it was considered possible to determine the authorship of a document in 250 words. The need for this ever decreasing limit is exemplified by the trend towards many shorter communications rather than fewer longer communications, such as the move from traditional multi-page handwritten letters to shorter, more focused emails. This trend has also been shown in online crime, where many attacks such as phishing or bullying are performed using very concise language. Cybercrime messages have long been hosted on Internet Relay Chats (IRCs) which have allowed members to hide behind screen names and connect anonymously. More recently, Twitter and other short message based web services have been used as a hosting ground for online crimes. This paper presents some evaluations of current techniques and identifies some new preprocessing methods that can be used to enable authorship to be determined at rates significantly better than chance for documents of 140 characters or less, a format popularised by the micro-blogging website Twitter1. We show that the SCAP methodology performs extremely well on twitter messages and even with restrictions on the types of information allowed, such as the recipient of directed messages, still perform significantly higher than chance. Further to this, we show that 120 tweets per user is an important threshold, at which point adding more tweets per user gives a small but non-significant increase in accuracy. © 2010 IEEE.
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at - 2nd Cybercrime and Trustworthy Computing Workshop, CTC 2010 p. 1-8
- Full Text:
- Reviewed:
- Description: Authorship attribution is a growing field, moving from beginnings in linguistics to recent advances in text mining. Through this change came an increase in the capability of authorship attribution methods both in their accuracy and the ability to consider more difficult problems. Research into authorship attribution in the 19th century considered it difficult to determine the authorship of a document of fewer than 1000 words. By the 1990s this values had decreased to less than 500 words and in the early 21 st century it was considered possible to determine the authorship of a document in 250 words. The need for this ever decreasing limit is exemplified by the trend towards many shorter communications rather than fewer longer communications, such as the move from traditional multi-page handwritten letters to shorter, more focused emails. This trend has also been shown in online crime, where many attacks such as phishing or bullying are performed using very concise language. Cybercrime messages have long been hosted on Internet Relay Chats (IRCs) which have allowed members to hide behind screen names and connect anonymously. More recently, Twitter and other short message based web services have been used as a hosting ground for online crimes. This paper presents some evaluations of current techniques and identifies some new preprocessing methods that can be used to enable authorship to be determined at rates significantly better than chance for documents of 140 characters or less, a format popularised by the micro-blogging website Twitter1. We show that the SCAP methodology performs extremely well on twitter messages and even with restrictions on the types of information allowed, such as the recipient of directed messages, still perform significantly higher than chance. Further to this, we show that 120 tweets per user is an important threshold, at which point adding more tweets per user gives a small but non-significant increase in accuracy. © 2010 IEEE.
Automated unsupervised authorship analysis using evidence accumulation clustering
- Layton, Robert, Watters, Paul, Dazeley, Richard
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2013
- Type: Text , Journal article
- Relation: Natural Language Engineering Vol. 19, no. 1 (2013), p. 95-120
- Full Text:
- Reviewed:
- Description: Authorship Analysis aims to extract information about the authorship of documents from features within those documents. Typically, this is performed as a classification task with the aim of identifying the author of a document, given a set of documents of known authorship. Alternatively, unsupervised methods have been developed primarily as visualisation tools to assist the manual discovery of clusters of authorship within a corpus by analysts. However, there is a need in many fields for more sophisticated unsupervised methods to automate the discovery, profiling and organisation of related information through clustering of documents by authorship. An automated and unsupervised methodology for clustering documents by authorship is proposed in this paper. The methodology is named NUANCE, for n-gram Unsupervised Automated Natural Cluster Ensemble. Testing indicates that the derived clusters have a strong correlation to the true authorship of unseen documents. © 2011 Cambridge University Press.
- Description: 2003010584
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2013
- Type: Text , Journal article
- Relation: Natural Language Engineering Vol. 19, no. 1 (2013), p. 95-120
- Full Text:
- Reviewed:
- Description: Authorship Analysis aims to extract information about the authorship of documents from features within those documents. Typically, this is performed as a classification task with the aim of identifying the author of a document, given a set of documents of known authorship. Alternatively, unsupervised methods have been developed primarily as visualisation tools to assist the manual discovery of clusters of authorship within a corpus by analysts. However, there is a need in many fields for more sophisticated unsupervised methods to automate the discovery, profiling and organisation of related information through clustering of documents by authorship. An automated and unsupervised methodology for clustering documents by authorship is proposed in this paper. The methodology is named NUANCE, for n-gram Unsupervised Automated Natural Cluster Ensemble. Testing indicates that the derived clusters have a strong correlation to the true authorship of unseen documents. © 2011 Cambridge University Press.
- Description: 2003010584
Automatically determining phishing campaigns using the USCAP methodology
- Layton, Robert, Watters, Paul, Dazeley, Richard
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at General Members Meeting and eCrime Researchers Summit, eCrime 2010 p. 1-8
- Full Text:
- Reviewed:
- Description: Phishing fraudsters attempt to create an environment which looks and feels like a legitimate institution, while at the same time attempting to bypass filters and suspicions of their targets. This is a difficult compromise for the phishers and presents a weakness in the process of conducting this fraud. In this research, a methodology is presented that looks at the differences that occur between phishing websites from an authorship analysis perspective and is able to determine different phishing campaigns undertaken by phishing groups. The methodology is named USCAP, for Unsupervised SCAP, which builds on the SCAP methodology from supervised authorship and extends it for unsupervised learning problems. The phishing website source code is examined to generate a model that gives the size and scope of each of the recognized phishing campaigns. The USCAP methodology introduces the first time that phishing websites have been clustered by campaign in an automatic and reliable way, compared to previous methods which relied on costly expert analysis of phishing websites. Evaluation of these clusters indicates that each cluster is strongly consistent with a high stability and reliability when analyzed using new information about the attacks, such as the dates that the attack occurred on. The clusters found are indicative of different phishing campaigns, presenting a step towards an automated phishing authorship analysis methodology. © 2010 IEEE.
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at General Members Meeting and eCrime Researchers Summit, eCrime 2010 p. 1-8
- Full Text:
- Reviewed:
- Description: Phishing fraudsters attempt to create an environment which looks and feels like a legitimate institution, while at the same time attempting to bypass filters and suspicions of their targets. This is a difficult compromise for the phishers and presents a weakness in the process of conducting this fraud. In this research, a methodology is presented that looks at the differences that occur between phishing websites from an authorship analysis perspective and is able to determine different phishing campaigns undertaken by phishing groups. The methodology is named USCAP, for Unsupervised SCAP, which builds on the SCAP methodology from supervised authorship and extends it for unsupervised learning problems. The phishing website source code is examined to generate a model that gives the size and scope of each of the recognized phishing campaigns. The USCAP methodology introduces the first time that phishing websites have been clustered by campaign in an automatic and reliable way, compared to previous methods which relied on costly expert analysis of phishing websites. Evaluation of these clusters indicates that each cluster is strongly consistent with a high stability and reliability when analyzed using new information about the attacks, such as the dates that the attack occurred on. The clusters found are indicative of different phishing campaigns, presenting a step towards an automated phishing authorship analysis methodology. © 2010 IEEE.
Coarse Q-Learning : Addressing the convergence problem when quantizing continuous state variables
- Dazeley, Richard, Vamplew, Peter, Bignold, Adam
- Authors: Dazeley, Richard , Vamplew, Peter , Bignold, Adam
- Date: 2015
- Type: Text , Conference paper
- Relation: 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making
- Full Text: false
- Reviewed:
- Description: Value-based approaches to reinforcement learning (RL) maintain a value function that measures the long term utility of a state or state-action pair. A long standing issue in RL is how to create a finite representation in a continuous, and therefore infinite, state environment. The common approach is to use function approximators such as tile coding, memory or instance based methods. These provide some balance between generalisation, resolution, and storage, but converge slowly in multidimensional state environments. Another approach of quantizing state into lookup tables has been commonly regarded as highly problematic, due to large memory requirements and poor generalisation. In particular , attempting to reduce memory requirements and increase generalisation by using coarser quantization forms a non-Markovian system that does not converge. This paper investigates the problem in using quantized lookup tables and presents an extension to the Q-Learning algorithm, referred to as Coarse Q-Learning (C QL), which resolves these issues. The presented algorithm will be shown to drastically reduce the memory requirements and increase generalisation by simulating the Markov property. In particular, this algorithm means the size of the input space is determined by the granularity required by the policy being learnt, rather than by the inadequacies of the learning algorithm or the nature of the state-reward dynamics of the environment. Importantly, the method presented solves the problem represented by the curse of dimensionality.
Consensus clustering and supervised classification for profiling phishing emails in internet commerce security
- Dazeley, Richard, Yearwood, John, Kang, Byeongho, Kelarev, Andrei
- Authors: Dazeley, Richard , Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 11th International Workshop on Knowledge Management and Acquisition for Smart Systems and Services, PKAW 2010 Vol. 6232 LNAI, p. 235-246
- Full Text:
- Reviewed:
- Description: This article investigates internet commerce security applications of a novel combined method, which uses unsupervised consensus clustering algorithms in combination with supervised classification methods. First, a variety of independent clustering algorithms are applied to a randomized sample of data. Second, several consensus functions and sophisticated algorithms are used to combine these independent clusterings into one final consensus clustering. Third, the consensus clustering of the randomized sample is used as a training set to train several fast supervised classification algorithms. Finally, these fast classification algorithms are used to classify the whole large data set. One of the advantages of this approach is in its ability to facilitate the inclusion of contributions from domain experts in order to adjust the training set created by consensus clustering. We apply this approach to profiling phishing emails selected from a very large data set supplied by the industry partners of the Centre for Informatics and Applied Optimization. Our experiments compare the performance of several classification algorithms incorporated in this scheme. © 2010 Springer-Verlag Berlin Heidelberg.
- Authors: Dazeley, Richard , Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 11th International Workshop on Knowledge Management and Acquisition for Smart Systems and Services, PKAW 2010 Vol. 6232 LNAI, p. 235-246
- Full Text:
- Reviewed:
- Description: This article investigates internet commerce security applications of a novel combined method, which uses unsupervised consensus clustering algorithms in combination with supervised classification methods. First, a variety of independent clustering algorithms are applied to a randomized sample of data. Second, several consensus functions and sophisticated algorithms are used to combine these independent clusterings into one final consensus clustering. Third, the consensus clustering of the randomized sample is used as a training set to train several fast supervised classification algorithms. Finally, these fast classification algorithms are used to classify the whole large data set. One of the advantages of this approach is in its ability to facilitate the inclusion of contributions from domain experts in order to adjust the training set created by consensus clustering. We apply this approach to profiling phishing emails selected from a very large data set supplied by the industry partners of the Centre for Informatics and Applied Optimization. Our experiments compare the performance of several classification algorithms incorporated in this scheme. © 2010 Springer-Verlag Berlin Heidelberg.
Constructing stochastic mixture policies for episodic multiobjective reinforcement learning tasks
- Vamplew, Peter, Dazeley, Richard, Barker, Ewan, Kelarev, Andrei
- Authors: Vamplew, Peter , Dazeley, Richard , Barker, Ewan , Kelarev, Andrei
- Date: 2009
- Type: Text , Book chapter
- Relation: AI 2009 : Advances in Artificial Intelligence : 22nd Australasian Joint Conference, Melbourne, Australia, December 1-4, 2009. Proceedings Chapter p. 340-349
- Full Text:
- Description: Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages gained from applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobjective tasks from deterministic base policies found via scalarised reinforcement learning. It is shown that these approaches are an efficient means of identifying solutions which offer a superior match to the user’s preferences than can be achieved by methods based strictly on deterministic policies.
- Description: 2003007906
- Authors: Vamplew, Peter , Dazeley, Richard , Barker, Ewan , Kelarev, Andrei
- Date: 2009
- Type: Text , Book chapter
- Relation: AI 2009 : Advances in Artificial Intelligence : 22nd Australasian Joint Conference, Melbourne, Australia, December 1-4, 2009. Proceedings Chapter p. 340-349
- Full Text:
- Description: Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages gained from applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobjective tasks from deterministic base policies found via scalarised reinforcement learning. It is shown that these approaches are an efficient means of identifying solutions which offer a superior match to the user’s preferences than can be achieved by methods based strictly on deterministic policies.
- Description: 2003007906
Detecting the knowledge boundary with prudence analysis
- Dazeley, Richard, Kang, Byeongho
- Authors: Dazeley, Richard , Kang, Byeongho
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at 21st Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand : 1st-5th December 2008 p. 482-488
- Full Text: false
- Description: Prudence analysis (PA) is a relatively new, practical and highly innovative approach to solving the problem of brittleness in knowledge based systems (KBS). PA is essentially an online validation approach, where as each situation or case is presented to the KBS for inferencing the result is simultaneously validated. This paper introduces a new approach to PA that analyses the structure of knowledge rather than the comparing cases with archived situations. This new approach is positively compared against earlier systems for PA, strongly indicating the viability of the approach.
- Description: 2003006511
Detection of CAN by ensemble classifiers based on Ripple Down rules
- Kelarev, Andrei, Dazeley, Richard, Stranieri, Andrew, Yearwood, John, Jelinek, Herbert
- Authors: Kelarev, Andrei , Dazeley, Richard , Stranieri, Andrew , Yearwood, John , Jelinek, Herbert
- Date: 2012
- Type: Text , Book chapter
- Relation: Knowledge Management and Acquisition for Intelligent Systems p. 147-159
- Full Text: false
- Reviewed:
- Description: It is well known that classification models produced by the Ripple Down Rules are easier to maintain and update. They are compact and can provide an explanation of their reasoning making them easy to understand for medical practitioners. This article is devoted to an empirical investigation and comparison of several ensemble methods based on Ripple Down Rules in a novel application for the detection of cardiovascular autonomic neuropathy (CAN) from an extensive data set collected by the Diabetes Complications Screening Research Initiative at Charles Sturt University. Our experiments included essential ensemble methods, several more recent state-of-the-art techniques, and a novel consensus function based on graph partitioning. The results show that our novel application of Ripple Down Rules in ensemble classifiers for the detection of CAN achieved better performance parameters compared with the outcomes obtained previously in the literature.