List of Titles

A conceptual framework for externally-influenced agents: an assisted reinforcement learning review

Authors: Bignold, Adam; Cruz, Francisco; Taylor, Matthew; Brys, Tim; Dazeley, Richard; Vamplew, Peter; Foale, Cameron
Date: 2023
Type: Text , Journal article
Relation: Journal of Ambient Intelligence and Humanized Computing Vol. 14, no. 4 (2023), p. 3621-3644
Full Text:
Reviewed:
Description: A long-term goal of reinforcement learning agents is to be able to perform tasks in complex real-world scenarios. The use of external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration or interoperability between different approaches using external information. In this work, while reviewing externally-influenced methods, we propose a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering collaboration by classifying and comparing various methods that use external information in the learning process. The proposed taxonomy details the relationship between the external information source and the learner agent, highlighting the process of information decomposition, structure, retention, and how it can be used to influence agent learning. As well as reviewing state-of-the-art methods, we identify current streams of reinforcement learning that use external information in order to improve the agent’s performance and its decision-making process. These include heuristic reinforcement learning, interactive reinforcement learning, learning from demonstration, transfer learning, and learning from multiple sources, among others. These streams of reinforcement learning operate with the shared objective of scaffolding the learner agent. Lastly, we discuss further possibilities for future work in the field of assisted reinforcement learning systems. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

A multi-objective deep reinforcement learning framework

Authors: Nguyen, Thanh; Nguyen, Ngoc; Vamplew, Peter; Nahavandi, Saeid; Dazeley, Richard; Lim, Chee
Date: 2020
Type: Text , Journal article
Relation: Engineering Applications of Artificial Intelligence Vol. 96, no. (2020), p.
Full Text:
Reviewed:
Description: This paper introduces a new scalable multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We develop a high-performance MODRL framework that supports both single-policy and multi-policy strategies, as well as both linear and non-linear approaches to action selection. The experimental results on two benchmark problems (two-objective deep sea treasure environment and three-objective Mountain Car problem) indicate that the proposed framework is able to find the Pareto-optimal solutions effectively. The proposed framework is generic and highly modularized, which allows the integration of different deep reinforcement learning algorithms in different complex problem domains. This therefore overcomes many disadvantages involved with standard multi-objective reinforcement learning methods in the current literature. The proposed framework acts as a testbed platform that accelerates the development of MODRL for solving increasingly complicated multi-objective problems. © 2020 Elsevier Ltd

A nethack learning environment language wrapper for autonomous agents

Authors: Goodger, Nikolaj; Vamplew, Peter; Foale, Cameron; Dazeley, Richard
Date: 2023
Type: Text , Journal article
Relation: Journal of Open Research Software Vol. 11, no. (2023), p.
Full Text:
Reviewed:
Description: This paper describes a language wrapper for the NetHack Learning Environment (NLE) [1]. The wrapper replaces the non-language observations and actions with comparable language versions. The NLE offers a grand challenge for AI research while MiniHack [2] extends this potential to more specific and configurable tasks. By providing a language interface, we can enable further research on language agents and directly connect language models to a versatile environment. © 2023 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.

A novel ensemble of hybrid intrusion detection system for detecting internet of things attacks

Authors: Khraisat, Ansam; Gondal, Iqbal; Vamplew, Peter; Kamruzzaman, Joarder; Alazab, Ammar
Date: 2019
Type: Text , Journal article
Relation: Electronics (Switzerland) Vol. 8, no. 11 (2019), p.
Full Text:
Reviewed:
Description: The Internet of Things (IoT) has been rapidly evolving towards making a greater impact on everything from everyday life to large industrial systems. Unfortunately, this has attracted the attention of cybercriminals, who have made IoT a target of malicious activities, opening the door to possible attacks on the end nodes. Due to the large number and diverse types of IoT devices, it is a challenging task to protect the IoT infrastructure using a traditional intrusion detection system. To protect IoT devices, a novel ensemble Hybrid Intrusion Detection System (HIDS) is proposed by combining a C5 classifier and One Class Support Vector Machine classifier. HIDS combines the advantages of Signature Intrusion Detection System (SIDS) and Anomaly-based Intrusion Detection System (AIDS). The aim of this framework is to detect both the well-known intrusions and zero-day attacks with high detection accuracy and low false-alarm rates. The proposed HIDS is evaluated using the Bot-IoT dataset, which includes legitimate IoT network traffic and several types of attacks. Experiments show that the proposed hybrid IDS provides a higher detection rate and a lower false positive rate compared to the SIDS and AIDS techniques. © 2019 by the authors. Licensee MDPI, Basel, Switzerland.

A polynomial ring construction for the classification of data

Authors: Kelarev, Andrei; Yearwood, John; Vamplew, Peter
Date: 2009
Type: Text , Journal article
Relation: Bulletin of the Australian Mathematical Society Vol. 79, no. 2 (2009), p. 213-225
Full Text:
Reviewed:
Description: Drensky and Lakatos (Lecture Notes in Computer Science, 357 (Springer, Berlin, 1989), pp. 181-188) have established a convenient property of certain ideals in polynomial quotient rings, which can now be used to determine error-correcting capabilities of combined multiple classifiers following a standard approach explained in the well-known monograph by Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques (Elsevier, Amsterdam, 2005)). We strengthen and generalise the result of Drensky and Lakatos by demonstrating that the corresponding nice property remains valid in a much larger variety of constructions and applies to more general types of ideals. Examples show that our theorems do not extend to larger classes of ring constructions and cannot be simplified or generalised.

A practical guide to multi-objective reinforcement learning and planning

Authors: Hayes, Conor; Rădulescu, Roxana; Bargiacchi, Eugenio; Källström, Johan; Macfarlane, Matthew; Reymond, Mathieu; Verstraeten, Timothy; Zintgraf, Luisa; Dazeley, Richard; Heintz, Frederick; Howley, Enda; Irissappane, Athirai; Mannion, Patrick; Nowé, Ann; Ramos, Gabriel; Restelli, Marcello; Vamplew, Peter; Roijers, Diederik
Date: 2022
Type: Text , Journal article
Relation: Autonomous Agents and Multi-Agent Systems Vol. 36, no. 1 (2022), p.
Full Text:
Reviewed:
Description: Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. © 2022, The Author(s).

A prioritized objective actor-critic method for deep reinforcement learning

Authors: Nguyen, Ngoc; Nguyen, Thanh; Vamplew, Peter; Dazeley, Richard; Nahavandi, Saeid
Date: 2021
Type: Text , Journal article
Relation: Neural Computing and Applications Vol. 33, no. 16 (2021), p. 10335-10349
Full Text: false
Reviewed:
Description: An increasing number of complex problems have naturally posed significant challenges in decision-making theory and reinforcement learning practices. These problems often involve multiple conflicting reward signals that inherently cause agents’ poor exploration in seeking a specific goal. In extreme cases, the agent gets stuck in a sub-optimal solution and starts behaving harmfully. To overcome such obstacles, we introduce two actor-critic deep reinforcement learning methods, namely Multi-Critic Single Policy (MCSP) and Single Critic Multi-Policy (SCMP), which can adjust agent behaviors to efficiently achieve a designated goal by adopting a weighted-sum scalarization of different objective functions. In particular, MCSP creates a human-centric policy that corresponds to a predefined priority weight of different objectives. SCMP, in contrast, is capable of generating a mixed policy based on a set of priority weights, i.e., the generated policy uses the knowledge of different policies (each policy corresponds to a priority weight) to dynamically prioritize objectives in real time. We examine our methods by using the Asynchronous Advantage Actor-Critic (A3C) algorithm to utilize the multithreading mechanism for dynamically balancing training intensity of different policies into a single network. Finally, simulation results show that MCSP and SCMP significantly outperform A3C with respect to the mean of total rewards in two complex problems: Food Collector and Seaquest. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd. part of Springer Nature.

A survey of multi-objective sequential decision-making

Authors: Roijers, Diederik; Vamplew, Peter; Whiteson, Shimon; Dazeley, Richard
Date: 2013
Type: Text , Journal article
Relation: Journal of Artificial Intelligence Research Vol. 48, no. (2013), p. 67-113
Full Text:
Reviewed:
Description: Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work. © 2013 AI Access Foundation.
Description: C1

A taxonomy of griefer type by motivation in massively multiplayer online role-playing games

Authors: Achterbosch, Leigh; Miller, Charlynn; Vamplew, Peter
Date: 2017
Type: Text , Journal article
Relation: Behaviour and Information Technology Vol. 36, no. 8 (2017), p. 846-860
Full Text: false
Reviewed:
Description: There is an anti-social phenomenon known as griefing that occurs in online games. Griefing refers to the act of one player intentionally disrupting another player’s game experience for personal pleasure and possibly for potential gain. Achterbosch [2015. “Causes, Magnitude and Implications of Griefing in Massively Multiplayer Online Role-Playing Games.” PhD thesis, Faculty of Science and Technology, Federation University Australia] carried out a substantial two-phase mixed method investigation into the behaviour and experiences of both griefers and griefed players in massively multiplayer online role-playing games. The first phase consisted of a survey that attracted 1188 participants of a representative player population. The second phase consisted of interviews with 15 participants to expand the findings with more personalised data. The data were analysed from the perspectives of different demographics and different associations to griefing. One of the most distinctive findings concerns the factors that motivated a player to cause grief to another player. This paper analyses these factors to propose a taxonomy of ‘Griefer’ types (griefer being the individual who imposes upon others). The taxonomy consisted of eight types of griefers, based on their motivation for griefing. Some types relate to previous studies, although new types of griefers, such as the retaliator and the elitist, were discovered; these are discussed in detail in the article. © 2017 Informa UK Limited, trading as Taylor & Francis Group.

AI apology : interactive multi-objective reinforcement learning for human-aligned AI

Authors: Harland, Hadassah; Dazeley, Richard; Nakisa, Bahareh; Cruz, Francisco; Vamplew, Peter
Date: 2023
Type: Text , Journal article
Relation: Neural Computing and Applications Vol. 35, no. 23 (2023), p. 16917-16930
Full Text:
Reviewed:
Description: For an Artificially Intelligent (AI) system to maintain alignment between human desires and its behaviour, it is important that the AI account for human preferences. This paper proposes and empirically evaluates the first approach to aligning agent behaviour to human preference via an apologetic framework. In practice, an apology may consist of an acknowledgement, an explanation and an intention for the improvement of future behaviour. We propose that such an apology, provided in response to recognition of undesirable behaviour, is one way in which an AI agent may be both transparent and trustworthy to a human user. Furthermore, we propose that behavioural adaptation as part of apology is a viable approach to correct against undesirable behaviours. The Act-Assess-Apologise framework could potentially address both the practical and social needs of a human user, to recognise and make reparations against prior undesirable behaviour and adjust for the future. Applied to a dual-auxiliary impact minimisation problem, the apologetic agent had a near perfect determination and apology provision accuracy in several non-trivial configurations. The agent subsequently demonstrated behaviour alignment with success that included up to complete avoidance of the impacts described by these objectives in some scenarios. © 2023, The Author(s).

An evaluation methodology for interactive reinforcement learning with simulated users

Authors: Bignold, Adam; Cruz, Francisco; Dazeley, Richard; Vamplew, Peter; Foale, Cameron
Date: 2021
Type: Text , Journal article
Relation: Biomimetics Vol. 6, no. 1 (2021), p. 1-15
Full Text:
Reviewed:
Description: Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.

An online scalarization multi-objective reinforcement learning algorithm : TOPSIS Q-learning

Authors: Mirzanejad, Mohammad; Ebrahimi, Morteza; Vamplew, Peter; Veisi, Hadi
Date: 2022
Type: Text , Journal article
Relation: Knowledge Engineering Review Vol. 37, no. 4 (2022), p.
Full Text: false
Reviewed:
Description: Conventional reinforcement learning focuses on problems with a single objective. However, many problems have multiple objectives or criteria that may be independent, related, or contradictory. In such cases, multi-objective reinforcement learning is used to propose a compromise among the solutions to balance the objectives. TOPSIS is a multi-criteria decision method that selects the alternative with the minimum distance from the positive ideal solution and the maximum distance from the negative ideal solution, so it can be used effectively in the decision-making process to select the next action. In this research, a single-policy algorithm called TOPSIS Q-Learning is presented, with a focus on its performance in online mode. Unlike other single-policy methods, the first version of the algorithm does not require the user to specify the weights of the objectives. The user's preferences may not be completely definite, so all weight preferences are combined as decision criteria and a solution is generated by considering all these preferences at once; the user can thereby model the uncertainty and weight changes of the objectives around their specified preferences. If the user only wants to apply the algorithm for a specific set of weights, the second version of the algorithm accomplishes that efficiently.
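Note: The TOPSIS selection rule described in this abstract can be illustrated with a minimal sketch. This is generic TOPSIS applied to a table of per-action values, not the TOPSIS Q-Learning algorithm from the paper. The function name topsis_select, the input layout (one row per action, one column per objective, higher is better) and the example numbers are all assumptions made for illustration.

```python
import math


def topsis_select(q_values, weights):
    """Pick the action whose value vector has the best TOPSIS closeness score.

    q_values: one row per action, one column per objective (higher is better).
    weights:  one non-negative weight per objective (hypothetical inputs).
    Returns (index of the chosen action, list of closeness scores).
    """
    n_obj = len(weights)
    # Vector-normalise each objective column; fall back to 1.0 for an all-zero column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in q_values)) or 1.0
             for j in range(n_obj)]
    scaled = [[weights[j] * row[j] / norms[j] for j in range(n_obj)]
              for row in q_values]
    # Positive ideal: best value per objective; negative ideal: worst value.
    ideal = [max(row[j] for row in scaled) for j in range(n_obj)]
    anti = [min(row[j] for row in scaled) for j in range(n_obj)]
    scores = []
    for row in scaled:
        d_pos = math.dist(row, ideal)  # distance to the positive ideal
        d_neg = math.dist(row, anti)   # distance to the negative ideal
        total = d_pos + d_neg
        # Closeness is 1 at the positive ideal and 0 at the negative ideal.
        scores.append(d_neg / total if total > 0 else 0.0)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Illustrative values only: two specialist actions and one balanced action.
best_action, closeness = topsis_select(
    [[1.0, 0.0], [0.0, 1.0], [0.8, 0.8]], weights=[0.5, 0.5]
)
```

With equal weights, the balanced third action lies closest to the positive ideal and farthest from the negative ideal, so it receives the highest closeness score.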

Automated opinion detection : Implications of the level of agreement between human raters

Authors: Osman, Deanna; Yearwood, John; Vamplew, Peter
Date: 2010
Type: Text , Journal article
Relation: Information Processing and Management Vol. 46, no. 3 (2010), p. 331-342
Full Text: false
Reviewed:
Description: The ability to agree with the TREC Blog06 opinion assessments was measured for seven human assessors and compared with the submitted results of the Blog06 participants. The assessors achieved a fair level of agreement between their assessments, although the range between the assessors was large. It is recommended that multiple assessors are used to assess opinion data, or a pre-test of assessors is completed to remove the most dissenting assessors from a pool of assessors prior to the assessment process. The possibility of inconsistent assessments in a corpus also raises concerns about training data for an automated opinion detection system (AODS), so a further recommendation is that AODS training data be assembled from a variety of sources. This paper establishes an aspirational value for an AODS by determining the level of agreement achievable by human assessors when assessing the existence of an opinion on a given topic. Knowing the level of agreement amongst humans is important because it sets an upper bound on the expected performance of AODS. While the AODSs surveyed achieved satisfactory results, none achieved a result close to the upper bound. © 2009 Elsevier Ltd. All rights reserved.

Detecting K-complexes for sleep stage identification using nonsmooth optimization

Authors: Moloney, David; Sukhorukova, Nadezda; Vamplew, Peter; Ugon, Julien; Li, Gang; Beliakov, Gleb; Philippe, Carole; Amiel, Hélène; Ugon, Adrien
Date: 2012
Type: Text , Journal article
Relation: ANZIAM Journal Vol. 52, no. 4 (2012), p. 319-332
Full Text:
Reviewed:
Description: The process of sleep stage identification is a labour-intensive task that involves the specialized interpretation of the polysomnographic signals captured from a patient's overnight sleep session. Automating this task has proven to be challenging for data mining algorithms because of noise, complexity and the extreme size of data. In this paper we apply nonsmooth optimization to extract key features that lead to better accuracy. We develop a specific procedure for identifying K-complexes, a special type of brain wave crucial for distinguishing sleep stages. The procedure contains two steps. We first extract "easily classified" K-complexes, and then apply nonsmooth optimization methods to extract features from the remaining data and refine the results from the first step. Numerical experiments show that this procedure is efficient for detecting K-complexes. It is also found that most classification methods perform significantly better on the extracted features. © 2012 Australian Mathematical Society.

Discrete-to-deep reinforcement learning methods

Authors: Kurniawan, Budi; Vamplew, Peter; Papasimeon, Michael; Dazeley, Richard; Foale, Cameron
Date: 2022
Type: Text , Journal article
Relation: Neural Computing and Applications Vol. 34, no. 3 (2022), p. 1713-1733
Full Text:
Reviewed:
Description: Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. In complex problems, a neural RL approach is often able to learn a better solution than tabular RL, but generally takes longer. This paper proposes two methods, Discrete-to-Deep Supervised Policy Learning (D2D-SPL) and Discrete-to-Deep Supervised Q-value Learning (D2D-SQL), whose objective is to acquire the generalisability of a neural network at a cost nearer to that of a tabular method. Both methods combine RL and supervised learning (SL) and are based on the idea that a fast-learning tabular method can generate off-policy data to accelerate learning in neural RL. D2D-SPL uses the data to train a classifier which is then used as a controller for the RL problem. D2D-SQL uses the data to initialise a neural network which is then allowed to continue learning using another RL method. We demonstrate the viability of our algorithms with Cartpole, Lunar Lander and an aircraft manoeuvring problem, three continuous-space environments with low-dimensional state variables. Both methods learn at least 38% faster than baseline methods and yield policies that outperform them. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.

Empirical evaluation methods for multiobjective reinforcement learning algorithms

Authors: Vamplew, Peter; Dazeley, Richard; Berry, Adam; Issabekov, Rustam; Dekker, Evan
Date: 2011
Type: Text , Journal article
Relation: Machine Learning Vol. 84, no. 1-2 (2011), p. 51-80
Full Text: false
Reviewed:
Description: While a number of algorithms for multiobjective reinforcement learning have been proposed, and a small number of applications developed, there has been very little rigorous empirical evaluation of the performance and limitations of these algorithms. This paper proposes standard methods for such empirical evaluation, to act as a foundation for future comparative studies. Two classes of multiobjective reinforcement learning algorithms are identified, and appropriate evaluation metrics and methodologies are proposed for each class. A suite of benchmark problems with known Pareto fronts is described, and future extensions and implementations of this benchmark suite are discussed. The utility of the proposed evaluation methods is demonstrated via an empirical comparison of two example learning algorithms. © 2010 The Author(s).

Explainable reinforcement learning for broad-XAI: a conceptual framework and survey

Authors: Dazeley, Richard; Vamplew, Peter; Cruz, Francisco
Date: 2023
Type: Text , Journal article
Relation: Neural Computing and Applications Vol. 35, no. 23 (2023), p. 16893-16916
Full Text:
Reviewed:
Description: Broad-XAI moves away from interpreting individual decisions based on a single datum and aims to provide integrated explanations from multiple machine learning algorithms into a coherent explanation of an agent’s behaviour that is aligned to the communication needs of the explainee. Reinforcement Learning (RL) methods, we propose, provide a potential backbone for the cognitive model required for the development of Broad-XAI. RL represents a suite of approaches that have had increasing success in solving a range of sequential decision-making problems. However, these algorithms operate as black-box problem solvers, where they obfuscate their decision-making policy through a complex array of values and functions. EXplainable RL (XRL) aims to develop techniques to extract concepts from the agent’s: perception of the environment; intrinsic/extrinsic motivations/beliefs; Q-values, goals and objectives. This paper aims to introduce the Causal XRL Framework (CXF), which unifies the current XRL research and uses RL as a backbone to the development of Broad-XAI. CXF is designed to incorporate many standard RL extensions and to integrate with external ontologies and communication facilities so that the agent can answer questions that explain the outcomes of its decisions. This paper aims to: establish XRL as a distinct branch of XAI; introduce a conceptual framework for XRL; review existing approaches explaining agent behaviour; and identify opportunities for future research. Finally, this paper discusses how additional information can be extracted and ultimately integrated into models of communication, facilitating the development of Broad-XAI. © 2023, The Author(s).

Explainable robotic systems : understanding goal-driven actions in a reinforcement learning scenario

Authors: Cruz, Francisco; Dazeley, Richard; Vamplew, Peter; Moreira, Ithan
Date: 2023
Type: Text , Journal article
Relation: Neural Computing and Applications Vol. 35, no. 25 (2023), p. 18113-18130
Full Text:
Reviewed:
Description: Robotic systems are more present in our society every day. In human–robot environments, it is crucial that end-users can correctly understand their robotic team-partners, in order to collaboratively complete a task. To increase action understanding, users demand more explainability about the decisions made by the robot in particular situations. Recently, explainable robotic systems have emerged as an alternative focused not only on completing a task satisfactorily, but also on justifying, in a human-like manner, the reasons that lead to making a decision. In reinforcement learning scenarios, a great effort has been focused on providing explanations using data-driven approaches, particularly from the visual input modality in deep learning-based systems. In this work, we focus rather on the decision-making process of reinforcement learning agents performing a task in a robotic scenario. Experimental results are obtained using 3 different set-ups, namely, a deterministic navigation task, a stochastic navigation task, and a continuous visual-based object-sorting task. As a way to explain the goal-driven robot’s actions, we use the probability of success computed by three different proposed approaches: memory-based, learning-based, and introspection-based. The difference between these approaches is the amount of memory required to compute or estimate the probability of success as well as the kind of reinforcement learning representation where they could be used. In this regard, we use the memory-based approach as a baseline since it is obtained directly from the agent’s observations. When comparing the learning-based and the introspection-based approaches to this baseline, both are found to be suitable alternatives to compute the probability of success, obtaining high levels of similarity when compared using both Pearson’s correlation and the mean squared error. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.

Function similarity using family context

Authors: Black, Paul; Gondal, Iqbal; Vamplew, Peter; Lakhotia, Arun
Date: 2020
Type: Text , Journal article
Relation: Electronics Vol. 9, no. 7 (Jul 2020), p. 20
Full Text:
Reviewed:
Description: Finding changed and similar functions between a pair of binaries is an important problem in malware attribution and for the identification of new malware capabilities. This paper presents a new technique called Function Similarity using Family Context (FSFC) for this problem. FSFC trains a Support Vector Machine (SVM) model using pairs of similar functions from two program variants. This method improves upon previous research called Cross Version Contextual Function Similarity (CVCFS) by representing a function using features extracted not just from the function itself, but also from other functions with which it has a caller and callee relationship. We present the results of an initial experiment that shows that the use of additional features from the context of a function significantly decreases the false positive rate, obviating the need for a separate pass for cleaning false positives. The more surprising finding is that the SVM model produced by FSFC can abstract function similarity features from one pair of program variants to find similar functions in an unrelated pair of program variants. If validated by a larger study, this new property leads to the possibility of creating generic similar function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra.
Description: This research was performed in the Internet Commerce Security Lab (ICSL), which is a joint venture with research partners Westpac, IBM, and Federation University Australia.

Griefers versus the Griefed - what motivates them to play Massively Multiplayer Online Role-Playing Games?

Authors: Achterbosch, Leigh , Miller, Charlynn , Turville, Christopher , Vamplew, Peter
Date: 2014
Type: Text , Journal article
Relation: The Computer Games Journal Vol. 3, no. 1 (2014), p. 5-18
Description: 'Griefing' is a term used to describe when a player within a multiplayer online environment intentionally disrupts another player’s game experience for his or her own personal enjoyment or gain. Every day a certain percentage of users of Massively Multiplayer Online Role-Playing Games (MMORPGs) experience some form of griefing. Past studies have attempted to ascertain the factors that motivate users to play MMORPGs. A limited number of studies specifically examined the motivations of users who perform griefing (also known as 'griefers'). However, those studies did not examine the motivations of users subjected to griefing. Therefore, the aim of this paper is to examine the factors that motivate the subjects of griefing to play MMORPGs, as well as the factors motivating the griefers. The authors conducted an online survey to discover the motivations for playing MMORPGs among those who identified themselves as (i) those who perform griefing, and (ii) those who have been subjected to griefing. A previously devised motivational model by Nick Yee that incorporated ten factors was used to determine the respondents’ motivational trends. In general, players who identified themselves as griefers were more likely to be motivated by all three 'achievement' sub-factors (advancement, game mechanics and competition) to the detriment of all other factors. The subjects of griefing were highly motivated by 'advancement' and 'mechanics', but they ranked 'competition' significantly lower (compared to the griefers). In addition, 'immersion' factors were rated highly by the respondents who were subjected to griefing, with a significantly higher rating of the 'escapism' factor (compared with rankings by griefers).
In comparison to the griefers, the respondents subjected to griefing who had many years’ experience in the genre of MMORPGs also placed a greater emphasis on the 'socializing' and 'relationship' factors. Overall, the griefers in this survey considered 'achievement' to be a prime motivating factor, whereas the griefed players tended to be motivated by all ten factors to a similar degree.