List of Titles

On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts

Authors: Vamplew, Peter , Yearwood, John , Dazeley, Richard , Berry, Adam
Date: 2008
Type: Text , Conference paper
Relation: Paper presented at 21st Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand : 1st-5th December 2008 Vol. 5360, p. 372-378
Full Text: false
Description: Multiobjective reinforcement learning (MORL) extends RL to problems with multiple conflicting objectives. This paper argues for designing MORL systems to produce a set of solutions approximating the Pareto front, and shows that the common MORL technique of scalarisation has fundamental limitations when used to find Pareto-optimal policies. The work is supported by the presentation of three new MORL benchmarks with known Pareto fronts.

Constructing stochastic mixture policies for episodic multiobjective reinforcement learning tasks

Authors: Vamplew, Peter , Dazeley, Richard , Barker, Ewan , Kelarev, Andrei
Date: 2009
Type: Text , Book chapter
Relation: AI 2009 : Advances in Artificial Intelligence : 22nd Australasian Joint Conference, Melbourne, Australia, December 1-4, 2009. Proceedings Chapter p. 340-349
Full Text:
Description: Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages gained from applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobjective tasks from deterministic base policies found via scalarised reinforcement learning. It is shown that these approaches are an efficient means of identifying solutions which offer a superior match to the user’s preferences than can be achieved by methods based strictly on deterministic policies.

A survey of multi-objective sequential decision-making

Authors: Roijers, Diederik , Vamplew, Peter , Whiteson, Shimon , Dazeley, Richard
Date: 2013
Type: Text , Journal article
Relation: Journal of Artificial Intelligence Research Vol. 48, no. (2013), p. 67-113
Full Text:
Reviewed:
Description: Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work. © 2013 AI Access Foundation.

Empirical evaluation methods for multiobjective reinforcement learning algorithms

Authors: Vamplew, Peter , Dazeley, Richard , Berry, Adam , Issabekov, Rustam , Dekker, Evan
Date: 2011
Type: Text , Journal article
Relation: Machine Learning Vol. 84, no. 1-2 (2011), p. 51-80
Full Text: false
Reviewed:
Description: While a number of algorithms for multiobjective reinforcement learning have been proposed, and a small number of applications developed, there has been very little rigorous empirical evaluation of the performance and limitations of these algorithms. This paper proposes standard methods for such empirical evaluation, to act as a foundation for future comparative studies. Two classes of multiobjective reinforcement learning algorithms are identified, and appropriate evaluation metrics and methodologies are proposed for each class. A suite of benchmark problems with known Pareto fronts is described, and future extensions and implementations of this benchmark suite are discussed. The utility of the proposed evaluation methods is demonstrated via an empirical comparison of two example learning algorithms. © 2010 The Author(s).

The Ballarat incremental knowledge engine

Authors: Dazeley, Richard , Warner, Philip , Johnson, Scott , Vamplew, Peter
Date: 2010
Type: Text , Conference paper
Relation: Paper presented at 11th International Workshop on Knowledge Management and Acquisition for Smart Systems and Services, PKAW 2010 Vol. 6232 LNAI, p. 195-207
Full Text:
Reviewed:
Description: Ripple Down Rules (RDR) is a maturing collection of methodologies for the incremental development and maintenance of medium to large rule-based knowledge systems. While earlier knowledge based systems relied on extensive modeling and knowledge engineering, RDR instead takes a simple no-model approach that merges the development and maintenance stages. Over the last twenty years RDR has been significantly expanded and applied in numerous domains. Until now researchers have generally implemented their own version of the methodologies, while commercial implementations are not made available. This has resulted in much duplicated code and the advantages of RDR not being available to a wider audience. The aim of this project is to develop a comprehensive and extensible platform that supports current and future RDR technologies, thereby allowing researchers and developers access to the power and versatility of RDR. This paper is a report on the current status of the project and marks the first release of the software. © 2010 Springer-Verlag Berlin Heidelberg.

RM and RDM, a preliminary evaluation of two prudent RDR Techniques

Authors: Maruatona, Omaru , Vamplew, Peter , Dazeley, Richard
Date: 2012
Type: Text , Book chapter
Relation: Knowledge Management and acquisition for intelligent systems: 12th Pacific Rim Knowledge Acquisition workshop p. 188-194
Full Text: false
Reviewed:
Description: Rated Multiple Classification Ripple Down Rules (RM) and Ripple Down Models (RDM) are two of the successful prudent RDR approaches published. To date, there has not been a published, dedicated comparison of the two. This paper presents a systematic preliminary evaluation and analysis of the two techniques. The tests and results reported in this paper are the first phase of direct evaluations of RM and RDM against each other.

Prudent fraud detection in internet banking

Authors: Maruatona, Omaru , Vamplew, Peter , Dazeley, Richard
Date: 2012
Type: Text , Conference proceedings
Full Text:
Description: Most commercial Fraud Detection components of Internet banking systems use some kind of hybrid setup usually comprising a Rule-Base and an Artificial Neural Network. Such rule bases have been criticised for a lack of innovation in their approach to Knowledge Acquisition and maintenance. Furthermore, the systems are brittle; they have no way of knowing when a previously unseen set of fraud patterns is beyond their current knowledge. This limitation may have far reaching consequences in an online banking system. This paper presents a viable alternative to brittleness in Knowledge Based Systems; a potential milestone in the rapid detection of unique and novel fraud patterns in Internet banking. The experiments conducted with real online banking transaction log files suggest that Prudent based fraud detection may be a worthy alternative in online banking. © 2012 IEEE.

Steering approaches to Pareto-optimal multiobjective reinforcement learning

Authors: Vamplew, Peter , Issabekov, Rustam , Dazeley, Richard , Foale, Cameron , Berry, Adam , Moore, Tim , Creighton, Douglas
Date: 2017
Type: Text , Journal article
Relation: Neurocomputing Vol. 263, no. (2017), p. 26-38
Full Text:
Reviewed:
Description: For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent’s target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system.

Coarse Q-Learning : Addressing the convergence problem when quantizing continuous state variables

Authors: Dazeley, Richard , Vamplew, Peter , Bignold, Adam
Date: 2015
Type: Text , Conference paper
Relation: 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making
Full Text: false
Reviewed:
Description: Value-based approaches to reinforcement learning (RL) maintain a value function that measures the long term utility of a state or state-action pair. A long standing issue in RL is how to create a finite representation in a continuous, and therefore infinite, state environment. The common approach is to use function approximators such as tile coding, memory or instance based methods. These provide some balance between generalisation, resolution, and storage, but converge slowly in multidimensional state environments. Another approach of quantizing state into lookup tables has been commonly regarded as highly problematic, due to large memory requirements and poor generalisation. In particular, attempting to reduce memory requirements and increase generalisation by using coarser quantization forms a non-Markovian system that does not converge. This paper investigates the problem in using quantized lookup tables and presents an extension to the Q-Learning algorithm, referred to as Coarse Q-Learning (CQL), which resolves these issues. The presented algorithm will be shown to drastically reduce the memory requirements and increase generalisation by simulating the Markov property. In particular, this algorithm means the size of the input space is determined by the granularity required by the policy being learnt, rather than by the inadequacies of the learning algorithm or the nature of the state-reward dynamics of the environment. Importantly, the method presented solves the problem represented by the curse of dimensionality.

Non-functional regression : A new challenge for neural networks

Authors: Vamplew, Peter , Dazeley, Richard , Foale, Cameron , Choudhury, Tanveer
Date: 2018
Type: Text , Journal article
Relation: Neurocomputing Vol. 314, no. (2018), p. 326-335
Full Text:
Reviewed:
Description: This work identifies an important, previously unaddressed issue for regression based on neural networks – learning to accurately approximate problems where the output is not a function of the input (i.e. where the number of outputs required varies across input space). Such non-functional regression problems arise in a number of applications, and cannot be adequately handled by existing neural network algorithms. To demonstrate the benefits possible from directly addressing non-functional regression, this paper proposes the first neural algorithm to do so – an extension of the Resource Allocating Network (RAN) which adds additional output neurons to the network structure during training. This new algorithm, called the Resource Allocating Network with Varying Output Cardinality (RANVOC), is demonstrated to be capable of learning to perform non-functional regression, on both artificially constructed data and also on the real-world task of specifying parameter settings for a plasma-spray process. Importantly, RANVOC is shown to outperform not just the original RAN algorithm, but also the best possible error rates achievable by any functional form of regression.

Human-aligned artificial intelligence is a multiobjective problem

Authors: Vamplew, Peter , Dazeley, Richard , Foale, Cameron , Firmin, Sally , Mummery, Jane
Date: 2018
Type: Text , Journal article
Relation: Ethics and Information Technology Vol. 20, no. 1 (2018), p. 27-40
Full Text:
Reviewed:
Description: As the capabilities of artificial intelligence (AI) systems improve, it becomes important to constrain their actions to ensure their behaviour remains beneficial to humanity. A variety of ethical, legal and safety-based frameworks have been proposed as a basis for designing these constraints. Despite their variations, these frameworks share the common characteristic that decision-making must consider multiple potentially conflicting factors. We demonstrate that these alignment frameworks can be represented as utility functions, but that the widely used Maximum Expected Utility (MEU) paradigm provides insufficient support for such multiobjective decision-making. We show that a Multiobjective Maximum Expected Utility paradigm based on the combination of vector utilities and non-linear action–selection can overcome many of the issues which limit MEU’s effectiveness in implementing aligned AI. We examine existing approaches to multiobjective AI, and identify how these can contribute to the development of human-aligned intelligent agents. © 2017, Springer Science+Business Media B.V.

Softmax exploration strategies for multiobjective reinforcement learning

Authors: Vamplew, Peter , Dazeley, Richard , Foale, Cameron
Date: 2017
Type: Text , Journal article
Relation: Neurocomputing Vol. 263, no. (2017), p. 74-86
Full Text:
Reviewed:
Description: Despite growing interest over recent years in applying reinforcement learning to multiobjective problems, there has been little research into the applicability and effectiveness of exploration strategies within the multiobjective context. This work considers several widely-used approaches to exploration from the single-objective reinforcement learning literature, and examines their incorporation into multiobjective Q-learning. In particular this paper proposes two novel approaches which extend the softmax operator to work with vector-valued rewards. The performance of these exploration strategies is evaluated across a set of benchmark environments. Issues arising from the multiobjective formulation of these benchmarks which impact on the performance of the exploration strategies are identified. It is shown that of the techniques considered, the combination of the novel softmax–epsilon exploration with optimistic initialisation provides the most effective trade-off between exploration and exploitation.

Reinforcement learning of Pareto-optimal multiobjective policies using steering

Authors: Vamplew, Peter , Issabekov, Rustam , Dazeley, Richard , Foale, Cameron
Date: 2015
Type: Text , Conference paper
Relation: 28th Australasian Joint Conference on Artificial Intelligence, AI 2015; Canberra, ACT; 30th November-4th December 2015 Vol. 9457, p. 596-608
Full Text: false
Reviewed:
Description: There has been little research into multiobjective reinforcement learning (MORL) algorithms using stochastic or non-stationary policies, even though such policies may Pareto-dominate deterministic stationary policies. One approach is steering which forms a nonstationary combination of deterministic stationary base policies. This paper presents two new steering algorithms designed for the task of learning Pareto-optimal policies. The first algorithm (w-steering) is a direct adaptation of previous approaches to steering, and therefore requires prior knowledge of recurrent states which are guaranteed to be revisited. The second algorithm (Q-steering) eliminates this requirement. Empirical results show that both algorithms perform well when given knowledge of recurrent states, but that Q-steering provides substantial performance improvements over w-steering when this knowledge is not available. © Springer International Publishing Switzerland 2015.

Levels of explainable artificial intelligence for human-aligned conversational explanations

Authors: Dazeley, Richard , Vamplew, Peter , Foale, Cameron , Young, Cameron , Aryal, Sunil , Cruz, Francisco
Date: 2021
Type: Text , Journal article
Relation: Artificial Intelligence Vol. 299, no. (2021), p.
Full Text:
Reviewed:
Description: Over the last few years there has been rapid research growth into eXplainable Artificial Intelligence (XAI) and the closely aligned Interpretable Machine Learning (IML). Drivers for this growth include recent legislative changes and increased investments by industry and governments, along with increased concern from the general public. People are affected by autonomous decisions every day and the public need to understand the decision-making process to accept the outcomes. However, the vast majority of the applications of XAI/IML are focused on providing low-level ‘narrow’ explanations of how an individual decision was reached based on a particular datum. While important, these explanations rarely provide insights into an agent's: beliefs and motivations; hypotheses of other (human, animal or AI) agents' intentions; interpretation of external cultural expectations; or, processes used to generate its own explanation. Yet all of these factors, we propose, are essential to providing the explanatory depth that people require to accept and trust the AI's decision-making. This paper aims to define levels of explanation and describe how they can be integrated to create a human-aligned conversational explanation system. In so doing, this paper will survey current approaches and discuss the integration of different technologies to achieve these levels with Broad eXplainable Artificial Intelligence (Broad-XAI), and thereby move towards high-level ‘strong’ explanations. © 2021 Elsevier B.V.

An evaluation methodology for interactive reinforcement learning with simulated users

Authors: Bignold, Adam , Cruz, Francisco , Dazeley, Richard , Vamplew, Peter , Foale, Cameron
Date: 2021
Type: Text , Journal article
Relation: Biomimetics Vol. 6, no. 1 (2021), p. 1-15
Full Text:
Reviewed:
Description: Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.

Rapid anomaly detection using integrated prudence analysis (IPA)

Authors: Maruatona, Omaru , Vamplew, Peter , Dazeley, Richard , Watters, Paul
Date: 2018
Type: Text , Conference proceedings
Relation: PAKDD 2018. Trends and Applications in Knowledge Discovery and Data Mining. p. 137-141
Full Text: false
Reviewed:
Description: Integrated Prudence Analysis has been proposed as a method to maximize the accuracy of rule based systems. The paper presents evaluation results of the three Prudence methods on public datasets which demonstrate that combining attribute-based and structural Prudence produces a net improvement in Prudence Accuracy.

The impact of environmental stochasticity on value-based multiobjective reinforcement learning

Authors: Vamplew, Peter , Foale, Cameron , Dazeley, Richard
Date: 2022
Type: Text , Journal article
Relation: Neural Computing and Applications Vol. 34, no. 3 (2022), p. 1783-1799
Full Text:
Reviewed:
Description: A common approach to address multiobjective problems using reinforcement learning methods is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism that is often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multi-objective Q-learning. It shows that the nature of the optimal solution depends on these environmental characteristics, and also on whether we desire to maximise the Expected Scalarised Return (ESR) or the Scalarised Expected Return (SER). We also identify a novel aim which may arise in some applications of maximising SER subject to satisfying constraints on the variation in return and show that this may require different solutions than ESR or conventional SER. The analysis of the interaction between environmental stochasticity and multi-objective Q-learning is supported by empirical evaluations on several simple multiobjective Markov Decision Processes with varying characteristics. This includes a demonstration of a novel approach to learning deterministic SER-optimal policies for environments with stochastic rewards. In addition, we report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. Having highlighted the limitations of value-based model-free MORL methods, we discuss several alternative methods that may be more suitable for maximising SER in MOMDPs with stochastic transitions. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.

Language representations for generalization in reinforcement learning

Authors: Goodger, Nikolaj , Vamplew, Peter , Foale, Cameron , Dazeley, Richard
Date: 2021
Type: Text , Conference paper
Relation: 13th Asian Conference on Machine Learning, Virtual, 17-19 November 2021, Proceedings of The 13th Asian Conference on Machine Learning Vol. 157, p. 390-405
Full Text:
Reviewed:
Description: The choice of state and action representation in Reinforcement Learning (RL) has a significant effect on agent performance for the training task, but its relationship with generalization to new tasks is under-explored. One approach to improving generalization investigated here is the use of language as a representation. We compare vector states and discrete actions to language representations. We find the agents using language representations generalize better and could solve tasks with more entities, new entities, and more complexity than seen in the training task. We attribute this to the compositionality of language.

A multi-objective deep reinforcement learning framework

Authors: Nguyen, Thanh , Nguyen, Ngoc , Vamplew, Peter , Nahavandi, Saeid , Dazeley, Richard , Lim, Chee
Date: 2020
Type: Text , Journal article
Relation: Engineering Applications of Artificial Intelligence Vol. 96, no. (2020), p.
Full Text:
Reviewed:
Description: This paper introduces a new scalable multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We develop a high-performance MODRL framework that supports both single-policy and multi-policy strategies, as well as both linear and non-linear approaches to action selection. The experimental results on two benchmark problems (two-objective deep sea treasure environment and three-objective Mountain Car problem) indicate that the proposed framework is able to find the Pareto-optimal solutions effectively. The proposed framework is generic and highly modularized, which allows the integration of different deep reinforcement learning algorithms in different complex problem domains. This therefore overcomes many disadvantages involved with standard multi-objective reinforcement learning methods in the current literature. The proposed framework acts as a testbed platform that accelerates the development of MODRL for solving increasingly complicated multi-objective problems. © 2020 Elsevier Ltd

A prioritized objective actor-critic method for deep reinforcement learning

Authors: Nguyen, Ngoc , Nguyen, Thanh , Vamplew, Peter , Dazeley, Richard , Nahavandi, Saeid
Date: 2021
Type: Text , Journal article
Relation: Neural Computing and Applications Vol. 33, no. 16 (2021), p. 10335-10349
Full Text: false
Reviewed:
Description: An increasing number of complex problems have naturally posed significant challenges in decision-making theory and reinforcement learning practices. These problems often involve multiple conflicting reward signals that inherently cause agents’ poor exploration in seeking a specific goal. In extreme cases, the agent gets stuck in a sub-optimal solution and starts behaving harmfully. To overcome such obstacles, we introduce two actor-critic deep reinforcement learning methods, namely Multi-Critic Single Policy (MCSP) and Single Critic Multi-Policy (SCMP), which can adjust agent behaviors to efficiently achieve a designated goal by adopting a weighted-sum scalarization of different objective functions. In particular, MCSP creates a human-centric policy that corresponds to a predefined priority weight of different objectives. SCMP, by contrast, is capable of generating a mixed policy based on a set of priority weights, i.e., the generated policy uses the knowledge of different policies (each policy corresponds to a priority weight) to dynamically prioritize objectives in real time. We examine our methods by using the Asynchronous Advantage Actor-Critic (A3C) algorithm to utilize the multithreading mechanism for dynamically balancing training intensity of different policies into a single network. Finally, simulation results show that MCSP and SCMP significantly outperform A3C with respect to the mean of total rewards in two complex problems: Food Collector and Seaquest. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
