A multi-objective deep reinforcement learning framework
- Authors: Nguyen, Thanh , Nguyen, Ngoc , Vamplew, Peter , Nahavandi, Saeid , Dazeley, Richard , Lim, Chee
- Date: 2020
- Type: Text , Journal article
- Relation: Engineering Applications of Artificial Intelligence Vol. 96, no. (2020), p.
- Full Text:
- Reviewed:
- Description: This paper introduces a new scalable multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We develop a high-performance MODRL framework that supports both single-policy and multi-policy strategies, as well as both linear and non-linear approaches to action selection. The experimental results on two benchmark problems (the two-objective deep sea treasure environment and the three-objective Mountain Car problem) indicate that the proposed framework is able to find the Pareto-optimal solutions effectively. The proposed framework is generic and highly modularized, which allows the integration of different deep reinforcement learning algorithms in different complex problem domains. It therefore overcomes many disadvantages of the standard multi-objective reinforcement learning methods in the current literature. The proposed framework acts as a testbed platform that accelerates the development of MODRL for solving increasingly complicated multi-objective problems. © 2020 Elsevier Ltd
A multiobjective state transition algorithm for single machine scheduling
- Authors: Zhou, Xiaojun , Hanoun, Samer , Gao, David , Nahavandi, Saeid
- Date: 2015
- Type: Text , Conference paper
- Relation: 3rd World Congress on Global Optimization in Engineering and Science, WCGO 2013; Anhui, China; 8th-12th July 2013 Vol. 95, p. 79-88
- Full Text: false
- Reviewed:
- Description: In this paper, a discrete state transition algorithm is introduced to solve a multiobjective single machine job shop scheduling problem. In the proposed approach, a non-dominated sorting technique is used to select the best states from a candidate state set, and a Pareto archive strategy is adopted to keep all the non-dominated solutions. Experimental comparisons with enumeration and other heuristics demonstrate the effectiveness of the multiobjective state transition algorithm. © Springer International Publishing Switzerland 2015.
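The non-dominated selection and Pareto archive described in this abstract can be illustrated with a minimal sketch. It assumes every objective is minimized; the function names (`dominates`, `update_archive`) and the sample objective vectors are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: all objectives are minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def update_archive(archive: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """Insert candidate into the archive, keeping only non-dominated points."""
    # Reject the candidate if an archived point dominates or duplicates it.
    if any(dominates(kept, candidate) or kept == candidate for kept in archive):
        return archive
    # Otherwise drop every archived point the candidate dominates, then add it.
    return [kept for kept in archive if not dominates(candidate, kept)] + [candidate]


# Hypothetical candidate states scored on two objectives (illustrative values only).
archive: List[Objectives] = []
for point in [(5.0, 3.0), (4.0, 4.0), (6.0, 2.0), (4.0, 3.0), (7.0, 7.0)]:
    archive = update_archive(archive, point)

print(archive)
```

In this sketch the archive is a plain list, so each update costs time linear in the archive size; that is usually acceptable for small candidate sets.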
A prioritized objective actor-critic method for deep reinforcement learning
- Authors: Nguyen, Ngoc , Nguyen, Thanh , Vamplew, Peter , Dazeley, Richard , Nahavandi, Saeid
- Date: 2021
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 33, no. 16 (2021), p. 10335-10349
- Full Text: false
- Reviewed:
- Description: An increasing number of complex problems have posed significant challenges in decision-making theory and reinforcement learning practice. These problems often involve multiple conflicting reward signals that inherently cause poor exploration by agents seeking a specific goal. In extreme cases, the agent gets stuck in a sub-optimal solution and starts behaving harmfully. To overcome such obstacles, we introduce two actor-critic deep reinforcement learning methods, namely Multi-Critic Single Policy (MCSP) and Single Critic Multi-Policy (SCMP), which can adjust agent behaviors to efficiently achieve a designated goal by adopting a weighted-sum scalarization of different objective functions. In particular, MCSP creates a human-centric policy that corresponds to a predefined priority weight of different objectives. In contrast, SCMP is capable of generating a mixed policy based on a set of priority weights, i.e., the generated policy uses the knowledge of different policies (each policy corresponds to a priority weight) to dynamically prioritize objectives in real time. We examine our methods by using the Asynchronous Advantage Actor-Critic (A3C) algorithm, utilizing its multithreading mechanism to dynamically balance the training intensity of different policies in a single network. Finally, simulation results show that MCSP and SCMP significantly outperform A3C with respect to the mean of total rewards in two complex problems: Food Collector and Seaquest. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd. part of Springer Nature.
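The weighted-sum scalarization mentioned in this abstract can be sketched as follows. This is only an illustration of the general technique, not the MCSP or SCMP implementation: the helper names (`scalarize`, `preferred_action`), the per-objective value estimates, and the priority weights are all assumed for the example.

```python
from typing import List, Sequence


def scalarize(values: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a vector of per-objective values into one scalar via a weighted sum."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * v for w, v in zip(weights, values))


def preferred_action(estimates: List[Sequence[float]], weights: Sequence[float]) -> int:
    """Return the index of the action whose scalarized estimate is largest."""
    scores = [scalarize(per_objective, weights) for per_objective in estimates]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical per-objective value estimates for three actions and two objectives.
estimates = [
    [1.0, 0.0],  # action 0 favours objective A only
    [0.0, 1.0],  # action 1 favours objective B only
    [0.6, 0.6],  # action 2 is a compromise
]

# Different priority weights lead to different preferred actions.
choice_a = preferred_action(estimates, [0.9, 0.1])
choice_b = preferred_action(estimates, [0.1, 0.9])
choice_balanced = preferred_action(estimates, [0.5, 0.5])
print(choice_a, choice_b, choice_balanced)
```

A single fixed weight vector corresponds to one predefined priority over the objectives, while evaluating several weight vectors shows how a set of priority weights yields different behaviours.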
Dynamical analysis of neural networks with time-varying delays using the LMI approach
- Authors: Lakshmanan, Shanmugam , Lim, Cheepeng , Bhatti, Asim , Gao, David , Nahavandi, Saeid
- Date: 2015
- Type: Text , Conference paper
- Relation: 22nd International Conference on Neural Information Processing, ICONIP 2015; Istanbul, Turkey; 9th-12th November 2015 Vol. 9491, p. 297-305
- Full Text: false
- Reviewed:
- Description: This study is concerned with delay-range-dependent stability analysis for neural networks with time-varying delay and Markovian jumping parameters. The time-varying delay is assumed to lie in an interval with lower and upper bounds. Markovian jumping parameters, modeled by a continuous-time, finite-state Markov chain, are introduced into the delayed neural networks. A sufficient condition that guarantees global asymptotic stability in the mean square is derived in terms of linear matrix inequalities, based on appropriate Lyapunov-Krasovskii functionals and stochastic stability theory. Finally, a numerical example is provided to validate the effectiveness of the proposed conditions. © Springer International Publishing Switzerland 2015.
Intuitive haptics interface with accurate force estimation and reflection at nanoscale
- Authors: Bhatti, Asim , Khan, Burhan , Nahavandi, Saeid , Hanoun, Samer , Gao, David
- Date: 2015
- Type: Text , Conference paper
- Relation: 3rd World Congress on Global Optimization in Engineering and Science, WCGO 2013; Anhui, China; 8th-12th July 2013 Vol. 95, p. 507-514
- Full Text: false
- Reviewed:
- Description: Technologies such as Atomic Force Microscopy (AFM) have proven to be among the most versatile research tools in the field of nanotechnology, providing physical access to materials at the nanoscale. The working principle of AFM involves physical interaction with the sample at the nanometre scale to estimate the topography of the sample surface. The size of the cantilever tip, within the range of a few nanometres in diameter, and the inherent elasticity of the cantilever allow it to bend in response to changes in the sample surface, leading to accurate estimation of the sample topography. Despite the capabilities of AFM, there is a lack of intuitive user interfaces that allow interaction with materials at the nanoscale in a way analogous to what we are accustomed to at the macro level. To bridge this gap, a haptics interface is designed in conjunction with the Bruker Nanos AFM. Interaction with materials at the nanoscale is characterised by estimating the forces experienced by the cantilever tip using geometric deformation principles. The estimated forces are reflected to the user, in a controlled manner, through the haptics interface. The established mathematical framework for force estimation can be adopted for AFM operations in air as well as in liquid media. © Springer International Publishing Switzerland 2015.
Patient admission prediction using a pruned fuzzy min-max neural network with rule extraction
- Authors: Wang, Jin , Lim, Cheepeng , Creighton, Douglas , Khosravi, Abbas , Nahavandi, Saeid , Ugon, Julien , Vamplew, Peter , Stranieri, Andrew , Martin, Laura , Freischmidt, Anton
- Date: 2015
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 26, no. 2 (2015), p. 277-289
- Full Text: false
- Reviewed:
- Description: A patient admission prediction model that helps the emergency department of a hospital admit patients efficiently is of great importance. It not only improves the quality of care provided by the emergency department but also reduces patient waiting time. This paper proposes an automatic prediction method for patient admission based on a fuzzy min–max neural network (FMM) with rule extraction. The FMM neural network forms a set of hyperboxes by learning from data samples, and the learned knowledge is used for prediction. In addition to providing predictions, decision rules are extracted from the FMM hyperboxes to provide an explanation for each prediction. In order to simplify the structure of FMM and the decision rules, an optimization method that simultaneously maximizes prediction accuracy and minimizes the number of FMM hyperboxes is proposed. Specifically, a genetic algorithm is formulated to find the optimal configuration of the decision rules. The experimental results using a large data set consisting of 450740 real patient records reveal that the proposed method achieves comparable or even better prediction accuracy than state-of-the-art classifiers, with the additional ability to extract a set of explanatory rules to justify its predictions.
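How a set of hyperboxes can drive a prediction may be sketched with the membership function from the classical fuzzy min-max formulation. This is an assumption for illustration: the abstract does not state which membership function, sensitivity parameter, or pruning scheme the paper uses, and the hyperboxes, class labels, and `gamma` value below are hypothetical. Inputs are assumed to be scaled to the unit interval.

```python
from typing import List, Sequence, Tuple

# A hyperbox is (min corner v, max corner w, class label).
Hyperbox = Tuple[Sequence[float], Sequence[float], str]


def hyperbox_membership(x: Sequence[float], v: Sequence[float],
                        w: Sequence[float], gamma: float = 4.0) -> float:
    """Degree to which x belongs to the hyperbox [v, w] (classical FMM form).

    Returns 1.0 when x lies inside the box and decays as x moves away;
    gamma (assumed value) controls how fast the membership falls off.
    """
    total = 0.0
    for xi, vi, wi in zip(x, v, w):
        # Penalty for exceeding the max corner in this dimension.
        above = max(0.0, 1.0 - max(0.0, gamma * min(1.0, xi - wi)))
        # Penalty for falling below the min corner in this dimension.
        below = max(0.0, 1.0 - max(0.0, gamma * min(1.0, vi - xi)))
        total += above + below
    return total / (2 * len(x))


def predict(x: Sequence[float], boxes: List[Hyperbox]) -> str:
    """Return the label of the hyperbox with the highest membership for x."""
    best = max(boxes, key=lambda box: hyperbox_membership(x, box[0], box[1]))
    return best[2]


# Hypothetical two-feature hyperboxes; labels are illustrative only.
boxes: List[Hyperbox] = [
    ((0.1, 0.1), (0.4, 0.4), "discharge"),
    ((0.5, 0.5), (0.9, 0.9), "admit"),
]

inside = hyperbox_membership((0.7, 0.6), (0.5, 0.5), (0.9, 0.9))
outside = hyperbox_membership((0.6, 0.5), (0.2, 0.2), (0.5, 0.6))
label = predict((0.7, 0.6), boxes)
print(inside, outside, label)
```

Because each hyperbox is an axis-aligned interval per feature, its bounds can be read directly as an if-then rule, which is one reason hyperbox models lend themselves to rule extraction.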
Video driven traffic modelling
- Authors: Zhou, Hailing , Creighton, Douglas , Wei, Lei , Gao, David , Nahavandi, Saeid
- Date: 2013
- Type: Text , Conference paper
- Relation: 2013 IEEE/ASME International Conference on Advanced Intelligent Mechatronics: Mechatronics for Human Wellbeing, AIM 2013 p. 506-511
- Full Text: false
- Reviewed:
- Description: We propose Video Driven Traffic Modelling (VDTM) for accurate simulation of real-world traffic behaviours with detailed information and low-cost model development and maintenance. Computer vision techniques are employed to estimate traffic parameters. These parameters are used to build and update a traffic system model. The model is simulated using the Paramics traffic simulation platform. Using these simulations, the effects of traffic interventions can be evaluated so that traffic management authorities can make better decisions. In this paper, traffic parameters such as vehicle types, trip start times and the corresponding origin-destinations are extracted from a video. A road network is manually defined according to the traffic composition in the video, and individual vehicles associated with the extracted properties are modelled and simulated within the defined road network using Paramics. VDTM has widespread potential applications in supporting traffic decision-making. To demonstrate its effectiveness, we apply it to optimizing a traffic signal control system, which adaptively adjusts the green times of signals at an intersection to reduce traffic congestion.
- Description: E1