Applying reinforcement learning in playing Robosoccer using the AIBO
- Authors: Mukherjee, Subhasis
- Date: 2010
- Type: Text , Thesis , Masters
- Full Text:
- Description: "Robosoccer is a popular test bed for AI programs around the world in which AIBO entertainments robots take part in the middle sized soccer event. These robots need a variety of skills to perform in a semi-real environment like this. The three key challenges are manoeuvrability, image recognition and decision making skills. This research is focussed on the decision making skills ... The work focuses on whether reinforcement learning as a form of semi supervised learning can effectively contribute to the goal keeper's decision making when a shot is taken."
- Description: Master of Computing (by research)
A comparison of machine learning algorithms for multilabel classification of CAN
- Kelarev, Andrei, Stranieri, Andrew, Yearwood, John, Jelinek, Herbert
- Authors: Kelarev, Andrei , Stranieri, Andrew , Yearwood, John , Jelinek, Herbert
- Date: 2012
- Type: Text , Journal article
- Relation: Advances in Computer Science and Engineering Vol. 9, no. 1 (2012), p. 1-4
- Full Text:
- Reviewed:
- Description: This article is devoted to the investigation and comparison of several important machine learning algorithms in their ability to obtain multilabel classifications of the stages of cardiac autonomic neuropathy (CAN). Data was collected by the Diabetes Complications Screening Research Initiative at Charles Sturt University. Our experiments have achieved better results than those published previously in the literature for similar CAN identification tasks.
Multi-source cyber-attacks detection using machine learning
- Taheri, Sona, Gondal, Iqbal, Bagirov, Adil, Harkness, Greg, Brown, Simon, Chi, Chihung
- Authors: Taheri, Sona , Gondal, Iqbal , Bagirov, Adil , Harkness, Greg , Brown, Simon , Chi, Chihung
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne, Australia; 13th-15th February 2019 Vol. 2019-February, p. 1167-1172
- Full Text:
- Reviewed:
- Description: The Internet of Things (IoT) has significantly increased the number of devices connected to the Internet, ranging from sensors to multi-source data information. As the IoT continues to evolve with new technologies, the number of threats and attacks against IoT devices is increasing. Analyzing and detecting these attacks, which originate from different sources, requires machine learning models. These models provide proactive solutions for detecting attacks and their sources. In this paper, we propose to apply a supervised machine learning classification technique to identify cyber-attacks from each source. More precisely, we apply the incremental piecewise linear classifier, which constructs a boundary between sources/classes incrementally, starting with one hyperplane and adding more hyperplanes at each iteration. The algorithm terminates when no further significant improvement of the separation of sources/classes is possible. The construction and usage of piecewise linear boundaries allows us to avoid any possible overfitting. We apply the incremental piecewise linear classifier to a multi-source real-world cyber security data set to identify cyber-attacks and their sources.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
Function similarity using family context
- Black, Paul, Gondal, Iqbal, Vamplew, Peter, Lakhotia, Arun
- Authors: Black, Paul , Gondal, Iqbal , Vamplew, Peter , Lakhotia, Arun
- Date: 2020
- Type: Text , Journal article
- Relation: Electronics Vol. 9, no. 7 (Jul 2020), p. 20
- Full Text:
- Reviewed:
- Description: Finding changed and similar functions between a pair of binaries is an important problem in malware attribution and for the identification of new malware capabilities. This paper presents a new technique called Function Similarity using Family Context (FSFC) for this problem. FSFC trains a Support Vector Machine (SVM) model using pairs of similar functions from two program variants. This method improves upon previous research called Cross Version Contextual Function Similarity (CVCFS) by representing a function using features extracted not just from the function itself, but also from other functions with which it has a caller and callee relationship. We present the results of an initial experiment that shows that the use of additional features from the context of a function significantly decreases the false positive rate, obviating the need for a separate pass for cleaning false positives. The more surprising and unexpected finding is that the SVM model produced by FSFC can abstract function similarity features from one pair of program variants to find similar functions in an unrelated pair of program variants. If validated by a larger study, this new property leads to the possibility of creating generic similar function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra.
- Description: This research was performed in the Internet Commerce Security Lab (ICSL), which is a joint venture with research partners Westpac, IBM, and Federation University Australia.
An application of high-dimensional statistics to predictive modeling of grade variability
- Hinz, Juri, Grigoryev, Igor, Novikov, Alexander
- Authors: Hinz, Juri , Grigoryev, Igor , Novikov, Alexander
- Date: 2020
- Type: Text , Journal article
- Relation: Geosciences (Switzerland) Vol. 10, no. 4 (2020), p.
- Full Text:
- Reviewed:
- Description: The economic viability of a mining project depends on its efficient exploration, which requires a prediction of worthwhile ore in a mine deposit. In this work, we apply the so-called LASSO methodology to estimate mineral concentration within unexplored areas. Our methodology outperforms traditional techniques not only in terms of logical consistency, but potentially also in cost reduction. Our approach is illustrated by a full source code listing and a detailed discussion of its advantages and limitations. © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
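As a rough illustration of the LASSO idea named in the description above, the following sketch fits an L1-penalised linear model by coordinate descent. It is not the source code listing published with the article; the function names, the toy design matrix and the penalty value `lam` are assumptions made for this example only.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 (no intercept term)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Hypothetical toy data: three candidate predictors, only the first one matters.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2, 2, -2, -2]
weights = lasso_coordinate_descent(X, y, lam=0.1)
print(weights)
```

Raising `lam` pushes more coefficients to exactly zero, which is the property that makes LASSO attractive when there are many candidate predictors and few samples.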
Hybrid intrusion detection system based on the stacking ensemble of C5 decision tree classifier and one class support vector machine
- Khraisat, Ansam, Gondal, Iqbal, Vamplew, Peter, Kamruzzaman, Joarder, Alazab, Ammar
- Authors: Khraisat, Ansam , Gondal, Iqbal , Vamplew, Peter , Kamruzzaman, Joarder , Alazab, Ammar
- Date: 2020
- Type: Text , Journal article
- Relation: Electronics (Switzerland) Vol. 9, no. 1 (2020), p.
- Full Text:
- Reviewed:
- Description: Cyberattacks are becoming increasingly sophisticated, necessitating efficient intrusion detection mechanisms to monitor computer resources and generate reports on anomalous or suspicious activities. Many Intrusion Detection Systems (IDSs) use a single classifier for identifying intrusions. Single classifier IDSs are unable to achieve high accuracy and low false alarm rates due to polymorphic, metamorphic, and zero-day behaviors of malware. In this paper, a Hybrid IDS (HIDS) is proposed by combining the C5 decision tree classifier and One Class Support Vector Machine (OC-SVM). HIDS combines the strengths of the Signature-based Intrusion Detection System (SIDS) and the Anomaly-based Intrusion Detection System (AIDS). The SIDS was developed based on the C5.0 Decision tree classifier and AIDS was developed based on the one-class Support Vector Machine (SVM). This framework aims to identify both the well-known intrusions and zero-day attacks with high detection accuracy and low false-alarm rates. The proposed HIDS is evaluated using the benchmark datasets, namely, Network Security Laboratory-Knowledge Discovery in Databases (NSL-KDD) and Australian Defence Force Academy (ADFA) datasets. Studies show that the performance of HIDS is enhanced, compared to SIDS and AIDS in terms of detection rate and low false-alarm rates. © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
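The division of labour between a signature stage and an anomaly stage can be sketched as a simple cascade. This is only a simplified stand-in: the article uses a C5.0 decision tree, a one-class SVM and a stacking ensemble, whereas the sketch below substitutes a signature lookup and a z-score test. The signature names, the `bytes` feature and the threshold are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical signature set standing in for a trained signature-based classifier.
KNOWN_SIGNATURES = {"sql_injection", "port_scan"}


def signature_stage(event):
    """Stage 1 (SIDS stand-in): flag events that match a known attack pattern."""
    return event["pattern"] in KNOWN_SIGNATURES


def make_anomaly_stage(normal_values, threshold=3.0):
    """Stage 2 (AIDS stand-in): learn a profile from normal traffic only."""
    mu = mean(normal_values)
    sigma = pstdev(normal_values)

    def is_anomalous(event):
        # Flag events more than `threshold` standard deviations from normal.
        return abs(event["bytes"] - mu) > threshold * sigma

    return is_anomalous


def classify(event, is_anomalous):
    """Known attacks are caught first; the rest are screened for anomalies."""
    if signature_stage(event):
        return "known attack"
    if is_anomalous(event):
        return "anomaly"
    return "normal"


detector = make_anomaly_stage([100, 110, 90, 105, 95])
for event in (
    {"pattern": "port_scan", "bytes": 100},
    {"pattern": "unknown", "bytes": 5000},
    {"pattern": "unknown", "bytes": 102},
):
    print(event, "->", classify(event, detector))
```

The second stage is trained on normal traffic alone, which is what lets an anomaly detector flag attacks that have no signature yet.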
Survey of intrusion detection systems : techniques, datasets and challenges
- Khraisat, Ansam, Gondal, Iqbal, Vamplew, Peter, Kamruzzaman, Joarder
- Authors: Khraisat, Ansam , Gondal, Iqbal , Vamplew, Peter , Kamruzzaman, Joarder
- Date: 2019
- Type: Text , Journal article
- Relation: Cybersecurity Vol. 2 , no. 1 (2019), p. 1-22
- Full Text:
- Reviewed:
Rapid health data repository allocation using predictive machine learning
- Uddin, Ashraf, Stranieri, Andrew, Gondal, Iqbal, Balasubramanian, Venki
- Authors: Uddin, Ashraf , Stranieri, Andrew , Gondal, Iqbal , Balasubramanian, Venki
- Date: 2020
- Type: Text , Journal article
- Relation: Health Informatics Journal Vol. 26, no. 4 (2020), p. 3009-3036
- Full Text:
- Reviewed:
- Description: Health-related data is stored in a number of repositories that are managed and controlled by different entities. For instance, Electronic Health Records are usually administered by governments. Electronic Medical Records are typically controlled by health care providers, whereas Personal Health Records are managed directly by patients. Recently, Blockchain-based health record systems largely regulated by technology have emerged as another type of repository. Repositories for storing health data differ from one another based on cost, level of security and quality of performance. Not only has the type of repositories increased in recent years, but the quantum of health data to be stored has increased. For instance, the advent of wearable sensors that capture physiological signs has resulted in an exponential growth in digital health data. The increase in the types of repository and amount of data has driven a need for intelligent processes to select appropriate repositories as data is collected. However, the storage allocation decision is complex and nuanced. The challenges are exacerbated when health data are continuously streamed, as is the case with wearable sensors. Although patients are not always solely responsible for determining which repository should be used, they typically have some input into this decision. Patients can be expected to have idiosyncratic preferences regarding storage decisions depending on their unique contexts. In this paper, we propose a predictive model for the storage of health data that can meet patient needs and make storage decisions rapidly, in real-time, even with data streaming from wearable sensors. The model is built with a machine learning classifier that learns the mapping between characteristics of health data and features of storage repositories from a training set generated synthetically from correlations evident from small samples of experts. 
Results from the evaluation demonstrate the viability of the machine learning technique used. © The Author(s) 2020.
Biopsychosocial Data Analytics and Modeling
- Authors: Santhanagopalan, Meena
- Date: 2021
- Type: Text , Thesis , PhD
- Full Text:
- Description: Sustained customisation of digital health intervention (DHI) programs, in the context of community health engagement, requires strong integration of multi-sourced interdisciplinary biopsychosocial health data. The biopsychosocial model is built upon the idea that biological, psychological and social processes are integrally and interactively involved in physical health and illness. One of the longstanding challenges of dealing with healthcare data is the wide variety of data generated from different sources and the increasing need to learn actionable insights that drive performance improvement. The growth of information and communication technology has led to the increased use of DHI programs. These programs use an observational methodology that helps researchers to study the everyday behaviour of participants during the course of the program by analysing data generated from digital tools such as wearables, online surveys and ecological momentary assessment (EMA). Combined with data reported from biological and psychological tests, this provides rich and unique biopsychosocial data. There is a strong need to review and apply novel approaches to combining biopsychosocial data from a methodological perspective. Although some studies have used data analytics in research on clinical trial data generated from digital interventions, data analytics on biopsychosocial data generated from DHI programs is limited. The study in this thesis develops and implements innovative approaches for analysing the existing unique and rich biopsychosocial data generated from the wellness study, a DHI program conducted by the School of Science, Psychology and Sport at Federation University. The characteristics of variety, value and veracity that usually describe big data are also relevant to the biopsychosocial data handled in this thesis. 
These historical, retrospective real-life biopsychosocial data provide fertile ground for research through the use of data analytics to discover patterns hidden in the data and to obtain new knowledge. This thesis presents the studies carried out on three aspects of biopsychosocial research. First, we present the salient traits of the three components - biological, psychological and social - of biopsychosocial research. Next, we investigate the challenges of pre-processing biopsychosocial data, placing special emphasis on the time-series data generated from wearable sensor devices. Finally, we present the application of statistical and machine learning (ML) tools to integrate variables from the biopsychosocial disciplines to build a predictive model. The first chapter presents the salient features of the biopsychosocial data for each discipline. The second chapter presents the challenges of pre-processing biopsychosocial data, focusing on the time-series data generated from wearable sensor devices. The third chapter uses statistical and ML tools to integrate variables from the biopsychosocial disciplines to build a predictive model. Among its other important analyses and results, the key contributions of the research described in this thesis include the following: 1. using gamma distribution to model neurocognitive reaction time data that presents interesting properties (skewness and kurtosis for the data distribution) 2. using novel ‘peak heart-rate’ count metric to quantify ‘biological’ stress 3. using the ML approach to evaluate DHIs 4. using a recurrent neural network (RNN) and long short-term memory (LSTM) data prediction model to predict Difficulties in Emotion Regulation Scale (DERS) and primary emotion (PE) using wearable sensor data.
- Description: Doctor of Philosophy
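The first listed contribution, modelling reaction times with a gamma distribution, can be illustrated with a method-of-moments fit. The estimation method actually used in the thesis is not stated in the description, so the approach here is an assumption, and the reaction times are made-up values. For a gamma distribution with shape k and scale θ, the mean is kθ, the variance is kθ², the skewness is 2/√k and the excess kurtosis is 6/k.

```python
from math import sqrt
from statistics import mean, pvariance


def fit_gamma_moments(samples):
    """Method-of-moments estimates: shape = mean^2 / var, scale = var / mean."""
    m = mean(samples)
    v = pvariance(samples)
    return m * m / v, v / m


def gamma_skewness(shape):
    """Skewness of a gamma distribution depends only on its shape."""
    return 2 / sqrt(shape)


def gamma_excess_kurtosis(shape):
    """Excess kurtosis of a gamma distribution depends only on its shape."""
    return 6 / shape


# Hypothetical reaction times in milliseconds (right-skewed, strictly positive).
reaction_times_ms = [300, 320, 350, 410, 520]
shape, scale = fit_gamma_moments(reaction_times_ms)
print("shape:", shape, "scale:", scale)
print("skewness:", gamma_skewness(shape))
print("excess kurtosis:", gamma_excess_kurtosis(shape))
```

Because the distribution is defined only for positive values and is right-skewed, it is a natural candidate for reaction-time data, where a few slow responses stretch the upper tail.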
A critical review of intrusion detection systems in the internet of things : techniques, deployment strategy, validation strategy, attacks, public datasets and challenges
- Khraisat, Ansam, Alazab, Ammar
- Authors: Khraisat, Ansam , Alazab, Ammar
- Date: 2021
- Type: Text , Journal article
- Relation: Cybersecurity Vol. 4, no. 1 (2021), p.
- Full Text:
- Reviewed:
- Description: The Internet of Things (IoT) has been rapidly evolving, making a growing impact on everything from everyday life to large industrial systems. Unfortunately, this has attracted the attention of cybercriminals who have made the IoT a target of malicious activities, opening the door to possible attacks on the end nodes. To this end, numerous IoT intrusion detection systems (IDS) have been proposed in the literature to tackle attacks on the IoT ecosystem, which can be broadly classified based on detection technique, validation strategy, and deployment strategy. This survey paper presents a comprehensive review of contemporary IoT IDS and an overview of the techniques, deployment strategies, validation strategies and datasets that are commonly applied for building IDS. We also review how existing IoT IDS detect intrusive attacks and secure communications on the IoT. The paper also presents a classification of IoT attacks and discusses future research challenges in countering such attacks to make the IoT more secure. These contributions help IoT security researchers by uniting, contrasting, and compiling scattered research efforts. Consequently, we provide a unique IoT IDS taxonomy, which sheds light on IoT IDS techniques, their advantages and disadvantages, IoT attacks that exploit IoT communication systems, and corresponding advanced IDS and detection capabilities to detect IoT attacks. © 2021, The Author(s).
The spectrum of big data analytics
- Authors: Sun, Zhaohao , Huo, Yanxia
- Date: 2021
- Type: Text , Journal article
- Relation: Journal of Computer Information Systems Vol. 61, no. 2 (2021), p. 154-162
- Full Text:
- Reviewed:
- Description: Big data analytics is playing a pivotal role in big data, artificial intelligence, management, governance, and society with the dramatic development of big data, analytics, and artificial intelligence. However, what the spectrum of big data analytics is and how to develop it remain fundamental issues in the academic community. This article addresses these issues by presenting a big data derived small data approach. It then uses the proposed approach to analyze the top 150 Google Scholar profiles that include big data analytics as a research field, and proposes a spectrum of big data analytics. The spectrum of big data analytics mainly includes data mining, machine learning, data science and systems, artificial intelligence, distributed computing and systems, and cloud computing, taking into account degree of importance. The proposed approach and findings will generalize to other researchers and practitioners of big data analytics, machine learning, artificial intelligence, and data science. © 2019 International Association for Computer Information Systems.
A novel OFDM format and a machine learning based dimming control for lifi
- Nowrin, Itisha, Mondal, M., Islam, Rashed, Kamruzzaman, Joarder
- Authors: Nowrin, Itisha , Mondal, M. , Islam, Rashed , Kamruzzaman, Joarder
- Date: 2021
- Type: Text , Journal article
- Relation: Electronics (Switzerland) Vol. 10, no. 17 (2021), p.
- Full Text:
- Reviewed:
- Description: This paper proposes a new hybrid orthogonal frequency division multiplexing (OFDM) form termed as DC‐biased pulse amplitude modulated optical OFDM (DPO‐OFDM) by combining the ideas of the existing DC‐biased optical OFDM (DCO‐OFDM) and pulse amplitude modulated discrete multitone (PAM‐DMT). The analysis indicates that the required DC‐bias for DPO‐OFDM-based light fidelity (LiFi) depends on the dimming level and the components of the DPO‐OFDM. The bit error rate (BER) performance and dimming flexibility of the DPO‐OFDM and existing OFDM schemes are evaluated using MATLAB tools. The results show that the proposed DPO‐OFDM is power efficient and has a wide dimming range. Furthermore, a switching algorithm is introduced for LiFi, where the individual components of the hybrid OFDM are switched according to a target dimming level. Next, machine learning algorithms are used for the first time to find the appropriate proportions of the hybrid OFDM components. It is shown that polynomial regression of degree 4 can reliably predict the constellation size of the DCO‐OFDM component of DPO‐OFDM for a given constellation size of PAM‐DMT. With the component switching and the machine learning algorithms, DPO‐OFDM‐based LiFi is power efficient at a wide dimming range. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
Imbalanced data classification and its application in cyber security
- Authors: Moniruzzaman, Md
- Date: 2020
- Type: Text , Thesis , PhD
- Full Text:
- Description: Cyber security, also known as information technology security or simply as information security, aims to protect government organizations, companies and individuals by defending their computers, servers, electronic systems, networks, and data from malicious attacks. With the advancement of client-side on-the-fly web content generation techniques, it becomes easier for attackers to modify the content of a website dynamically and gain access to valuable information. The impact of cybercrime on the global economy is now greater than ever, and it is growing day by day. Among various types of cybercrimes, financial attacks are widespread and the financial sector is among the most targeted. Both corporations and individuals are losing a huge amount of money each year. The majority of financial attacks are carried out by banking malware and web-based attacks. End users are not always skilled enough to differentiate between injected content and the actual contents of a webpage. Designing a real-time security system for ensuring a safe browsing experience is a challenging task. Some of the existing solutions are designed for the client side and all users have to install them on their systems, which is very difficult to implement. In addition, various platforms and tools are used by organizations and individuals; therefore, different solutions need to be designed. The existing server-side solutions often focus on sanitizing and filtering the inputs. They fail to detect obfuscated and hidden scripts. This is a real-time security system and any significant delay will hamper user experience. Therefore, finding the most optimized and efficient solution is very important. Ensuring easy installation and integration of any solution with the existing system is also a critical factor to consider. If the solution is efficient but difficult to integrate, then it may not be a feasible solution for practical use.
Unsupervised and supervised data classification techniques have been widely applied to design algorithms for solving cyber security problems. The performance of these algorithms varies depending on the types of cyber security problems and the size of datasets. To date, existing algorithms do not achieve high accuracy in detecting malware activities. Datasets in cyber security, especially those from financial sectors, are predominantly imbalanced datasets as the number of malware activities is significantly less than the number of normal activities. This means that classifiers for imbalanced datasets can be used to develop supervised data classification algorithms to detect malware activities. Development of classifiers for imbalanced datasets has been a subject of research over the last decade. Most of these classifiers are based on oversampling and undersampling techniques and are not efficient in many situations as such techniques are applied globally. In this thesis, we develop two new algorithms for solving supervised data classification problems in imbalanced datasets and then apply them to solve malware detection problems. The first algorithm is designed using piecewise linear classifiers by formulating this problem as an optimization problem and by applying the penalty function method. More specifically, we add more penalty to the objective function for misclassified points from minority classes. The second method is based on the combination of supervised and unsupervised (clustering) algorithms. Such an approach allows one to identify areas in the input space where minority classes are located and to apply local oversampling or undersampling. This approach leads to the design of more efficient and accurate classifiers. The proposed algorithms are tested using real-world datasets. Results clearly demonstrate the superiority of the newly introduced algorithms. Then we apply these algorithms to design classifiers to detect malware.
- Description: Doctor of Philosophy
Stability prediction of Himalayan residual soil slope using artificial neural network
- Ray, Arunava, Kumar, Vikash, Kumar, Amit, Rai, Rajesh, Khandelwal, Manoj, Singh, T.
- Authors: Ray, Arunava , Kumar, Vikash , Kumar, Amit , Rai, Rajesh , Khandelwal, Manoj , Singh, T.
- Date: 2020
- Type: Text , Journal article
- Relation: Natural Hazards Vol. 103, no. 3 (2020), p. 3523-3540
- Full Text:
- Reviewed:
- Description: In the past decade, advances in machine learning (ML) techniques have resulted in developing sophisticated models that are capable of modelling extremely complex multi-factorial problems like slope stability analysis. The literature review indicates that considerable works have been done in slope stability using ML, but none of them covers the analysis of residual soil slope. The present study aims to develop an artificial neural network (ANN) model that can be employed for evaluating the factor of safety of Shiwalik Slopes in the Himalayan Region. Data obtained from numerical analysis of a residual soil slope were used to develop two ANN models (ANN1 and ANN2 utilising eleven input parameters, and scaled-down number of parameters based on correlation coefficient, respectively). A four-layer, feed-forward back-propagation neural network having the optimum number of hidden neurons is developed based on trial-and-error method. The results derived from ANN models were compared with those achieved from numerical analysis. Additionally, several performance indices such as coefficient of determination (R2), root mean square error, variance account for, and residual error were employed to evaluate the predictive performance of the developed ANN models. Both the ANN models have shown good prediction performance; however, the overall performance of the ANN2 model is better than the ANN1 model. It is concluded that the ANN models are reliable, valid, and straightforward computational tools that can be employed for slope stability analysis during the preliminary stage of designing infrastructure projects in residual soil slope. © 2020, Springer Nature B.V.
Automated health condition diagnosis of in situ wood utility poles using an intelligent non-destructive evaluation (NDE) framework
- Yu, Yang, Subhani, Mahbube, Hoshyar, Azadeh, Li, Jianchun, Li, Huan
- Authors: Yu, Yang , Subhani, Mahbube , Hoshyar, Azadeh , Li, Jianchun , Li, Huan
- Date: 2020
- Type: Text , Journal article
- Relation: International Journal of Structural Stability and Dynamics Vol. 20, no. 10 (2020), p.
- Full Text:
- Reviewed:
- Description: Wood utility poles are widely used in power transmission and telecommunication systems in Australia. Because of a variety of external factors, such as fungi, termites and environmental conditions, failure of poles due to wood degradation over time is a common occurrence with a high degree of uncertainty. Pole failure may result in serious consequences for both the economy and public safety. Therefore, accurate and timely identification of the health condition of utility poles is of great significance for the economic and safe operation of electricity and communication networks. In this paper, a novel non-destructive evaluation (NDE) framework with advanced signal processing and artificial intelligence (AI) techniques is developed to diagnose the condition of utility poles in the field. To begin with, the guided waves (GWs) generated within the pole are measured using a multi-sensing technique, avoiding difficult interpretation of various wave modes which cannot be detected by only one sensor. Then, empirical mode decomposition (EMD) and principal component analysis (PCA) are employed to extract and select damage-sensitive features from the captured GW signals. Additionally, up-to-date machine learning (ML) techniques are adopted to diagnose the health condition of the pole based on selected signal patterns. Finally, the performance of the developed NDE framework is evaluated using field testing data from 15 new and 24 decommissioned utility poles at the pole yard in Sydney. © 2020 World Scientific Publishing Company.
- Description: This research is supported by Australian Research Council via Linkage Project (LP110200162) and Industrial Transforming Research Hub for Nanoscience Based Construction Materials Manufacturing (IH150100006) as well as Ausgrid. The authors greatly appreciate the financial and technical supports from the funding bodies.
Techniques for the reverse engineering of banking malware
- Authors: Black, Paul
- Date: 2020
- Type: Text , Thesis , PhD
- Full Text:
- Description: Malware attacks are a significant and frequently reported problem, adversely affecting the productivity of organisations and governments worldwide. The well-documented consequences of malware attacks include financial loss, data loss, reputation damage, infrastructure damage, theft of intellectual property, compromise of commercial negotiations, and national security risks. Mitigation activities involve a significant amount of manual analysis. Therefore, there is a need for automated techniques for malware analysis to identify malicious behaviours. Research into automated techniques for malware analysis covers a wide range of activities. This thesis consists of a series of studies: an analysis of banking malware families and their common behaviours, an emulated command and control environment for dynamic malware analysis, a technique to identify similar malware functions, and a technique for the detection of ransomware. An analysis of the nature of banking malware, its major malware families, behaviours, variants, and inter-relationships are provided in this thesis. In doing this, this research takes a broad view of malware analysis, starting with the implementation of the malicious behaviours through to detailed analysis using machine learning. The broad approach taken in this thesis differs from some other studies that approach malware research in a more abstract sense. A disadvantage of approaching malware research without domain knowledge, is that important methodology questions may not be considered. Large datasets of historical malware samples are available for countermeasures research. However, due to the age of these samples, the original malware infrastructure is no longer available, often restricting malware operations to initialisation functions only. To address this absence, an emulated command and control environment is provided.
This emulated environment provides full control of the malware, enabling the capabilities of the original in-the-wild operation, while enabling feature extraction for research purposes. A major focus of this thesis has been the development of a machine learning function similarity method with a novel feature encoding that increases feature strength. This research develops techniques to demonstrate that the machine learning model trained on similarity features from one program can find similar functions in another, unrelated program. This finding can lead to the development of generic similar function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra. Further, this research examines the use of API call features for the identification of ransomware and shows that a failure to consider malware analysis domain knowledge can lead to weaknesses in experimental design. In this case, we show that existing research has difficulty in discriminating between ransomware and benign cryptographic software. This thesis by publication, has developed techniques to advance the discipline of malware reverse engineering, in order to minimize harm due to cyber-attacks on critical infrastructure, government institutions, and industry.
- Description: Doctor of Philosophy
Assessing nitrate contamination risks in groundwater : a machine learning approach
- Awais, Muhammad, Aslam, Bilal, Maqsoom, Ahsen, Khalil, Umer, Imran, Muhammad
- Authors: Awais, Muhammad , Aslam, Bilal , Maqsoom, Ahsen , Khalil, Umer , Imran, Muhammad
- Date: 2021
- Type: Text , Journal article
- Relation: Applied Sciences (Switzerland) Vol. 11, no. 21 (2021), p.
- Full Text:
- Reviewed:
- Description: Groundwater is one of the primary sources for the daily water requirements of the masses, but it is subjected to contamination due to the pollutants, such as nitrate, percolating through the soil with water. Especially in built-up areas, groundwater vulnerability and contamination are of major concern, and require appropriate consideration. The present study develops a novel framework for assessing groundwater nitrate contamination risk for the area along the Karakoram Highway, which is a part of the China Pakistan Economic Corridor (CPEC) route in northern Pakistan. A groundwater vulnerability map was prepared using the DRASTIC model. The nitrate concentration data from a previous study were used to formulate the nitrate contamination map. Three machine learning (ML) models, i.e., Support Vector Machine (SVM), Multivariate Discriminant Analysis (MDA), and Boosted Regression Trees (BRT), were used to analyze the probability of groundwater contamination incidence. Furthermore, groundwater contamination probability maps were obtained utilizing the ensemble modeling approach. The models were calibrated and validated through calibration trials, using the area under the receiver operating characteristic curve method (AUC), where a minimum AUC threshold value of 80% was achieved. Results indicated the accuracy of the models to be in the range of 0.82–0.87. The final groundwater contamination risk map highlights that 34% of the area is moderately vulnerable to groundwater contamination, and 13% of the area is exposed to high groundwater contamination risk. The findings of this study can facilitate decision-making regarding the location of future built-up areas properly in order to mitigate the nitrate contamination that can further reduce the associated health risks. © 2021 by the authors. Licensee MDPI, Basel, Switzerland. 
**Please note that there are multiple authors for this article therefore only the name of the first 5 including Federation University Australia affiliate “Muhammad Imran” is provided in this record**
The gene of scientific success
- Kong, Xiangjie, Zhang, Jun, Zhang, Da, Bu, Yi, Ding, Ying, Xia, Feng
- Authors: Kong, Xiangjie , Zhang, Jun , Zhang, Da , Bu, Yi , Ding, Ying , Xia, Feng
- Date: 2020
- Type: Text , Journal article
- Relation: ACM Transactions on Knowledge Discovery from Data Vol. 14, no. 4 (2020), p.
- Full Text:
- Reviewed:
- Description: This article elaborates how to identify and evaluate causal factors to improve scientific impact. Currently, analyzing scientific impact can be beneficial to various academic activities including funding application, mentor recommendation, discovering potential cooperators, and the like. It is universally acknowledged that high-impact scholars often have more opportunities to receive awards as an encouragement for their hard work. Therefore, scholars spend great efforts in making scientific achievements and improving scientific impact during their academic life. However, what are the determinate factors that control scholars' academic success? The answer to this question can help scholars conduct their research more efficiently. Under this consideration, our article presents and analyzes the causal factors that are crucial for scholars' academic success. We first propose five major factors including article-centered factors, author-centered factors, venue-centered factors, institution-centered factors, and temporal factors. Then, we apply recent advanced machine learning algorithms and jackknife method to assess the importance of each causal factor. Our empirical results show that author-centered and article-centered factors have the highest relevancy to scholars' future success in the computer science area. Additionally, we discover an interesting phenomenon that the h-index of scholars within the same institution or university are actually very close to each other. © 2020 ACM.
Data-driven computational social science : A survey
- Zhang, Jun, Wang, Wei, Xia, Feng, Lin, Yu-Ru, Tong, Hanghang
- Authors: Zhang, Jun , Wang, Wei , Xia, Feng , Lin, Yu-Ru , Tong, Hanghang
- Date: 2020
- Type: Text , Journal article
- Relation: Big Data Research Vol. 21, no. (2020), p. 1-22
- Full Text:
- Reviewed:
- Description: Social science concerns issues on individuals, relationships, and the whole society. The complexity of research topics in social science makes it the amalgamation of multiple disciplines, such as economics, political science, and sociology, etc. For centuries, scientists have conducted many studies to understand the mechanisms of the society. However, due to the limitations of traditional research methods, there exist many critical social issues to be explored. To solve those issues, computational social science emerges due to the rapid advancements of computation technologies and the profound studies on social science. With the aids of the advanced research techniques, various kinds of data from diverse areas can be acquired nowadays, and they can help us look into social problems with a new eye. As a result, utilizing various data to reveal issues derived from computational social science area has attracted more and more attentions. In this paper, to the best of our knowledge, we present a survey on data-driven computational social science for the first time which primarily focuses on reviewing application domains involving human dynamics. The state-of-the-art research on human dynamics is reviewed from three aspects: individuals, relationships, and collectives. Specifically, the research methodologies used to address research challenges in aforementioned application domains are summarized. In addition, some important open challenges with respect to both emerging research topics and research methods are discussed.
Cyberattacks detection in iot-based smart city applications using machine learning techniques
- Rashid, Md Mamunur, Kamruzzaman, Joarder, Hassan, Mohammad, Imam, Tassadduq, Gordon, Steven
- Authors: Rashid, Md Mamunur , Kamruzzaman, Joarder , Hassan, Mohammad , Imam, Tassadduq , Gordon, Steven
- Date: 2020
- Type: Text , Journal article
- Relation: International Journal of Environmental Research and Public Health Vol. 17, no. 24 (2020), p. 1-21
- Full Text:
- Reviewed:
- Description: In recent years, the widespread deployment of the Internet of Things (IoT) applications has contributed to the development of smart cities. A smart city utilizes IoT-enabled technologies, communications and applications to maximize operational efficiency and enhance both the service providers’ quality of services and people’s wellbeing and quality of life. With the growth of smart city networks, however, comes the increased risk of cybersecurity threats and attacks. IoT devices within a smart city network are connected to sensors linked to large cloud servers and are exposed to malicious attacks and threats. Thus, it is important to devise approaches to prevent such attacks and protect IoT devices from failure. In this paper, we explore an attack and anomaly detection technique based on machine learning algorithms (LR, SVM, DT, RF, ANN and KNN) to defend against and mitigate IoT cybersecurity threats in a smart city. Contrary to existing works that have focused on single classifiers, we also explore ensemble methods such as bagging, boosting and stacking to enhance the performance of the detection system. Additionally, we consider an integration of feature selection, cross-validation and multi-class classification for the discussed domain, which has not been well considered in the existing literature. Experimental results with the recent attack dataset demonstrate that the proposed technique can effectively identify cyberattacks and the stacking ensemble model outperforms comparable models in terms of accuracy, precision, recall and F1-Score, implying the promise of stacking in this domain. © 2020 by the authors. Licensee MDPI, Basel, Switzerland.