Experimental investigation of three machine learning algorithms for ITS dataset
- Yearwood, John, Kang, Byeongho, Kelarev, Andrei
- Authors: Yearwood, John, Kang, Byeongho, Kelarev, Andrei
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at First International Conference, FGIT 2009, Future Generation Information Technology, Jeju Island, Korea : 10th-12th December 2009 Vol. 5899, p. 308-316
- Full Text:
- Description: This article presents an experimental investigation of the performance of three machine learning algorithms on the ITS dataset, assessing their ability to agree with classes previously published in the biological literature. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric, and the sequences cannot be regarded as points in a finite-dimensional space, which is why novel machine learning approaches are needed for the analysis of datasets of this sort. This paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. It turns out that all three machine learning algorithms are efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset, on which the k-committees classifier outperforms the k-means and Nearest Neighbour classifiers, is also presented.
- Description: 2003007844
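The non-metric setting described in this abstract can be illustrated with a minimal sketch: Nearest Neighbour classification driven entirely by a precomputed pairwise distance matrix, so nothing assumes the items are points in a vector space. The toy distances and labels below are invented for illustration; they are not the paper's ITS alignment scores or its k-committees method.

```python
def nearest_neighbour_predict(dist, train_idx, train_labels, test_idx):
    """Assign each test item the label of its closest training item.

    dist: full pairwise distance matrix (list of lists), which need not
    satisfy the triangle inequality; train_idx/test_idx: row indices.
    """
    predictions = []
    for t in test_idx:
        best = min(train_idx, key=lambda j: dist[t][j])
        predictions.append(train_labels[train_idx.index(best)])
    return predictions

# Invented "alignment score" distances between four sequences.
dist = [
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 6.0, 5.0],
    [5.0, 6.0, 0.0, 2.0],
    [4.0, 5.0, 2.0, 0.0],
]
labels = ["A", "B"]  # labels of training items 0 and 2
print(nearest_neighbour_predict(dist, [0, 2], labels, [1, 3]))  # ['A', 'B']
```

Because only the distance matrix is consulted, the same code works whether the scores come from Euclidean geometry or from sequence alignment.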
Zero-day malware detection based on supervised learning algorithms of API call signatures
- Alazab, Mamoun, Venkatraman, Sitalakshmi, Watters, Paul, Alazab, Moutaz
- Authors: Alazab, Mamoun, Venkatraman, Sitalakshmi, Watters, Paul, Alazab, Moutaz
- Date: 2011
- Type: Text , Conference proceedings
- Full Text:
- Description: Zero-day or unknown malware is created using code obfuscation techniques that can modify the parent code to produce offspring copies which have the same functionality but different signatures. Current techniques reported in the literature lack the capability to detect zero-day malware with the required accuracy and efficiency. In this paper, we propose and evaluate a novel method that employs several data mining techniques to detect and classify zero-day malware with high accuracy and efficiency based on the frequency of Windows API calls. The paper describes the methodology employed to collect large datasets for training the classifiers, and analyses the performance results of the various data mining algorithms adopted for the study, using a fully automated tool developed in this research to conduct the experimental investigations and evaluation. Through these performance results, we evaluate and discuss the advantages of one data mining algorithm over another for accurately detecting zero-day malware. The data mining framework employed in this research learns by analysing the behavior of existing malicious and benign code in large datasets. We employed robust classifiers, namely the Naïve Bayes (NB) algorithm, the k-Nearest Neighbor (kNN) algorithm, the Sequential Minimal Optimization (SMO) algorithm with four different kernels (SMO - Normalized PolyKernel, SMO - PolyKernel, SMO - Puk, and SMO - Radial Basis Function (RBF)), the Backpropagation Neural Networks algorithm, and the J48 decision tree, and evaluated their performance. Overall, the automated data mining system implemented for this study achieved a high true positive (TP) rate of more than 98.5% and a low false positive (FP) rate of less than 0.025, which had not previously been achieved in the literature. This is much higher than the required commercial acceptance level, indicating that our novel technique is a major leap forward in detecting zero-day malware. This paper also offers future directions for researchers exploring the different aspects of obfuscation affecting the IT world today. © 2011, Australian Computer Society, Inc.
- Description: 2003009506
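The feature representation this abstract describes, frequencies of Windows API calls, can be sketched with a minimal multinomial Naïve Bayes classifier (one of the classifiers named above). The API names, call traces, and labels below are invented for illustration; this is not the paper's automated tool or its trained models.

```python
# Minimal multinomial Naive Bayes over API-call traces, with Laplace
# smoothing. Each sample is the list of API calls it makes.
import math
from collections import Counter

def train_nb(traces, labels):
    vocab = sorted({api for t in traces for api in t})
    classes = sorted(set(labels))
    prior, cond = {}, {}
    for c in classes:
        docs = [t for t, l in zip(traces, labels) if l == c]
        prior[c] = math.log(len(docs) / len(traces))
        counts = Counter(api for t in docs for api in t)
        total = sum(counts.values())
        cond[c] = {a: math.log((counts[a] + 1) / (total + len(vocab)))
                   for a in vocab}
    return prior, cond, vocab

def predict_nb(model, trace):
    prior, cond, vocab = model
    scores = {c: prior[c] + sum(cond[c][a] for a in trace if a in vocab)
              for c in prior}
    return max(scores, key=scores.get)

# Invented training traces (real systems would use thousands of samples).
traces = [["CreateFileA", "WriteFile", "RegSetValueA"],
          ["CreateFileA", "ReadFile"],
          ["RegSetValueA", "VirtualAlloc", "WriteFile"],
          ["ReadFile", "CloseHandle"]]
labels = ["malware", "benign", "malware", "benign"]
model = train_nb(traces, labels)
print(predict_nb(model, ["VirtualAlloc", "RegSetValueA"]))  # malware
```

The same frequency vectors could equally be fed to kNN, SMO, or J48, which is what the study compares.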
An empirical comparison of two common multiobjective reinforcement learning algorithms
- Issabekov, Rustam, Vamplew, Peter
- Authors: Issabekov, Rustam, Vamplew, Peter
- Date: 2012
- Type: Text , Conference paper
- Relation: 25th Australasian Joint Conference on Artificial Intelligence, AI 2012 Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 7691 LNAI, p. 626-636
- Full Text:
- Reviewed:
- Description: In this paper we provide empirical data on the performance of the two most commonly used multiobjective reinforcement learning algorithms against a set of benchmarks. First, we describe the methodology used in this paper. Then, we carefully describe the details and properties of the proposed problems and how those properties influence the behavior of the tested algorithms. We also introduce a testing framework that will significantly improve future empirical comparisons of multiobjective reinforcement learning algorithms; we hope this testing environment eventually becomes a central repository of test problems and algorithms. The empirical results clearly identify features of the test problems which impact the performance of each algorithm, demonstrating the utility of empirically testing algorithms on problems with known characteristics. © 2012 Springer-Verlag.
- Description: 2003010655
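The abstract does not name the two algorithms compared, but a common baseline in multiobjective reinforcement learning is single-policy Q-learning over a linearly scalarised vector reward. The sketch below shows that idea with invented states, rewards, and weights; it is not the paper's testing framework.

```python
def scalarise(reward, weights):
    """Weighted sum that collapses a vector reward to a scalar."""
    return sum(r * w for r, w in zip(reward, weights))

def q_update(q, state, action, reward_vec, next_state, weights,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning step on the scalarised reward."""
    r = scalarise(reward_vec, weights)
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (r + gamma * best_next - q[state][action])

# Toy two-state, two-action table; objectives could be e.g.
# (treasure value, -time penalty) as in common MORL benchmarks.
q = {"s0": {"a": 0.0, "b": 0.0}, "s1": {"a": 0.0, "b": 0.0}}
q_update(q, "s0", "a", (1.0, -1.0), "s1", weights=(0.7, 0.3))
print(q["s0"]["a"])  # 0.5 * (0.7*1.0 + 0.3*(-1.0)) = 0.2
```

A key limitation that benchmark studies probe is that fixed-weight scalarisation can only reach solutions on the convex hull of the Pareto front.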
Algorithm development for the non-destructive testing of structural damage
- Noori Hoshyar, Azadeh, Rashidi, Maria, Liyanapathirana, Ranjith, Samali, Bijan
- Authors: Noori Hoshyar, Azadeh, Rashidi, Maria, Liyanapathirana, Ranjith, Samali, Bijan
- Date: 2019
- Type: Text , Journal article
- Relation: Applied sciences Vol. 9, no. 14 (2019), p. 2810
- Full Text:
- Reviewed:
- Description: Monitoring structures to identify the types of damage that occur under loading is essential in practical applications of civil infrastructure. In this paper, we detect and visualize damage based on several non-destructive testing (NDT) methods. A machine learning (ML) approach based on the Support Vector Machine (SVM) method is developed to prevent misinterpretation of the events occurring in the material. The objective is to identify cracks in the early stages, to reduce the risk of failure in structures. Theoretical and experimental analyses are derived by computing performance indicators on smart aggregate (SA)-based sensor data for concrete and reinforced-concrete (RC) beams. The validity of the proposed indices was assessed through a comparative analysis with the traditional SVM. The developed ML algorithms are shown to recognize cracks with higher accuracy than the traditional SVM. Additionally, we propose different algorithms for microwave- or millimeter-wave imaging of steel plates, composite materials, and metal plates, to identify and visualize cracks. The proposed algorithm for steel plates is based on the gradient magnitude in four directions of an image, followed by an edge detection technique. Three algorithms were proposed for each of the composite materials and metal plates, based on 2D fast Fourier transform (FFT) and hybrid fuzzy c-means techniques, respectively. The proposed algorithms were able to recognize and visualize the cracking incurred in the structure more efficiently than traditional techniques. The reported results are expected to be beneficial for NDT-based applications, particularly in civil engineering.
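The abstract does not specify the performance indicators computed from the smart-aggregate sensor data, so the sketch below shows one simple, commonly used style of indicator, an energy-based damage index that compares a measured signal against an undamaged baseline. The signals and numbers are invented; this is not the paper's actual index or its SVM pipeline, though such an index is the kind of feature a classifier could consume.

```python
import math

def signal_energy(signal):
    """Mean squared amplitude of a sensor signal."""
    return sum(x * x for x in signal) / len(signal)

def damage_index(baseline, measured):
    """Relative energy drop versus the undamaged baseline (0 = intact)."""
    e0, e1 = signal_energy(baseline), signal_energy(measured)
    return 1.0 - e1 / e0

baseline = [math.sin(0.1 * i) for i in range(100)]
cracked = [0.6 * x for x in baseline]  # attenuated wave transmission
print(round(damage_index(baseline, cracked), 2))  # 0.64
```

Cracks scatter and attenuate the stress wave between embedded transducers, so a larger energy drop indicates more severe damage.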
A probabilistic reverse power flows scenario analysis framework
- Demazy, Antonin, Alpcan, Tansu, Mareels, Iven
- Authors: Demazy, Antonin, Alpcan, Tansu, Mareels, Iven
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE open access journal of power and energy Vol. 7, no. (2020), p. 524-532
- Full Text:
- Reviewed:
- Description: Distributed Energy Resources (DER), mainly residential solar PV, are embedded deep within the power distribution network, and their adoption is increasing fast globally. As more customers participate, these power generation units cause Reverse Power Flow (RPF) at the edge of the grid, directed upstream into the network, thus violating one of the traditional design principles for power networks. The effect of a single residential solar PV system is negligible, but as end-consumer adoption increases to high percentages, the aggregated effect is no longer negligible and must be considered in the design and configuration of power networks. This article proposes a framework that helps predict the RPF intensity probability for any given scenario of DER penetration within the distribution network. The scenario parameters considered are the number and location of the residential DERs, their capacity, and the daily net-load profiles. The classical simulation-based approach is not scalable, as it relies on solving the load-flow equations for each individual scenario. The framework leverages machine learning techniques to make fast and precise RPF predictions within the network for each scenario. It enables Distribution Network Service Providers (DNSPs) to assess DER penetration scenarios at a granular level, derive and localise the RPF risks, and assess the respective impacts on installed assets for network planning purposes. The framework is illustrated with a scenario analysis conducted on an IEEE 123-bus system using OpenDSS, and is shown to yield savings of multiple orders of magnitude in computational time while retaining an accuracy of 94% or above compared to classical brute-force simulations.
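The quantity the framework predicts, the probability of reverse power flow under a given DER penetration scenario, can be sketched with a crude Monte Carlo estimate: RPF occurs at an aggregation point whenever total PV output exceeds total demand. The load and PV distributions below are invented, and a real analysis would solve load-flow equations per bus (which is exactly the cost the paper's ML surrogate avoids).

```python
import random

def rpf_probability(n_homes, pv_fraction, trials=2000, seed=1):
    """Fraction of sampled intervals where aggregate PV output exceeds
    aggregate demand, i.e. net load is negative (reverse power flow)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        load = sum(rng.uniform(0.5, 2.0) for _ in range(n_homes))  # kW demand
        solar = sum(rng.uniform(0.0, 4.0)                          # kW PV output
                    for _ in range(int(n_homes * pv_fraction)))
        hits += solar > load
    return hits / trials

# Higher PV penetration should yield a higher RPF probability.
print(rpf_probability(20, 0.2) < rpf_probability(20, 0.8))  # True
```

A trained regression or classification model replaces the inner sampling-and-power-flow loop, which is where the reported orders-of-magnitude speedup comes from.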
Automated segmentation of mouse OCT volumes (ASiMOV): Validation & clinical study of a light damage model
- Antony, Bhavna, Kim, Byung-Jin, Lang, Andrew, Carass, Aaron, Prince, Jerry, Zack, Donald
- Authors: Antony, Bhavna, Kim, Byung-Jin, Lang, Andrew, Carass, Aaron, Prince, Jerry, Zack, Donald
- Date: 2017
- Type: Text , Journal article
- Relation: PLoS One Vol. 12, no. 8 (2017), p. e0181059-e0181059
- Full Text:
- Reviewed:
- Description: The use of spectral-domain optical coherence tomography (SD-OCT) is becoming commonplace for the in vivo longitudinal study of murine models of ophthalmic disease. Longitudinal studies, however, generate large quantities of data, the manual analysis of which is very challenging due to the time-consuming nature of generating delineations. It is therefore important that automated algorithms be developed to facilitate accurate and timely analysis of these large datasets. Furthermore, as the models target a variety of diseases, the associated structural changes can be extremely disparate. For instance, in the light damage (LD) model, which is frequently used to study photoreceptor degeneration, the outer retina appears dramatically different from the normal retina. To address these concerns, we have developed a flexible graph-based algorithm for the automated segmentation of mouse OCT volumes (ASiMOV). This approach incorporates a machine-learning component that can easily be trained for different disease models. To validate ASiMOV, the automated results were compared to manual delineations obtained from three raters on healthy and BALB/cJ mice post LD. It was also used to study a longitudinal LD model, in which five control and five LD mice were imaged at four timepoints post LD. The total retinal thickness and the outer retina (comprising the outer nuclear layer and the inner and outer segments of the photoreceptors) were unchanged the day after the LD, but subsequently thinned significantly (p < 0.01). The retinal nerve fiber-ganglion cell complex and the inner plexiform layers, however, remained unchanged for the duration of the study.
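Downstream of the segmentation this abstract describes, the reported measurements (total retinal and outer-retinal thickness) reduce to the separation between delineated layer boundaries. A minimal sketch of that step is below; the boundary positions and axial resolution are invented and do not come from the paper.

```python
def mean_thickness_um(top, bottom, axial_res_um):
    """Average (bottom - top) boundary separation per A-scan,
    scaled from pixel rows to micrometres."""
    diffs = [b - t for t, b in zip(top, bottom)]
    return sum(diffs) / len(diffs) * axial_res_um

top = [40, 41, 42, 41]     # inner boundary row for each A-scan
bottom = [90, 92, 91, 93]  # outer boundary row for each A-scan
print(mean_thickness_um(top, bottom, 2.0))  # 101.0
```

Tracking this value per layer across timepoints is how a longitudinal study detects the post-LD thinning reported above.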
Comparative analysis of machine and deep learning models for soil properties prediction from hyperspectral visual band
- Datta, Dristi, Paul, Manoranjan, Murshed, Manzur, Teng, Shyh Wei, Schmidtke, Leigh
- Authors: Datta, Dristi, Paul, Manoranjan, Murshed, Manzur, Teng, Shyh Wei, Schmidtke, Leigh
- Date: 2023
- Type: Text , Journal article
- Relation: Environments Vol. 10, no. 5 (2023), p. 77
- Full Text:
- Reviewed:
- Description: Estimating various properties of soil, including moisture, carbon, and nitrogen, is crucial for studying their correlation with plant health and food production. However, conventional methods such as oven-drying and chemical analysis are laborious, expensive, and only feasible for a limited land area. With the advent of remote sensing technologies like multi/hyperspectral imaging, it is now possible to predict soil properties non-invasively and cost-effectively for a large expanse of bare land. Recent research shows that these soil contents can be predicted from a wide range of hyperspectral data using good prediction algorithms. However, such hyperspectral sensors are expensive and not widely available. Therefore, this paper investigates different machine and deep learning techniques for predicting soil nutrient properties using only the red (R), green (G), and blue (B) band data, to propose a suitable machine/deep learning model that can be used as a rapid soil test. Another objective of this research is to observe and compare the prediction accuracy in three cases, (i) the hyperspectral bands, (ii) the full spectrum of the visual band, and (iii) the three-channel RGB band, and to provide a guideline to users on which spectral information they should use to predict those soil properties. The outcome of this research helps to develop a mobile application that is easy to use for a quick soil test. This research also explores learning-based algorithms with significant feature combinations and compares their performance in predicting soil properties from visual-band data. For this, we also explore the impact of dimensionality reduction (i.e., principal component analysis) and transformations (i.e., empirical mode decomposition) of features. The results show that the proposed model can comparably predict the soil contents from the three-channel RGB data.
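As a minimal illustration of the RGB-band idea (not the paper's models, which include deep learning, PCA, and EMD features), the sketch below fits an ordinary least-squares line from a single RGB-derived feature, mean pixel intensity, to a toy soil-moisture value. All readings and moisture values are invented.

```python
def mean_rgb(pixel):
    """Average intensity of an (R, G, B) pixel."""
    r, g, b = pixel
    return (r + g + b) / 3.0

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Invented data: darker soil (lower reflectance) is moister here.
pixels = [(120, 100, 80), (90, 75, 60), (60, 50, 40), (30, 25, 20)]
moisture = [10.0, 15.0, 20.0, 25.0]  # toy volumetric moisture (%)
a, b = fit_line([mean_rgb(p) for p in pixels], moisture)
x_new = mean_rgb((75, 62, 50))
print(round(a * x_new + b, 1))  # 17.5 on this toy data
```

The paper's contribution is showing how close richer models on RGB-only inputs can get to hyperspectral-based prediction; this single-feature line is only the degenerate starting point of that comparison.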