Multi-source cyber-attacks detection using machine learning
- Taheri, Sona, Gondal, Iqbal, Bagirov, Adil, Harkness, Greg, Brown, Simon, Chi, Chihung
- Authors: Taheri, Sona , Gondal, Iqbal , Bagirov, Adil , Harkness, Greg , Brown, Simon , Chi, Chihung
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne, Australia; 13th-15th February 2019 Vol. 2019-February, p. 1167-1172
- Full Text:
- Reviewed:
- Description: The Internet of Things (IoT) has significantly increased the number of devices connected to the Internet ranging from sensors to multi-source data information. As the IoT continues to evolve with new technologies number of threats and attacks against IoT devices are on the increase. Analyzing and detecting these attacks originating from different sources needs machine learning models. These models provide proactive solutions for detecting attacks and their sources. In this paper, we propose to apply a supervised machine learning classification technique to identify cyber-attacks from each source. More precisely, we apply the incremental piecewise linear classifier that constructs boundary between sources/classes incrementally starting with one hyperplane and adding more hyperplanes at each iteration. The algorithm terminates when no further significant improvement of the separation of sources/classes is possible. The construction and usage of piecewise linear boundaries allows us to avoid any possible overfitting. We apply the incremental piecewise linear classifier on the multi-source real world cyber security data set to identify cyber-attacks and their sources.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
- Authors: Taheri, Sona , Gondal, Iqbal , Bagirov, Adil , Harkness, Greg , Brown, Simon , Chi, Chihung
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne, Australia; 13th-15th February 2019 Vol. 2019-February, p. 1167-1172
- Full Text:
- Reviewed:
- Description: The Internet of Things (IoT) has significantly increased the number of devices connected to the Internet ranging from sensors to multi-source data information. As the IoT continues to evolve with new technologies number of threats and attacks against IoT devices are on the increase. Analyzing and detecting these attacks originating from different sources needs machine learning models. These models provide proactive solutions for detecting attacks and their sources. In this paper, we propose to apply a supervised machine learning classification technique to identify cyber-attacks from each source. More precisely, we apply the incremental piecewise linear classifier that constructs boundary between sources/classes incrementally starting with one hyperplane and adding more hyperplanes at each iteration. The algorithm terminates when no further significant improvement of the separation of sources/classes is possible. The construction and usage of piecewise linear boundaries allows us to avoid any possible overfitting. We apply the incremental piecewise linear classifier on the multi-source real world cyber security data set to identify cyber-attacks and their sources.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
Analysis of Classifiers for Prediction of Type II Diabetes Mellitus
- Barhate, Rahul, Kulkarni, Pradnya
- Authors: Barhate, Rahul , Kulkarni, Pradnya
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 4th International Conference on Computing, Communication Control and Automation, ICCUBEA 2018
- Full Text:
- Reviewed:
- Description: Diabetes mellitus is a chronic disease and a health challenge worldwide. According to the International Diabetes Federation, 451 million people across the globe have diabetes, with this number anticipated to rise up to 693 million people by 2045. It has been shown that 80% of the complications arising from type II diabetes can be prevented or delayed by early identification of the people who are at risk. Diabetes is difficult to diagnose in the early stages as its symptoms grow subtly and gradually. In a majority of the cases, the patients remain undiagnosed until they are admitted for a heart attack or begin to lose their sight. This paper analyzes the different classification algorithms based on a patient's health history to aid doctors identify the presence of as well as promote early diagnosis and treatment. The experiments were conducted on Pima Indian Diabetes data set. Various classifiers used include K Nearest Neighbors, Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, Support Vector Machine and Neural Network. Results demonstrate that Random Forests performed well on the data set giving an accuracy of 79.7%. © 2018 IEEE.
- Description: E1
- Authors: Barhate, Rahul , Kulkarni, Pradnya
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 4th International Conference on Computing, Communication Control and Automation, ICCUBEA 2018
- Full Text:
- Reviewed:
- Description: Diabetes mellitus is a chronic disease and a health challenge worldwide. According to the International Diabetes Federation, 451 million people across the globe have diabetes, with this number anticipated to rise up to 693 million people by 2045. It has been shown that 80% of the complications arising from type II diabetes can be prevented or delayed by early identification of the people who are at risk. Diabetes is difficult to diagnose in the early stages as its symptoms grow subtly and gradually. In a majority of the cases, the patients remain undiagnosed until they are admitted for a heart attack or begin to lose their sight. This paper analyzes the different classification algorithms based on a patient's health history to aid doctors identify the presence of as well as promote early diagnosis and treatment. The experiments were conducted on Pima Indian Diabetes data set. Various classifiers used include K Nearest Neighbors, Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, Support Vector Machine and Neural Network. Results demonstrate that Random Forests performed well on the data set giving an accuracy of 79.7%. © 2018 IEEE.
- Description: E1
Improving deep forest by confidence screening
- Pang, Ming, Ting, Kaiming, Zhao, Peng, Zhou, Zhi-Hua
- Authors: Pang, Ming , Ting, Kaiming , Zhao, Peng , Zhou, Zhi-Hua
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 2018 Ieee International Conference on Data Mining; Singapore, Singapore; 17th-20th November 2018 p. 1194-1199
- Full Text:
- Reviewed:
- Description: Most studies about deep learning are based on neural network models, where many layers of parameterized nonlinear differentiable modules are trained by backpropagation. Recently, it has been shown that deep learning can also be realized by non-differentiable modules without backpropagation training called deep forest. The developed representation learning process is based on a cascade of cascades of decision tree forests, where the high memory requirement and the high time cost inhibit the training of large models. In this paper, we propose a simple yet effective approach to improve the efficiency of deep forest. The key idea is to pass the instances with high confidence directly to the final stage rather than passing through all the levels. We also provide a theoretical analysis suggesting a means to vary the model complexity from low to high as the level increases in the cascade, which further reduces the memory requirement and time cost. Our experiments show that the proposed approach achieves highly competitive predictive performance with significantly reduced time cost and memory requirement by up to one order of magnitude.
- Authors: Pang, Ming , Ting, Kaiming , Zhao, Peng , Zhou, Zhi-Hua
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 2018 Ieee International Conference on Data Mining; Singapore, Singapore; 17th-20th November 2018 p. 1194-1199
- Full Text:
- Reviewed:
- Description: Most studies about deep learning are based on neural network models, where many layers of parameterized nonlinear differentiable modules are trained by backpropagation. Recently, it has been shown that deep learning can also be realized by non-differentiable modules without backpropagation training called deep forest. The developed representation learning process is based on a cascade of cascades of decision tree forests, where the high memory requirement and the high time cost inhibit the training of large models. In this paper, we propose a simple yet effective approach to improve the efficiency of deep forest. The key idea is to pass the instances with high confidence directly to the final stage rather than passing through all the levels. We also provide a theoretical analysis suggesting a means to vary the model complexity from low to high as the level increases in the cascade, which further reduces the memory requirement and time cost. Our experiments show that the proposed approach achieves highly competitive predictive performance with significantly reduced time cost and memory requirement by up to one order of magnitude.
Progressive data stream mining and transaction classification for workload-aware incremental database repartitioning
- Kamal, Joarder, Murshed, Manzur, Gaber, Mohamed
- Authors: Kamal, Joarder , Murshed, Manzur , Gaber, Mohamed
- Date: 2014
- Type: Text , Conference proceedings
- Relation: IEEE/ACM International Symposium on Big Data Computing, BDC 2014; London, United Kingdom; 8th-11th December 2014; p. 8-15
- Full Text:
- Reviewed:
- Description: Minimising the impact of distributed transactions (DTs) in a shared-nothing distributed database is extremely challenging for transactional workloads. With dynamic workload nature and rapid growth in data volume the underlying database requires incremental repartitioning to maintain acceptable level of DTs and data load balance with minimum physical data migrations. In a workload-aware repartitioning scheme transactional workload is modelled as graph or hyper graph, and subsequently perform k-way min-cut clustering guaranteeing minimum edge cuts can reduce the impact of DTs significantly by mapping the workload clusters into logical database partitions. However, without exploring the inherent workload characteristics, the overall processing and computing times for large-scale workload networks increase in polynomial orders. In this paper, a workload-aware incremental database repartitioning technique is proposed, which effectively exploits proactive transaction classification and workload stream mining techniques. Workload batches are modelled in graph, hyper graph, and compressed hyper graph then repartitioned to produce a fresh tuple-to-partition data migration plan for every incremental cycle. Experimental studies in a simulated TPC-C environment demonstrate that the proposed model can be effectively adopted in managing rapid data growth and dynamic workloads, thus progressively reduce the overall processing time required to operate over the workload networks.
- Authors: Kamal, Joarder , Murshed, Manzur , Gaber, Mohamed
- Date: 2014
- Type: Text , Conference proceedings
- Relation: IEEE/ACM International Symposium on Big Data Computing, BDC 2014; London, United Kingdom; 8th-11th December 2014; p. 8-15
- Full Text:
- Reviewed:
- Description: Minimising the impact of distributed transactions (DTs) in a shared-nothing distributed database is extremely challenging for transactional workloads. With dynamic workload nature and rapid growth in data volume the underlying database requires incremental repartitioning to maintain acceptable level of DTs and data load balance with minimum physical data migrations. In a workload-aware repartitioning scheme transactional workload is modelled as graph or hyper graph, and subsequently perform k-way min-cut clustering guaranteeing minimum edge cuts can reduce the impact of DTs significantly by mapping the workload clusters into logical database partitions. However, without exploring the inherent workload characteristics, the overall processing and computing times for large-scale workload networks increase in polynomial orders. In this paper, a workload-aware incremental database repartitioning technique is proposed, which effectively exploits proactive transaction classification and workload stream mining techniques. Workload batches are modelled in graph, hyper graph, and compressed hyper graph then repartitioned to produce a fresh tuple-to-partition data migration plan for every incremental cycle. Experimental studies in a simulated TPC-C environment demonstrate that the proposed model can be effectively adopted in managing rapid data growth and dynamic workloads, thus progressively reduce the overall processing time required to operate over the workload networks.
Hybrid technique for colour image classification and efficient retrieval based on fuzzy logic and neural networks
- Fernando, Ranisha, Kulkarni, Siddhivinayak
- Authors: Fernando, Ranisha , Kulkarni, Siddhivinayak
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Developments in the technology and the Internet have led to increase in number of digital images and videos. Thousands of images are added to WWW every day. To retrieve the specific images efficiently from database or from Internet is becoming a challenge now a day. As a result, the necessity of retrieving images has emerged to be important to various professional areas. This paper proposes a novel fuzzy approach to classify the colour images based on their content, to pose a query in terms of natural language and fuse the queries based on neural networks for fast and efficient retrieval. Number of experiments was conducted for classification and retrieval of images on sets of images and promising results were obtained. The results were analysed and compared with other similar image retrieval system. © 2012 IEEE.
- Authors: Fernando, Ranisha , Kulkarni, Siddhivinayak
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Developments in the technology and the Internet have led to increase in number of digital images and videos. Thousands of images are added to WWW every day. To retrieve the specific images efficiently from database or from Internet is becoming a challenge now a day. As a result, the necessity of retrieving images has emerged to be important to various professional areas. This paper proposes a novel fuzzy approach to classify the colour images based on their content, to pose a query in terms of natural language and fuse the queries based on neural networks for fast and efficient retrieval. Number of experiments was conducted for classification and retrieval of images on sets of images and promising results were obtained. The results were analysed and compared with other similar image retrieval system. © 2012 IEEE.
Feature selection using misclassification counts
- Bagirov, Adil, Yatsko, Andrew, Stranieri, Andrew
- Authors: Bagirov, Adil , Yatsko, Andrew , Stranieri, Andrew
- Date: 2011
- Type: Conference proceedings , Unpublished work
- Relation: Proceedings of the 9th Australasian Data Mining Conference (AusDM 2011), 51-62. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 121.
- Full Text:
- Description: Dimensionality reduction of the problem space through detection and removal of variables, contributing little or not at all to classification, is able to relieve the computational load and instance acquisition effort, considering all the data attributes accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to coordinates of informative features. Ranking is done on the degree to which different variables exhibit random characteristics. The results are being verified using the Nearest Neighbor classifier. This also helps to address the feature irrelevance and redundancy, what ranking does not immediately decide. Additionally, feature ranking methods from different independent sources are called in for the direct comparison.
- Description: Dimensionality reduction of the problem space through detection and removal of variables, contributing little or not at all to classification, is able to relieve the computational load and the data acquisition effort, considering all data components being accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to coordinates of informative features. Ranking is done on the degree, to which different variables exhibit random characteristics. The results are being verified using the Nearest Neighbor classifier. This also helps to address the feature irrelevance, what ranking does not immediately decide. Additionally, feature ranking methods available from different independent sources are called in for direct comparison.
- Authors: Bagirov, Adil , Yatsko, Andrew , Stranieri, Andrew
- Date: 2011
- Type: Conference proceedings , Unpublished work
- Relation: Proceedings of the 9th Australasian Data Mining Conference (AusDM 2011), 51-62. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 121.
- Full Text:
- Description: Dimensionality reduction of the problem space through detection and removal of variables, contributing little or not at all to classification, is able to relieve the computational load and instance acquisition effort, considering all the data attributes accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to coordinates of informative features. Ranking is done on the degree to which different variables exhibit random characteristics. The results are being verified using the Nearest Neighbor classifier. This also helps to address the feature irrelevance and redundancy, what ranking does not immediately decide. Additionally, feature ranking methods from different independent sources are called in for the direct comparison.
- Description: Dimensionality reduction of the problem space through detection and removal of variables, contributing little or not at all to classification, is able to relieve the computational load and the data acquisition effort, considering all data components being accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to coordinates of informative features. Ranking is done on the degree, to which different variables exhibit random characteristics. The results are being verified using the Nearest Neighbor classifier. This also helps to address the feature irrelevance, what ranking does not immediately decide. Additionally, feature ranking methods available from different independent sources are called in for direct comparison.
The seven scam types: Mapping the terrain of cybercrime
- Stabek, Amber, Watters, Paul, Layton, Robert
- Authors: Stabek, Amber , Watters, Paul , Layton, Robert
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Threat of cybercrime is a growing danger to the economy. Industries and businesses are targeted by cyber-criminals along with members of the general public. Since cybercrime is often a symptom of more complex criminological regimes such as laundering, trafficking and terrorism, the true damage caused to society is unknown. Dissimilarities in reporting procedures and non-uniform cybercrime classifications lead international reporting bodies to produce incompatible results which cause difficulties in making valid comparisons. A cybercrime classification framework has been identified as necessary for the development of an inter-jurisdictional, transnational, and global approach to identify, intercept, and prosecute cyber-criminals. Outlined in this paper is a cybercrime classification framework which has been applied to the incidence of scams. Content analysis was performed on over 250 scam descriptions stemming from in excess of 35 scamming categories and over 80 static features derived. Using hierarchical cluster and discriminant function analysis, the sample was reduced from over 35 ambiguous categories into 7 scam types and the top four scamming functions - identified as scamming business processes, revealed. The results of this research bear significant ramifications to the current state of scam and cybercrime classification, research and analysis, as well as offer significant insight into the business processes and applications adopted by scammers and cyber-criminals. © 2010 IEEE.
- Authors: Stabek, Amber , Watters, Paul , Layton, Robert
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Threat of cybercrime is a growing danger to the economy. Industries and businesses are targeted by cyber-criminals along with members of the general public. Since cybercrime is often a symptom of more complex criminological regimes such as laundering, trafficking and terrorism, the true damage caused to society is unknown. Dissimilarities in reporting procedures and non-uniform cybercrime classifications lead international reporting bodies to produce incompatible results which cause difficulties in making valid comparisons. A cybercrime classification framework has been identified as necessary for the development of an inter-jurisdictional, transnational, and global approach to identify, intercept, and prosecute cyber-criminals. Outlined in this paper is a cybercrime classification framework which has been applied to the incidence of scams. Content analysis was performed on over 250 scam descriptions stemming from in excess of 35 scamming categories and over 80 static features derived. Using hierarchical cluster and discriminant function analysis, the sample was reduced from over 35 ambiguous categories into 7 scam types and the top four scamming functions - identified as scamming business processes, revealed. The results of this research bear significant ramifications to the current state of scam and cybercrime classification, research and analysis, as well as offer significant insight into the business processes and applications adopted by scammers and cyber-criminals. © 2010 IEEE.
- «
- ‹
- 1
- ›
- »