Multi-source cyber-attacks detection using machine learning
- Authors: Taheri, Sona , Gondal, Iqbal , Bagirov, Adil , Harkness, Greg , Brown, Simon , Chi, Chihung
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne, Australia; 13th-15th February 2019 Vol. 2019-February, p. 1167-1172
- Full Text:
- Reviewed:
- Description: The Internet of Things (IoT) has significantly increased the number of devices connected to the Internet, ranging from sensors to multi-source data systems. As the IoT continues to evolve with new technologies, the number of threats and attacks against IoT devices is on the increase. Analyzing and detecting these attacks originating from different sources requires machine learning models, which provide proactive solutions for detecting attacks and their sources. In this paper, we propose to apply a supervised machine learning classification technique to identify cyber-attacks from each source. More precisely, we apply an incremental piecewise linear classifier that constructs the boundary between sources/classes incrementally, starting with one hyperplane and adding more hyperplanes at each iteration. The algorithm terminates when no further significant improvement in the separation of sources/classes is possible. The construction and use of piecewise linear boundaries allows us to avoid overfitting. We apply the incremental piecewise linear classifier to a multi-source real-world cyber security data set to identify cyber-attacks and their sources.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
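To make the abstract's core idea concrete, here is a minimal sketch (not the authors' implementation; all names are hypothetical) of a piecewise linear decision function formed as the maximum over a set of hyperplanes:

```python
# Minimal sketch: a piecewise linear decision boundary built as the
# maximum over several hyperplanes. A point is assigned to class 1
# when at least one hyperplane scores it positively.

def hyperplane_score(w, b, x):
    """Signed score of point x against hyperplane (w, b): w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def piecewise_linear_classify(hyperplanes, x):
    """Classify x using the max over hyperplane scores.

    `hyperplanes` is a list of (w, b) pairs; the decision boundary is
    the zero level set of max_i (w_i . x + b_i), which is piecewise
    linear rather than a single hyperplane.
    """
    score = max(hyperplane_score(w, b, x) for w, b in hyperplanes)
    return 1 if score > 0 else 0
```

Starting from a single hyperplane and adding more at each iteration enlarges the family of boundaries the classifier can express, which is the incremental construction the abstract describes.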
Nearest neighbor ensembles: An effective method for difficult problems in streaming classification with emerging new classes
- Authors: Cai, Xin-Qiang , Zhao, Peng , Ting, Kai-Ming , Mu, Xin , Jiang, Yuan
- Date: 2019
- Type: Text , Conference proceedings
- Relation: 2019 IEEE International Conference on Data Mining (ICDM) , Beijing, China, 8-11 Nov. 2019 p. 970-975
- Full Text: false
- Reviewed:
- Description: This paper re-examines existing systems for streaming classification with emerging new classes (SENC), where new classes that have not been used to train a classifier may emerge in a data stream. We identify that existing systems make an unspecified assumption that emerging new classes are geometrically far from known classes, or that instances of known classes are densely distributed, in the feature space. Using a class separation indicator alpha, we refine the SENC problem into an alpha-SENC problem, where alpha indicates the geometric distance between two classes in the feature space. We show that while most existing systems work well on high-alpha SENC problems (i.e., a new class is geometrically far from a known class, or instances of known classes are densely distributed), they perform poorly on low-alpha SENC problems. To solve low-alpha SENC problems effectively, we propose an approach using nearest neighbor ensembles, called SENNE. We demonstrate that SENNE is able to handle both the low-alpha and high-alpha SENC problems, which can appear at different times in a single data stream.
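The basic mechanism behind nearest-neighbour-based new-class detection can be sketched as follows (a toy stand-in, not SENNE itself, which builds its ensemble from random subsamples; names and thresholds here are hypothetical):

```python
import math

def novelty_vote(known, x, threshold):
    """Vote 'emerging new class' if x is farther than `threshold`
    from every instance of the known classes."""
    return min(math.dist(k, x) for k in known) > threshold

def ensemble_detect(known, x, thresholds):
    """Toy nearest-neighbour ensemble: majority vote across several
    distance thresholds. Low-alpha problems are hard precisely because
    no single threshold separates new from known instances cleanly."""
    votes = sum(novelty_vote(known, x, t) for t in thresholds)
    return votes * 2 > len(thresholds)
```

An instance far from all known-class instances is flagged as a new class; an instance inside a dense known region is not.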
Analysis of Classifiers for Prediction of Type II Diabetes Mellitus
- Authors: Barhate, Rahul , Kulkarni, Pradnya
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 4th International Conference on Computing, Communication Control and Automation, ICCUBEA 2018
- Full Text:
- Reviewed:
- Description: Diabetes mellitus is a chronic disease and a health challenge worldwide. According to the International Diabetes Federation, 451 million people across the globe have diabetes, with this number anticipated to rise to 693 million people by 2045. It has been shown that 80% of the complications arising from type II diabetes can be prevented or delayed by early identification of the people who are at risk. Diabetes is difficult to diagnose in the early stages as its symptoms grow subtly and gradually. In a majority of cases, patients remain undiagnosed until they are admitted for a heart attack or begin to lose their sight. This paper analyzes different classification algorithms based on a patient's health history to aid doctors in identifying the presence of diabetes as well as promoting early diagnosis and treatment. The experiments were conducted on the Pima Indian Diabetes data set. The classifiers used include K Nearest Neighbors, Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, Support Vector Machine and Neural Network. Results demonstrate that Random Forest performed well on the data set, giving an accuracy of 79.7%. © 2018 IEEE.
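The evaluation loop the abstract describes (train several classifiers, compare held-out accuracy) can be sketched with a self-contained k-NN; this is an illustrative toy on synthetic points, not the paper's experiment or its Pima data:

```python
import math
from collections import Counter

def knn_predict(train, k, x):
    """k-nearest-neighbour majority vote; `train` is a list of
    (feature_tuple, label) pairs."""
    neighbours = sorted(train, key=lambda fx: math.dist(fx[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test instances classified correctly."""
    correct = sum(knn_predict(train, k, x) == y for x, y in test)
    return correct / len(test)
```

In the paper's setting, each candidate classifier would be scored this way on the same split, and the one with the best accuracy (Random Forest, at 79.7%) selected.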
Improving deep forest by confidence screening
- Authors: Pang, Ming , Ting, Kaiming , Zhao, Peng , Zhou, Zhi-Hua
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 2018 IEEE International Conference on Data Mining; Singapore, Singapore; 17th-20th November 2018 p. 1194-1199
- Full Text:
- Reviewed:
- Description: Most studies of deep learning are based on neural network models, in which many layers of parameterized nonlinear differentiable modules are trained by backpropagation. Recently, it has been shown that deep learning can also be realized by non-differentiable modules without backpropagation training, in a model called deep forest. Its representation learning process is based on a cascade of cascades of decision tree forests, where the high memory requirement and high time cost inhibit the training of large models. In this paper, we propose a simple yet effective approach to improve the efficiency of deep forest. The key idea is to pass instances with high confidence directly to the final stage rather than through all the levels. We also provide a theoretical analysis suggesting a means to vary the model complexity from low to high as the level increases in the cascade, which further reduces the memory requirement and time cost. Our experiments show that the proposed approach achieves highly competitive predictive performance with time cost and memory requirement reduced by up to one order of magnitude.
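The confidence-screening step itself is simple enough to sketch (a minimal illustration of the idea, not the authors' code; the probability format is an assumption):

```python
def confidence_screen(probas, threshold):
    """Split instances by prediction confidence at one cascade level.

    `probas` maps instance id -> list of class probabilities produced
    by the current level. Instances whose top probability reaches
    `threshold` are finalised immediately; the rest are passed on to
    the next (more expensive) level of the cascade.
    """
    final, passed_on = {}, []
    for idx, p in probas.items():
        if max(p) >= threshold:
            final[idx] = p.index(max(p))   # confident: predict now
        else:
            passed_on.append(idx)          # uncertain: go deeper
    return final, passed_on
```

Because most easy instances exit early, deeper levels only ever see the hard cases, which is where the time and memory savings come from.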
Large scale modeling of genetic networks using gene knockout data
- Authors: Youseph, Ahammed , Chetty, Madhu , Karmakar, Gour
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 2018 Australasian Computer Science Week Multiconference, ACSW 2018; Brisbane, Australia; 29th January-2nd February 2018; published in ACM International Conference Proceedings Series
- Full Text: false
- Reviewed:
- Description: A gene regulatory network (GRN) represents a set of genes and their regulatory interactions. The inference of regulatory interactions between genes is usually carried out as an optimization problem using an appropriate mathematical model and time-series gene expression data. Among the various models proposed for GRN inference, our recently proposed Michaelis-Menten kinetics-based ODE model provides a good trade-off between computational complexity and biological relevance. This model, like other known GRN models, uses an evolutionary algorithm for parameter estimation. Since the search space for large networks is huge, leading to low inference accuracy, it is important to reduce the search region to improve the performance of the optimization algorithm. In this paper, we propose a classification method using gene knockout data to eliminate a large infeasible region from the optimization search area. We also propose a method for partial inference of regulations when all the regulators of a given regulated gene are unregulated genes. The proposed method is evaluated by reconstructing large in silico networks. © 2018 ACM.
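A single Michaelis-Menten regulation term of the kind such ODE models use can be simulated with forward Euler steps (an illustrative sketch under simplifying assumptions, with one constant regulator; not the authors' model or parameters):

```python
def simulate_gene(v_max, k_m, decay, regulator, x0, dt, steps):
    """Euler-integrate dx/dt = v_max * s / (k_m + s) - decay * x,
    a single Michaelis-Menten activation term with regulator level s
    held constant, plus first-order decay of the product x."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * (v_max * regulator / (k_m + regulator) - decay * x)
        trajectory.append(x)
    return trajectory
```

Parameter estimation then amounts to searching over (v_max, k_m, decay, network structure) so that simulated trajectories match the time-series expression data, which is the large search space the knockout-based screening is meant to shrink.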
Pixel N-grams for mammographic lesion classification
- Authors: Kulkarni, Pradnya , Stranieri, Andrew , Ugon, Julien , Mittal, Manish , Kulkarni, Siddhivinayak
- Date: 2017
- Type: Text , Conference proceedings
- Relation: 2017 2nd International Conference on Communication Systems, Computing and IT Applications, CSCITA , Mumbai; 7th-8th April, 2017; published in CSCITA 2017 - Proceedings p. 107-111
- Full Text: false
- Reviewed:
- Description: Automated classification algorithms have been applied to breast cancer diagnosis in order to improve diagnostic accuracy and turnaround time. However, classification accuracy, sensitivity and specificity could still be improved further. Moreover, reducing computational cost is another challenge, as the number of images to be analyzed is typically large. In this paper, a novel Pixel N-gram approach, inspired by character N-grams in the text retrieval context, is applied to mammographic lesion classification. Experiments on a real-world database demonstrate that Pixel N-grams outperform existing histogram as well as Haralick features with respect to classification accuracy and sensitivity. The effect of varying N and of using various classifiers is also analyzed. Results show that the optimum value of N is 3, and that the MLP classifier performs better than the SVM and KNN classifiers using 3-gram features.
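The feature extraction can be sketched directly from the character N-gram analogy (a minimal sketch with assumed quantisation details, not the paper's exact feature pipeline):

```python
from collections import Counter

def pixel_ngrams(image, n, levels=4, max_intensity=255):
    """Count N-grams of quantised pixel intensities along each row.

    `image` is a list of rows of grey values. Intensities are first
    quantised into `levels` bins (raw 0-255 values would make the
    N-gram vocabulary explode), then consecutive runs of n quantised
    pixels are counted, mirroring character N-grams in text retrieval.
    """
    counts = Counter()
    for row in image:
        q = [min(p * levels // (max_intensity + 1), levels - 1) for p in row]
        for i in range(len(q) - n + 1):
            counts[tuple(q[i:i + n])] += 1
    return counts
```

The resulting count vector is the feature representation fed to a classifier such as the MLP the abstract reports as best at N=3.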
Progressive data stream mining and transaction classification for workload-aware incremental database repartitioning
- Authors: Kamal, Joarder , Murshed, Manzur , Gaber, Mohamed
- Date: 2014
- Type: Text , Conference proceedings
- Relation: IEEE/ACM International Symposium on Big Data Computing, BDC 2014; London, United Kingdom; 8th-11th December 2014; p. 8-15
- Full Text:
- Reviewed:
- Description: Minimising the impact of distributed transactions (DTs) in a shared-nothing distributed database is extremely challenging for transactional workloads. Given the dynamic nature of workloads and rapid growth in data volume, the underlying database requires incremental repartitioning to maintain an acceptable level of DTs and data load balance with minimum physical data migrations. In a workload-aware repartitioning scheme, the transactional workload is modelled as a graph or hypergraph; performing k-way min-cut clustering that guarantees minimum edge cuts can then significantly reduce the impact of DTs by mapping workload clusters into logical database partitions. However, without exploiting the inherent workload characteristics, the overall processing and computing times for large-scale workload networks increase in polynomial order. In this paper, a workload-aware incremental database repartitioning technique is proposed, which effectively exploits proactive transaction classification and workload stream mining techniques. Workload batches are modelled as graphs, hypergraphs, and compressed hypergraphs, then repartitioned to produce a fresh tuple-to-partition data migration plan for every incremental cycle. Experimental studies in a simulated TPC-C environment demonstrate that the proposed model can be effectively adopted to manage rapid data growth and dynamic workloads, progressively reducing the overall processing time required to operate over the workload networks.
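The two building blocks of such a scheme, modelling the workload as a co-access graph and measuring the DT impact of a candidate partitioning, can be sketched as follows (an illustrative simplification; the paper's actual pipeline uses hypergraphs and k-way min-cut clustering):

```python
from itertools import combinations
from collections import Counter

def coaccess_graph(transactions):
    """Build a workload graph: edge weight = number of transactions in
    which a pair of tuples co-occurs. A workload-aware repartitioner
    would min-cut cluster this graph so heavily co-accessed tuples
    land in the same partition."""
    edges = Counter()
    for txn in transactions:
        for a, b in combinations(sorted(set(txn)), 2):
            edges[(a, b)] += 1
    return edges

def distributed_txn_ratio(transactions, partition_of):
    """Fraction of transactions that span more than one partition,
    i.e. the distributed transactions a repartitioning tries to minimise."""
    dt = sum(len({partition_of[t] for t in txn}) > 1 for txn in transactions)
    return dt / len(transactions)
```

Each incremental cycle rebuilds the graph from the latest workload batch, reclusters it, and emits a tuple-to-partition migration plan that lowers this ratio.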
Hybrid technique for colour image classification and efficient retrieval based on fuzzy logic and neural networks
- Authors: Fernando, Ranisha , Kulkarni, Siddhivinayak
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Developments in technology and the Internet have led to an increase in the number of digital images and videos. Thousands of images are added to the WWW every day. Retrieving specific images efficiently from a database or from the Internet is now becoming a challenge. As a result, the ability to retrieve images has become important to various professional areas. This paper proposes a novel fuzzy approach to classify colour images based on their content, to pose a query in terms of natural language, and to fuse the queries based on neural networks for fast and efficient retrieval. A number of experiments were conducted on sets of images for classification and retrieval, and promising results were obtained. The results were analysed and compared with other similar image retrieval systems. © 2012 IEEE.
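The fuzzy classification step typically maps a numeric colour value to linguistic terms via membership functions; a minimal sketch (the term names and hue ranges below are hypothetical, not the paper's):

```python
def triangular_membership(x, left, peak, right):
    """Triangular fuzzy membership: 0 outside (left, right), 1 at peak,
    linear in between."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def classify_hue(hue, terms):
    """Assign a hue (in degrees) to the natural-language colour term
    with the highest fuzzy membership; `terms` maps a term to its
    (left, peak, right) triangle."""
    return max(terms, key=lambda t: triangular_membership(hue, *terms[t]))
```

Natural-language queries ("mostly green images") can then be matched against these memberships rather than raw pixel values.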
A novel hybrid neural learning algorithm using simulated annealing and quasisecant method
- Authors: Yearwood, John , Bagirov, Adil , Seifollahi, Sattar
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: In this paper, we propose a hybrid learning algorithm for single hidden layer feedforward neural networks (SLFNs) for data classification. The proposed hybrid algorithm is a two-phase learning algorithm based on the quasisecant and simulated annealing methods. First, the weights between the hidden layer and the output layer nodes (output layer weights) are adjusted by the quasisecant algorithm. Then simulated annealing is applied for global attribute weighting. The weights between the input layer and the hidden layer nodes are fixed in advance and are not included in the learning process. This two-phase learning of the network is a novel idea and differs from existing approaches. Numerical results on some benchmark data sets are reported, and these results are promising. © 2011, Australian Computer Society, Inc.
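The simulated annealing phase can be sketched generically (a stand-in for the paper's attribute-weighting step, minimising an arbitrary loss over a weight vector; all parameters are illustrative assumptions):

```python
import math
import random

def simulated_annealing(loss, w0, step=0.5, t0=1.0, cooling=0.95,
                        iters=500, seed=0):
    """Generic simulated-annealing minimiser: perturb the weights,
    always accept improvements, accept worse moves with probability
    exp(-delta / T), and cool the temperature T geometrically."""
    rng = random.Random(seed)
    w, cur_loss = list(w0), loss(w0)
    best, best_loss, t = list(w), cur_loss, t0
    for _ in range(iters):
        cand = [wi + rng.uniform(-step, step) for wi in w]
        delta = loss(cand) - cur_loss
        if delta < 0 or rng.random() < math.exp(-delta / t):
            w, cur_loss = cand, cur_loss + delta
            if cur_loss < best_loss:
                best, best_loss = list(w), cur_loss
        t *= cooling
    return best, best_loss
```

The random restarts that high temperature provides give the global search behaviour the paper relies on, while the quasisecant phase handles the smooth local optimisation of the output layer weights.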
Application of SVM in citation information extraction
- Authors: Liang, Jiguang , Layton, Robert , Wang, Wei
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: Support Vector Machines (SVMs) are an effective binary classification algorithm. To enhance the use of text structural features for information extraction, which is greatly restricted by the Hidden Markov Model (HMM), this paper proposes an SVM-based multi-class classification method built on Markov properties to extract information from a citation database. The proposed model extracts symbol characteristics as features and composes a binary tree from the transition probabilities. Experiments show that the proposed method outperforms HMM and basic SVM methods. © 2011 IEEE.
Feature selection using misclassification counts
- Authors: Bagirov, Adil , Yatsko, Andrew , Stranieri, Andrew
- Date: 2011
- Type: Conference proceedings , Unpublished work
- Relation: Proceedings of the 9th Australasian Data Mining Conference (AusDM 2011), 51-62. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 121.
- Full Text:
- Description: Dimensionality reduction of the problem space, through the detection and removal of variables that contribute little or not at all to classification, can relieve the computational load and the instance acquisition effort, considering that all the data attributes are accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to the coordinates of informative features. Ranking is done on the degree to which different variables exhibit random characteristics. The results are verified using the Nearest Neighbor classifier. This also helps to address feature irrelevance and redundancy, which ranking alone does not immediately decide. Additionally, feature ranking methods from different independent sources are called in for direct comparison.
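A simplified stand-in for this kind of ranking counts, per feature, how many instances are misclassified when the feature is used on its own with nearest class means (an illustrative sketch, not the paper's exact criterion):

```python
def rank_features(data):
    """Rank features by single-feature misclassification counts.

    `data` is a list of (feature_tuple, label) pairs. For each feature
    in isolation, classify every instance by the nearest per-class
    mean on that coordinate, count the mistakes, and rank features
    from fewest mistakes (most informative) to most (most random).
    """
    labels = sorted({y for _, y in data})
    n_feat = len(data[0][0])
    errors = []
    for j in range(n_feat):
        means = {y: sum(x[j] for x, yy in data if yy == y) /
                    sum(1 for _, yy in data if yy == y) for y in labels}
        miss = sum(min(labels, key=lambda y: abs(x[j] - means[y])) != yy
                   for x, yy in data)
        errors.append((miss, j))
    return [j for _, j in sorted(errors)]
```

Features whose misclassification counts are near chance level behave like noise and are the candidates for removal; the surviving subset is then verified with the Nearest Neighbor classifier, as the abstract describes.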
The seven scam types: Mapping the terrain of cybercrime
- Authors: Stabek, Amber , Watters, Paul , Layton, Robert
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: The threat of cybercrime is a growing danger to the economy. Industries and businesses are targeted by cyber-criminals, along with members of the general public. Since cybercrime is often a symptom of more complex criminological regimes such as laundering, trafficking and terrorism, the true damage caused to society is unknown. Dissimilarities in reporting procedures and non-uniform cybercrime classifications lead international reporting bodies to produce incompatible results, which cause difficulties in making valid comparisons. A cybercrime classification framework has been identified as necessary for the development of an inter-jurisdictional, transnational, and global approach to identify, intercept, and prosecute cyber-criminals. This paper outlines a cybercrime classification framework which has been applied to the incidence of scams. Content analysis was performed on over 250 scam descriptions stemming from more than 35 scamming categories, and over 80 static features were derived. Using hierarchical cluster and discriminant function analysis, the sample was reduced from over 35 ambiguous categories into 7 scam types, and the top four scamming functions, identified as scamming business processes, were revealed. The results of this research have significant ramifications for the current state of scam and cybercrime classification, research and analysis, and offer significant insight into the business processes and applications adopted by scammers and cyber-criminals. © 2010 IEEE.