Zero-day malware detection based on supervised learning algorithms of API call signatures
- Authors: Alazab, Mamoun , Venkatraman, Sitalakshmi , Watters, Paul , Alazab, Moutaz
- Date: 2011
- Type: Text , Conference proceedings
- Full Text:
- Description: Zero-day or unknown malware are created using code obfuscation techniques that can modify the parent code to produce offspring copies which have the same functionality but with different signatures. Current techniques reported in literature lack the capability of detecting zero-day malware with the required accuracy and efficiency. In this paper, we have proposed and evaluated a novel method of employing several data mining techniques to detect and classify zero-day malware with high levels of accuracy and efficiency based on the frequency of Windows API calls. This paper describes the methodology employed for the collection of large data sets to train the classifiers, and analyses the performance results of the various data mining algorithms adopted for the study using a fully automated tool developed in this research to conduct the various experimental investigations and evaluation. Through the performance results of these algorithms from our experimental analysis, we are able to evaluate and discuss the advantages of one data mining algorithm over the other for accurately detecting zero-day malware successfully. The data mining framework employed in this research learns through analysing the behavior of existing malicious and benign codes in large datasets. We have employed robust classifiers, namely NaÃ¯ve Bayes (NB) Algorithm, k-Nearest Neighbor (kNN) Algorithm, Sequential Minimal Optimization (SMO) Algorithm with 4 differents kernels (SMO - Normalized PolyKernel, SMO - PolyKernel, SMO - Puk, and SMO- Radial Basis Function (RBF)), Backpropagation Neural Networks Algorithm, and J48 decision tree and have evaluated their performance. Overall, the automated data mining system implemented for this study has achieved high true positive (TP) rate of more than 98.5%, and low false positive (FP) rate of less than 0.025, which has not been achieved in literature so far. This is much higher than the required commercial acceptance level indicating that our novel technique is a major leap forward in detecting zero-day malware. This paper also offers future directions for researchers in exploring different aspects of obfuscations that are affecting the IT world today. © 2011, Australian Computer Society, Inc.
- Description: 2003009506
Application of rank correlation, clustering and classification in information security
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2012
- Type: Text , Journal article
- Relation: Journal of Networks Vol. 7, no. 6 (2012), p. 935-945
- Full Text:
- Description: This article is devoted to experimental investigation of a novel application of a clustering technique introduced by the authors recently in order to use robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as it is required, for example, for intrusion detection. Here we concentrate on a particular case of application to profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. Silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson Linear Correlation Coefficient, the Spearman Rank Correlation Coefficient and the Goodman-Kruskal Correlation Coefficient in this application. Third, we use a consensus function to combine independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering in order to enable them to process the whole large data set as well as new data. The precision and recall of classifiers at the final stage of this scheme are critical for effectiveness of the whole procedure. We investigated various combinations of several correlation coefficients, consensus functions, and a variety of supervised classification algorithms. © 2012 Academy Publisher.
- Description: 2003010277