- Title
- Application of rank correlation, clustering and classification in information security
- Creator
- Beliakov, Gleb; Yearwood, John; Kelarev, Andrei
- Date
- 2012
- Type
- Text; Journal article
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/68711
- Identifier
- vital:4708
- Identifier
- https://doi.org/10.4304/jnw.7.6.935-945
- Identifier
- ISSN:1796-2056
- Abstract
- This article presents an experimental investigation of a novel application of a clustering technique, recently introduced by the authors, that uses robust and stable consensus functions in information security. In this area it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, in intrusion detection. Here we concentrate on one particular application: profiling phishing websites. First, we apply several independent clustering algorithms to a randomized sample of the data to obtain independent initial clusterings; the silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson Linear Correlation Coefficient, the Spearman Rank Correlation Coefficient and the Goodman-Kruskal Correlation Coefficient in this application. Third, we use a consensus function to combine the independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering so that they can process the whole large data set as well as new data. The precision and recall of the classifiers at this final stage are critical to the effectiveness of the whole procedure. We investigate various combinations of correlation coefficients, consensus functions and supervised classification algorithms. © 2012 Academy Publisher.
- Relation
- Journal of Networks Vol. 7, no. 6 (2012), p. 935-945
- Rights
- Copyright Academy Publisher.
- Rights
- Open Access
- Rights
- This metadata is freely available under a CC0 license
- Subject
- 0805 Distributed Computing; Classification; Clustering; Consensus functions; Phishing websites; Clustering techniques; Clusterings; Consensus clustering; Correlation coefficient; Dimensionality reduction; Experimental investigations; Large data; Large datasets; Linear correlation coefficient; Novel applications; Number of clusters; Precision and recall; Rank correlation; Real time; Spearman rank correlation; Supervised classification; Clustering algorithms; Computer crime; Information dissemination; Intrusion detection; Websites
- Full Text
- Reviewed
File | Description | Size | Format
---|---|---|---
SOURCE1 | Submitted Version | 226 KB | Adobe Acrobat PDF
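The abstract names three correlation coefficients compared in the feature-selection step. The following is a minimal sketch, assuming the standard textbook definitions: Pearson's r, Spearman's rho as Pearson's r computed on average ranks, and Goodman-Kruskal gamma as (C - D) / (C + D) over concordant and discordant pairs, with tied pairs ignored. The abstract does not state how the coefficients are turned into a feature subset, so `select_features` and its `target` and `k` parameters are hypothetical illustrations only, not the authors' method.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation; raises ZeroDivisionError for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values 1..n, giving each run of tied values the mean of its ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D); pairs tied in x or y count as neither C nor D."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def select_features(columns, target, coef=spearman, k=2):
    """HYPOTHETICAL rule: keep the k features with the largest |coef(feature, target)|."""
    scores = {name: abs(coef(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [1, 4, 9, 16, 25]  # increases monotonically with x, but not linearly
    print(pearson(x, y), spearman(x, y), goodman_kruskal_gamma(x, y))
```

For the monotone but nonlinear pair in the usage example, Pearson's r is about 0.981, while both rank-based measures equal 1, which is why rank correlation can be preferable when only the ordering of feature values matters.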