Application of rank correlation, clustering and classification in information security
- Beliakov, Gleb, Yearwood, John, Kelarev, Andrei
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2012
- Type: Text , Journal article
- Relation: Journal of Networks Vol. 7, no. 6 (2012), p. 935-945
- Description: This article is an experimental investigation of a novel application of a clustering technique recently introduced by the authors to make use of robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, for intrusion detection. Here we concentrate on one particular application, the profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The Silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson Linear Correlation Coefficient, the Spearman Rank Correlation Coefficient and the Goodman-Kruskal Correlation Coefficient in this application. Third, we use a consensus function to combine the independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering to enable them to process the whole large data set as well as new data. The precision and recall of the classifiers at the final stage of this scheme are critical for the effectiveness of the whole procedure. We investigated various combinations of several correlation coefficients, consensus functions, and a variety of supervised classification algorithms. © 2012 Academy Publisher.
- Description: 2003010277
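The second step in the description above, using rank correlation to select a subset of features, can be illustrated with a minimal sketch. The abstract does not say how the correlation scores are turned into a selection, so the rule used here (keep the `k` features whose absolute Spearman correlation with a target labelling is largest) is an assumption, and the names `average_ranks`, `spearman` and `select_features` are hypothetical rather than taken from the article.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def select_features(columns, target, k):
    """Assumed selection rule: keep the k features with the largest |Spearman|."""
    scores = {name: abs(spearman(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (invented for illustration): three features, one target labelling.
columns = {"a": [1, 2, 3, 4], "b": [2, 1, 2, 1], "c": [4, 3, 2, 1]}
target = [0, 0, 1, 1]
selected = select_features(columns, target, 2)
```

On this toy data, features `a` and `c` are monotonically related to the target and are kept, while `b` is uncorrelated with it and is dropped.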
New traceability codes and identification algorithm for tracing pirates
- Wu, Xinwen, Watters, Paul, Yearwood, John
- Authors: Wu, Xinwen , Watters, Paul , Yearwood, John
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at 2008 International Symposium on Parallel and Distributed Processing with Applications, ISPA 2008, Sydney, New South Wales : 10th-12th December 2008 p. 719-724
- Description: With the increasing popularity of digital products, there is a strong desire to protect the rights of owners against illegal redistribution. Traditional encryption schemes alone do not provide a comprehensive solution to digital rights management, since they do not prevent users who are authorized to use a digital product for their own use from transferring the cleartext content to unauthorized users. However, traceability schemes can be used to trace the illegitimate redistributors effectively. Two types of traceability schemes have been proposed in the literature - traceability codes (TA codes), and codes with the identifiable parent properties (IPP codes). TA codes are special IPP codes, and many TA codes implement an efficient identification algorithm which can determine at least one redistributor. However, many IPP codes are not TA codes, in which case, no efficient identification algorithms are available. In this paper, we generalize the definition of TA codes to derive a new family of traceability codes that is much larger than the family of traditional TA codes. By using existing decoding algorithms with respect to the Lee distance, an efficient identification algorithm is proposed for generalized TA codes. Furthermore, we show that the identification algorithm of generalized TA codes can find more redistributors than those of traditional TA codes.
- Description: 2003006288
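The description above refers to decoding with respect to the Lee distance. As a rough illustration only, the sketch below computes the Lee distance between words over the integers modulo `q` and finds the codewords closest to a given word by exhaustive search. The exhaustive search is a stand-in and is not the efficient identification algorithm proposed in the paper; the function names and the toy code are hypothetical.

```python
def lee_distance(u, v, q):
    """Lee distance between two equal-length words over the integers mod q."""
    if len(u) != len(v):
        raise ValueError("words must have the same length")
    total = 0
    for a, b in zip(u, v):
        d = abs(a - b) % q
        # Each coordinate contributes the shorter way around the cycle Z_q.
        total += min(d, q - d)
    return total


def nearest_codewords(word, code, q):
    """Brute-force search for the codewords at minimum Lee distance from word."""
    best = min(lee_distance(word, c, q) for c in code)
    return [c for c in code if lee_distance(word, c, q) == best]


# Toy example (invented for illustration): three codewords of length 3 over Z_5.
code = [(0, 0, 0), (1, 2, 3), (4, 4, 4)]
suspects = nearest_codewords((0, 4, 4), code, 5)
```

Here the word `(0, 4, 4)` is at Lee distance 2, 4 and 1 from the three codewords, so the search returns `(4, 4, 4)`. In a tracing setting the word would be the one recovered from a pirated copy and the codewords would be those assigned to users; that mapping is stated here as an assumption about how the distance is applied.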