Application of rank correlation, clustering and classification in information security
- Beliakov, Gleb, Yearwood, John, Kelarev, Andrei
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2012
- Type: Text , Journal article
- Relation: Journal of Networks Vol. 7, no. 6 (2012), p. 935-945
- Full Text:
- Reviewed:
- Description: This article is devoted to experimental investigation of a novel application of a clustering technique introduced by the authors recently in order to use robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as it is required, for example, for intrusion detection. Here we concentrate on a particular case of application to profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. Silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson Linear Correlation Coefficient, the Spearman Rank Correlation Coefficient and the Goodman-Kruskal Correlation Coefficient in this application. Third, we use a consensus function to combine independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering in order to enable them to process the whole large data set as well as new data. The precision and recall of classifiers at the final stage of this scheme are critical for effectiveness of the whole procedure. We investigated various combinations of several correlation coefficients, consensus functions, and a variety of supervised classification algorithms. © 2012 Academy Publisher.
- Description: 2003010277
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2012
- Type: Text , Journal article
- Relation: Journal of Networks Vol. 7, no. 6 (2012), p. 935-945
- Full Text:
- Reviewed:
- Description: This article is devoted to experimental investigation of a novel application of a clustering technique introduced by the authors recently in order to use robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as it is required, for example, for intrusion detection. Here we concentrate on a particular case of application to profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. Silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson Linear Correlation Coefficient, the Spearman Rank Correlation Coefficient and the Goodman-Kruskal Correlation Coefficient in this application. Third, we use a consensus function to combine independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering in order to enable them to process the whole large data set as well as new data. The precision and recall of classifiers at the final stage of this scheme are critical for effectiveness of the whole procedure. We investigated various combinations of several correlation coefficients, consensus functions, and a variety of supervised classification algorithms. © 2012 Academy Publisher.
- Description: 2003010277
Profiling phishing emails based on hyperlink information
- Yearwood, John, Mammadov, Musa, Banerjee, Arunava
- Authors: Yearwood, John , Mammadov, Musa , Banerjee, Arunava
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 2010 International Conference on Advances in Social Network Analysis and Mining, ASONAM 2010, Odense : 9th-11th August 2010 p. 120-127
- Full Text:
- Description: In this paper, a novel method for profiling phishing activity from an analysis of phishing emails is proposed. Profiling is useful in determining the activity of an individual or a particular group of phishers. Work in the area of phishing is usually aimed at detection of phishing emails. In this paper, we concentrate on profiling as distinct from detection of phishing emails. We formulate the profiling problem as a multi-label classification problem using the hyperlinks in the phishing emails as features and structural properties of emails along with whois (i.e.DNS) information on hyperlinks as profile classes. Further, we generate profiles based on classifier predictions. Thus, classes become elements of profiles. We employ a boosting algorithm (AdaBoost) as well as SVM to generate multi-label class predictions on three different datasets created from hyperlink information in phishing emails. These predictions are further utilized to generate complete profiles of these emails. Results show that profiling can be done with quite high accuracy using hyperlink information. © 2010 Crown Copyright.
- Authors: Yearwood, John , Mammadov, Musa , Banerjee, Arunava
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 2010 International Conference on Advances in Social Network Analysis and Mining, ASONAM 2010, Odense : 9th-11th August 2010 p. 120-127
- Full Text:
- Description: In this paper, a novel method for profiling phishing activity from an analysis of phishing emails is proposed. Profiling is useful in determining the activity of an individual or a particular group of phishers. Work in the area of phishing is usually aimed at detection of phishing emails. In this paper, we concentrate on profiling as distinct from detection of phishing emails. We formulate the profiling problem as a multi-label classification problem using the hyperlinks in the phishing emails as features and structural properties of emails along with whois (i.e.DNS) information on hyperlinks as profile classes. Further, we generate profiles based on classifier predictions. Thus, classes become elements of profiles. We employ a boosting algorithm (AdaBoost) as well as SVM to generate multi-label class predictions on three different datasets created from hyperlink information in phishing emails. These predictions are further utilized to generate complete profiles of these emails. Results show that profiling can be done with quite high accuracy using hyperlink information. © 2010 Crown Copyright.
Understanding victims of identity theft: Preliminary insights
- Turville, Kylie, Yearwood, John, Miller, Charlynn
- Authors: Turville, Kylie , Yearwood, John , Miller, Charlynn
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Identity theft is not a new crime, however changes in society and the way that business is conducted have made it an easier, attractive and more lucrative crime. When a victim discovers the misuse of their identity they must then begin the process of recovery, including fixing any issues that may have been created by the misuse. For some victims this may only take a small amount of time and effort, however for others they may continue to experience issues for many years after the initial moment of discovery. To date, little research has been conducted within Australia or internationally regarding what a victim experiences as they work through the recovery process. This paper presents a summary of the identity theft domain with an emphasis on research conducted within Australia, and identifies a number of issues regarding research in this area. The paper also provides an overview of the research project currently being undertaken by the authors in obtaining an understanding of what victims of identity theft experience during the recovery process; particularly their experiences when dealing with organizations. Finally, it reports on some of the preliminary work that has already been conducted for the research project. © 2010 IEEE.
- Authors: Turville, Kylie , Yearwood, John , Miller, Charlynn
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Identity theft is not a new crime, however changes in society and the way that business is conducted have made it an easier, attractive and more lucrative crime. When a victim discovers the misuse of their identity they must then begin the process of recovery, including fixing any issues that may have been created by the misuse. For some victims this may only take a small amount of time and effort, however for others they may continue to experience issues for many years after the initial moment of discovery. To date, little research has been conducted within Australia or internationally regarding what a victim experiences as they work through the recovery process. This paper presents a summary of the identity theft domain with an emphasis on research conducted within Australia, and identifies a number of issues regarding research in this area. The paper also provides an overview of the research project currently being undertaken by the authors in obtaining an understanding of what victims of identity theft experience during the recovery process; particularly their experiences when dealing with organizations. Finally, it reports on some of the preliminary work that has already been conducted for the research project. © 2010 IEEE.
- «
- ‹
- 1
- ›
- »