An application of novel clustering technique for information security
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2011
- Type: Text , Conference paper
- Relation: Applications and Techniques in Information Security Workshop p. 5-11
- Full Text: false
- Description: This article presents experimental results on a new application of a clustering technique recently introduced by the authors. Our aim is to facilitate the application of robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, for intrusion detection. Here we concentrate on the particular case of profiling phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The silhouette index is used to determine the number of clusters. Second, we use a consensus function to combine these independent clusterings into one consensus clustering. Feature ranking is used to select a subset of features for the consensus function. Third, we train fast supervised classification algorithms on the resulting consensus clustering so that they can process the whole large data set as well as new data. The precision and recall of the classifiers at the final stage of this scheme are critical for the effectiveness of the whole procedure. We investigated various combinations of three consensus functions, Cluster-Based Graph Formulation (CBGF), Hybrid Bipartite Graph Formulation (HBGF), and Instance-Based Graph Formulation (IBGF), and a variety of supervised classification algorithms. The best precision and recall were obtained by the combination of the HBGF consensus function and the SMO classifier with the polynomial kernel.
- Description: 2003009195
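The second step of the scheme described in this record, combining several independent clusterings into one consensus clustering, can be illustrated with a minimal Python sketch. It does not implement CBGF, HBGF, or IBGF: as a simpler stand-in it builds a co-association matrix (the fraction of base clusterings that place two items together) and labels the connected components of the thresholded matrix. The function names, the 0.5 threshold, and the toy label vectors are assumptions made for illustration only.

```python
from itertools import combinations


def co_association(clusterings):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    base clusterings that put items i and j in the same cluster."""
    n = len(clusterings[0])
    m = len(clusterings)
    co = [[0.0] * n for _ in range(n)]
    for i in range(n):
        co[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in clusterings if labels[i] == labels[j])
        co[i][j] = co[j][i] = agree / m
    return co


def consensus_labels(co, threshold=0.5):
    """Link items whose co-association exceeds the threshold (an assumed
    value) and give each connected component its own consensus label."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base clusterings of six items. The second uses
# different label names for the same partition; the third misplaces item 2.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
print(consensus_labels(co_association(base)))  # [0, 0, 0, 1, 1, 1]
```

The consensus labels could then serve as training targets for a supervised classifier, which is the third step the abstract describes; that step is not shown here.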
Establishing phishing provenance using orthographic features
- Authors: Liping, Ma , Yearwood, John , Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 2009 eCrime Researchers Summit, eCRIME '09, Tacoma, Washington : 20th-21st October 2009
- Description: After phishing message detection, determining the provenance of phishing messages and websites is the second step in tracing cybercriminals. In this paper, we present a novel method to cluster phishing emails automatically using orthographic features. In particular, we develop an algorithm that clusters documents and removes redundant features at the same time. After collecting all the possible features based on observation, we apply the modified global k-means method repeatedly and generate the objective function values over a range of tolerance values across different subsets of features. Finally, we identify the appropriate clusters by studying the distribution of the objective function values. Experimental evaluation over a large number of computations demonstrates that our clustering and feature selection techniques are highly effective and achieve reliable results.
- Description: 2003007842
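The idea of evaluating a clustering objective across different subsets of features, as described in this record, can be sketched in Python as follows. This is not the modified global k-means method used in the paper: it substitutes plain Lloyd-style k-means with fixed starting centers, and the function names, toy points, and feature subsets are all assumptions for illustration.

```python
def sq_dist(p, c, features):
    """Squared Euclidean distance restricted to the given feature indices."""
    return sum((p[f] - c[f]) ** 2 for f in features)


def kmeans_objective(points, centers, features):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c, features) for c in centers) for p in points)


def lloyd(points, centers, features, iterations=20):
    """Plain Lloyd iterations; only the chosen features drive assignment."""
    dims = range(len(points[0]))
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda k: sq_dist(p, centers[k], features))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if the group is empty.
        centers = [
            [sum(p[d] for p in g) / len(g) for d in dims] if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def objective_for_subset(points, initial_centers, features):
    """Cluster on one feature subset and report the resulting objective."""
    centers = lloyd(points, initial_centers, features)
    return kmeans_objective(points, centers, features)


# Hypothetical data: features 0 and 1 separate two groups, feature 2 is noise.
points = [(0, 0, 5), (0, 1, -5), (1, 0, 4),
          (10, 10, -4), (10, 11, 5), (11, 10, -5)]
start = [list(points[0]), list(points[3])]

for subset in ([0, 1], [0, 1, 2]):
    print(subset, objective_for_subset(points, start, subset))
```

Raw objective values grow as features are added, so values from subsets of different sizes are not directly comparable; the abstract describes studying the distribution of these values rather than comparing them naively.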
A CAD system using clustering and novel feature extraction technique
- Authors: Ghosh, Ranadhir , Ghosh, Moumita , Yearwood, John
- Date: 2005
- Type: Text , Conference paper
- Relation: Paper presented at CISTM 2005, Gurgaon, India : 24th - 26th July, 2005
- Full Text: false
- Description: Many previous efforts have applied various ANN classifier-modelling techniques to recognition in breast cancer detection. Most of this work concentrated on classifying damaged areas with the help of a doctor’s suggestion: doctors mark the suspicious areas in the mammogram, and the classifier extracts only those marked areas and tries to classify them. An intelligent automatic diagnosis system can be very helpful to radiologists in diagnosing breast cancer. In this research we apply a local search, gradient-free clustering algorithm to find the suspicious or damaged area, and we compare our results with the doctor’s marking. It has also been observed that, beyond a certain point, the inclusion of additional features leads to worse rather than better performance. Moreover, the choice of features to represent the patterns affects several aspects of pattern recognition problems, such as accuracy, required learning time and the necessary number of samples. A common problem with multi-category feature classification is the conflict between the categories: no feasible solution is simultaneously optimal for all categories. To find an optimal solution, the search space can be divided into sub-regions, each based on an individual category, and the results finally merged through a decision support system. Combining feature selection with the classifier has been a major challenge for researchers. Employing a similar technique at both levels often worsens their performance. Some preliminary studies have revealed that the traditional canonical GA, while a good choice for the feature selection module, underperforms at the classifier level. An evolutionary algorithm at the classifier level provides a much better solution for this purpose.
In this paper we propose a hybrid technique that combines canonical GA based feature extraction with an evolutionary algorithm based classifier using a feed-forward MLP model.
- Description: E1
- Description: 2003001369
A hybrid clustering algorithm using two level of abstraction
- Authors: Ghosh, Ranadhir , Mammadov, Musa , Ghosh, Moumita , Yearwood, John
- Date: 2005
- Type: Text , Conference paper
- Relation: Paper presented at Fuzzy Logic, Soft Computing, and Computational Intelligence, 11th International Fuzzy Systems Association World Congress, Beijing, China : 28th - 31st July, 2005
- Full Text: false
- Description: E1
- Description: 2003001360
An experiment in task decomposition and ensembling for a modular artificial neural network
- Authors: Ferguson, Brent , Ghosh, Ranadhir , Yearwood, John
- Date: 2004
- Type: Text , Conference paper
- Relation: Paper presented at Innovations in Applied Artificial Intelligence: 17th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, Ottawa, Canada : 17th May, 2004
- Description: Modular neural networks can potentially overcome the common scalability and interference problems experienced by fully connected neural networks when applied to large databases. In this paper we trial an approach to constructing modular ANNs for a very large problem from CEDAR, the classification of handwritten characters. In our approach, we apply progressive task decomposition methods based on clustering and regression techniques to find modules. We then test methods for combining the modules into ensembles and compare their structural characteristics and classification performance with those of an ANN having a fully connected topology. The results reveal improvements to classification rates as well as network topologies for this problem.
- Description: E1
- Description: 2003000852
Two level clustering using SOM and dynamical systems
- Authors: Ghosh, Ranadhir , Mammadov, Musa , Ghosh, Moumita , Yearwood, John
- Date: 2004
- Type: Text , Conference paper
- Relation: Paper presented at ICOTA6: 6th International Conference on Optimization - Techniques and Applications, Ballarat, Victoria : 9th December, 2004
- Full Text: false
- Description: E1
- Description: 2003000871