- Title
- Optimization based clustering algorithms for authorship analysis of phishing emails
- Creator
- Seifollahi, Sattar; Bagirov, Adil; Layton, Robert; Gondal, Iqbal
- Date
- 2017
- Type
- Text; Journal article
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/160491
- Identifier
- vital:12192
- Identifier
- https://doi.org/10.1007/s11063-017-9593-7
- Identifier
- ISSN:1370-4621
- Abstract
- Phishing has given attackers the power to masquerade as legitimate users of organizations, such as banks, to scam money and private information from victims. Phishing is so widespread that combating the attacks could overwhelm the victim organization. It is important to group phishing attacks in order to formulate an effective defence mechanism. In this paper, we use clustering methods to analyze and characterize phishing emails and perform their relative attribution. Emails are first tokenized into a bag-of-words space and then transformed into a numeric vector space using the frequencies of words in documents. The WordNet vocabulary is used to take the effects of similar words into account and to reduce sparsity. The word similarity measure is combined with the term frequencies to introduce a novel transformation of text into numeric features. To improve accuracy, we apply inverse document frequency weighting, which gives higher weights to features used by fewer authors. The k-means algorithm and three recently introduced optimization-based algorithms, MS-MGKM, INCA and DCClust, are applied for clustering. The optimization-based algorithms indicate the existence of well-separated clusters in the phishing emails dataset. © 2017, Springer Science+Business Media New York.
- Publisher
- Springer New York LLC
- Relation
- Neural Processing Letters Vol. 46, no. 2 (2017), p. 411-425; http://purl.org/au-research/grants/arc/DP140103213
- Rights
- Copyright © 2017, Springer Science+Business Media New York.
- Rights
- This metadata is freely available under a CC0 license
- Subject
- 0801 Artificial Intelligence and Image Processing; 1702 Cognitive Science; Authorship analysis; Clustering technique; Global optimization
- Reviewed
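
The pipeline outlined in the abstract (tokenize emails into a bag of words, weight term frequencies by inverse document frequency, then cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy emails, the function names, IDF computed per document rather than per author, the farthest-first seeding, and plain k-means standing in for the clustering step. The WordNet similarity transformation and the MS-MGKM, INCA and DCClust algorithms are not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with raw term frequency times log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(len(docs) / df[w]) for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(vectors, k):
    """Deterministic seeding (an assumption): start at vector 0, then
    repeatedly add the vector farthest from the seeds chosen so far."""
    chosen = [0]
    while len(chosen) < k:
        nxt = max(
            range(len(vectors)),
            key=lambda i: min(sq_dist(vectors[i], vectors[c]) for c in chosen),
        )
        chosen.append(nxt)
    return [list(vectors[i]) for i in chosen]


def kmeans(vectors, k, max_iters=50):
    """Plain k-means: assign to nearest centroid, recompute means, repeat."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy emails, invented for this sketch.
docs = [
    "verify your bank account",
    "bank account verify now",
    "claim your lottery prize",
    "lottery prize claim now",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

On this toy input the two bank-themed emails fall into one cluster and the two lottery-themed emails into the other. Real phishing corpora are far sparser, which is the problem the abstract's WordNet-based similarity step is meant to address.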