Unsupervised authorship analysis of phishing webpages
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2012
- Type: Text , Conference proceedings
- Description: Authorship analysis on phishing websites enables the investigation of phishing attacks beyond basic analysis. In authorship analysis, salient features from documents are used to determine properties about the author, such as which of a set of candidate authors wrote a given document. In unsupervised authorship analysis, the aim is to group documents such that all documents by one author are grouped together. Applying this to cyber-attacks shows the size and scope of attacks from specific groups. This in turn allows investigators to focus their attention on specific attacking groups rather than trying to profile multiple independent attackers. In this paper, we analyse phishing websites using the current state-of-the-art unsupervised authorship analysis method, called NUANCE. The results indicate that the application produces clusters which correlate strongly to authorship, as evaluated using expert knowledge and external information, and show an improvement over a previous approach with known flaws. © 2012 IEEE.
- Description: 2003010678
Authorship attribution for Twitter in 140 characters or less
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at - 2nd Cybercrime and Trustworthy Computing Workshop, CTC 2010 p. 1-8
- Description: Authorship attribution is a growing field, moving from beginnings in linguistics to recent advances in text mining. Through this change came an increase in the capability of authorship attribution methods, both in their accuracy and in their ability to consider more difficult problems. Research into authorship attribution in the 19th century considered it difficult to determine the authorship of a document of fewer than 1000 words. By the 1990s this value had decreased to fewer than 500 words, and in the early 21st century it was considered possible to determine the authorship of a document in 250 words. The need for this ever-decreasing limit is exemplified by the trend towards many shorter communications rather than fewer longer communications, such as the move from traditional multi-page handwritten letters to shorter, more focused emails. This trend has also been shown in online crime, where many attacks such as phishing or bullying are performed using very concise language. Cybercrime messages have long been hosted on Internet Relay Chats (IRCs), which have allowed members to hide behind screen names and connect anonymously. More recently, Twitter and other short-message-based web services have been used as a hosting ground for online crimes. This paper presents some evaluations of current techniques and identifies some new preprocessing methods that can be used to enable authorship to be determined at rates significantly better than chance for documents of 140 characters or less, a format popularised by the micro-blogging website Twitter. We show that the SCAP methodology performs extremely well on Twitter messages and, even with restrictions on the types of information allowed, such as the recipient of directed messages, still performs significantly better than chance. Further to this, we show that 120 tweets per user is an important threshold, at which point adding more tweets per user gives a small but non-significant increase in accuracy. © 2010 IEEE.
Automatically generating classifier for phishing email prediction
- Authors: Ma, Liping , Torney, Rosemary , Watters, Paul , Brown, Simon
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at I-SPAN 2009 - The 10th International Symposium on Pervasive Systems, Algorithms, and Networks, Kaohsiung, Taiwan : 14th-16th December 2009 p. 779-783
- Description: Phishing is a form of online identity theft that employs both social engineering and technical subterfuge to steal consumers' personal identity data and financial account credentials. Phishing email prediction has drawn a lot of attention from many researchers. According to current anti-phishing research, a classifier generated by a decision tree produces the most accurate predictions. However, there appears not to be any open source tool available to transfer such a decision tree to an implementable classifier. The work presented in this paper builds a decision tree parser which automatically translates a decision tree into an implementable programming language so that the decision tree is useful in real-world applications. Experimental results show that the parser performs as well as the original decision tree. © 2009 IEEE.
- Description: 2003007989
Determining provenance in phishing websites using automated conceptual analysis
- Authors: Layton, Robert , Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 2009 eCrime Researchers Summit, eCRIME '09, Tacoma, Washington : 20th-21st October 2009 p. 1-7
- Description: Phishing is a form of online fraud with drastic consequences for the victims and institutions being defrauded. A phishing attack tries to create a believable environment for the intended victim to enter their confidential data such that the attacker can use or sell this information later. In order to apprehend phishers, law enforcement agencies need automated systems capable of tracking the size and scope of phishing attacks, so that they can use their resources more wisely by shutting down the major players rather than wasting resources stopping smaller operations. In order to develop these systems, phishing attacks need to be clustered by provenance in a way that adequately profiles these evolving attackers. The research presented in this paper looks at the viability of using automated conceptual analysis through cluster analysis techniques on phishing websites, with the aim of determining the provenance of these phishing attacks. Conceptual analysis is performed on the source code of the websites, rather than the final text that is displayed to the user, eliminating problems with rendering obfuscation and increasing the distinctiveness brought about by differences in the coding styles of the phishers. By using cluster analysis algorithms, distinguishing factors between groups of phishing websites can be obtained. The results indicate that it is difficult to separate websites by provenance without also separating by intent when looking at the phishing websites alone. Instead, the methods discussed in this paper should form part of a larger system that uses more information about the phishing attacks.
Why do users trust the wrong messages? A behavioural model of phishing
- Authors: Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 2009 eCrime Researchers Summit, eCRIME '09, Tacoma, Washington : 20th-21st October 2009 p. 1-7
- Description: Given the rise of phishing over the past 5 years, a recurring question is why users continue to fall for these scams. Various technical countermeasures have been proposed to try to counter phishing, and none have yet comprehensively succeeded in preventing users from becoming victims. This paper argues that an explicit model of user psychology is required to understand user behaviour in (a) processing phishing e-mails, (b) clicking on links to phishing websites, and (c) interacting with these websites. Many users engage in e-mail and web activity with an inappropriately high level of trust: users are constantly rewarded by their online interactions, even where there is a low level of formalised trust between the sending and receiving parties, e.g., if an e-mail claims to be sent from a bank, then it must be so, even if there has been no a priori exchange of credentials mediated by a trusted third party. Previously, mathematical models have been developed to predict trust establishment and maintenance based on reputation scores (e.g., Tran et al [1, 2]). This paper considers two inter-related questions: (a) can we model the behaviour of users learning to trust, based on non-associative models of learning (habituation and sensitisation), and (b) can we then locate this behavioural activity in a broader psychological model with a view to identifying potential countermeasures which might circumvent learned behaviour? © 2009 Crown.