Authorship attribution for Twitter in 140 characters or less
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at - 2nd Cybercrime and Trustworthy Computing Workshop, CTC 2010 p. 1-8
- Full Text:
- Reviewed:
- Description: Authorship attribution is a growing field, moving from beginnings in linguistics to recent advances in text mining. Through this change came an increase in the capability of authorship attribution methods both in their accuracy and the ability to consider more difficult problems. Research into authorship attribution in the 19th century considered it difficult to determine the authorship of a document of fewer than 1000 words. By the 1990s this values had decreased to less than 500 words and in the early 21 st century it was considered possible to determine the authorship of a document in 250 words. The need for this ever decreasing limit is exemplified by the trend towards many shorter communications rather than fewer longer communications, such as the move from traditional multi-page handwritten letters to shorter, more focused emails. This trend has also been shown in online crime, where many attacks such as phishing or bullying are performed using very concise language. Cybercrime messages have long been hosted on Internet Relay Chats (IRCs) which have allowed members to hide behind screen names and connect anonymously. More recently, Twitter and other short message based web services have been used as a hosting ground for online crimes. This paper presents some evaluations of current techniques and identifies some new preprocessing methods that can be used to enable authorship to be determined at rates significantly better than chance for documents of 140 characters or less, a format popularised by the micro-blogging website Twitter1. We show that the SCAP methodology performs extremely well on twitter messages and even with restrictions on the types of information allowed, such as the recipient of directed messages, still perform significantly higher than chance. Further to this, we show that 120 tweets per user is an important threshold, at which point adding more tweets per user gives a small but non-significant increase in accuracy. © 2010 IEEE.
Malicious Spam Emails Developments and Authorship Attribution
- Authors: Alazab, Mamoun , Layton, Robert , Broadhurst, Roderic , Bouhours, Brigitte
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 58-68
- Full Text: false
- Reviewed:
- Description: The Internet is a decentralized structure that offers speedy communication, has a global reach and provides anonymity, a characteristic invaluable for committing illegal activities. In parallel with the spread of the Internet, cybercrime has rapidly evolved from a relatively low volume crime to a common high volume crime. A typical example of such a crime is the spreading of spam emails, where the content of the email tries to entice the recipient to click a URL linking to a malicious Web site or downloading a malicious attachment. Analysts attempting to provide intelligence on spam activities quickly find that the volume of spam circulating daily is overwhelming; therefore, any intelligence gathered is representative of only a small sample, not of the global picture. While past studies have looked at automating some of these analyses using topic-based models, i.e. separating email clusters into groups with similar topics, our preliminary research investigates the usefulness of applying authorship-based models for this purpose. In the first phase, we clustered a set of spam emails using an authorship-based clustering algorithm. In the second phase, we analysed those clusters using a set of linguistic, structural and syntactic features. These analyses reveal that emails within each cluster were likely written by the same author, but that it is unlikely we have managed to group together all spam produced by each group. This problem of high purity with low recall, has been faced in past authorship research. While it is also a limitation of our research, the clusters themselves are still useful for the purposes of automating analysis, because they reduce the work needing to be performed. Our second phase revealed useful information on the group that can be utilized in future research for further analysis of such groups, for example, identifying further linkages behind spam campaigns.
Be careful who you trust: Issues with the public key infrastructure
- Authors: Black, Paul , Layton, Robert
- Type: Text , Conference paper
- Relation: 5th Cybercrime and Trustworthy Computing Conference, CTC 2014, Auckland, 24-25th November, 2014 published in Cybercrime and Trustworthy Computing Conference (CTC) p. 12-21
- Full Text: false
- Reviewed:
- Description: The modern digital internet economy and billions of dollars of trade are made possible by the internet security which is provided by operating system and web browser developers. This paper provides a survey of how this security is implemented through the use of digital certificates and the Public Key Infrastructure. Documented cases of the abuse of these digital certificates are given. It is shown that these problems arise from a combination of commercial pressures and a failure of the designers of internet security to consider the fundamental security principal of least privilege. Measures which are used to mitigate these problems are noted and new PKI architectural components which are designed to correct existing problems are examined.