Improving authorship attribution in Twitter through topic-based sampling
- Authors: Pan, Luoxi , Gondal, Iqbal , Layton, Robert
- Date: 2017
- Type: Text , Conference proceedings
- Relation: 30th Australasian Joint Conference on Artificial Intelligence, AI 2017 : Advances in Artificial Intelligence; Melbourne, Australia; 19th-20th August 2017; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 10400 LNAI, p. 250-261
- Full Text: false
- Reviewed:
- Description: Aliases are used as a means of anonymity on the Internet in environments such as IRC (Internet Relay Chat), forums and micro-blogging websites such as Twitter. While there are genuine reasons for the use of aliases, such as journalists operating in politically oppressive countries, they are increasingly being used by cybercriminals and extremist organisations. In recent years, we have seen increased research on authorship attribution of Twitter messages, including authorship analysis of aliases. Previous studies have shown that anti-aliasing of randomly generated sub-aliases yields high accuracies when linking the sub-aliases, but becomes much less accurate when topic-based sub-aliases are used. N-gram methods have previously been demonstrated to perform better than other methods in this situation. This paper investigates the effect of topic-based sampling on authorship attribution accuracy for the popular micro-blogging website Twitter. Features are extracted using character n-grams, which accurately capture differences in authorship style. These features are analysed using support vector machines with a one-versus-all classifier. The predictive performance of the algorithm is then evaluated using two different sampling methodologies - authors that were sampled through a context-sensitive topic-based search and authors that were sampled randomly. Topic-based sampling of authors is found to produce more accurate authorship predictions. This paper presents several theories as to why this might be the case. © Springer International Publishing AG 2017.
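The feature-extraction step this abstract describes, character n-grams as style markers, can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation; the n-gram length and example text are assumptions.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Extract overlapping character n-grams from a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_profile(text, n=3):
    """Normalised character n-gram frequencies, usable as style features."""
    grams = char_ngrams(text.lower(), n)
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

profile = ngram_profile("for sale: cheap tickets", n=3)
```

In a pipeline like the one the abstract outlines, such frequency dictionaries would then be vectorised and passed to a one-versus-all SVM.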
Optimization based clustering algorithms for authorship analysis of phishing emails
- Authors: Seifollahi, Sattar , Bagirov, Adil , Layton, Robert , Gondal, Iqbal
- Date: 2017
- Type: Text , Journal article
- Relation: Neural Processing Letters Vol. 46, no. 2 (2017), p. 411-425
- Relation: http://purl.org/au-research/grants/arc/DP140103213
- Full Text: false
- Reviewed:
- Description: Phishing has given attackers the power to masquerade as legitimate users of organizations, such as banks, to scam money and private information from victims. Phishing is so widespread that combating the attacks could overwhelm the victim organization. It is important to group the phishing attacks to formulate an effective defence mechanism. In this paper, we use clustering methods to analyze and characterize phishing emails and perform their relative attribution. Emails are first tokenized to a bag-of-words space and then transformed to a numeric vector space using frequencies of words in documents. The WordNet vocabulary is used to take the effects of similar words into account and to reduce sparsity. The word similarity measure is combined with the term frequencies to introduce a novel text transformation into numeric features. To improve the accuracy, we apply inverse document frequency weighting, which gives higher weights to features used by fewer authors. The k-means algorithm and three recently introduced optimization-based algorithms, MS-MGKM, INCA and DCClust, are applied for clustering purposes. The optimization-based algorithms indicate the existence of well separated clusters in the phishing emails dataset. © 2017, Springer Science+Business Media New York.
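The inverse document frequency weighting step mentioned above can be sketched as follows. The toy token lists are invented for illustration, and the paper's full pipeline (WordNet similarity, MS-MGKM, INCA, DCClust) is not reproduced here.

```python
import math
from collections import Counter

def idf_weights(documents):
    """Inverse document frequency: terms appearing in fewer
    documents receive a higher weight."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def weighted_vector(doc, idf):
    """Term-frequency vector re-weighted by IDF."""
    tf = Counter(doc)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}

# Hypothetical tokenized phishing emails, for illustration only.
docs = [["verify", "account", "bank"], ["account", "suspended"], ["bank", "transfer"]]
idf = idf_weights(docs)
```

The weighted vectors would then be the input to a clustering algorithm such as k-means.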
Relative cyberattack attribution
- Authors: Layton, Robert
- Date: 2016
- Type: Text , Book chapter
- Relation: Automating Open Source Intelligence Chapter 3 p. 37-60
- Full Text: false
- Reviewed:
- Description: Cybercrime and cyberattacks are problems that cause billions of dollars in direct losses per year (Anderson et al., 2013), and even more in indirect losses, such as the cost of protection systems like antivirus programs (Layton & Watters, 2014). While defensive systems have made enormous progress against these attacks over the last 20 years, the escalating battle between attackers and defenders continues (Alazab, Layton, Venkataraman, & Watters, 2010). While it is arguably harder to attack systems today than ever before, cyber-based attacks continue to cause damage to online commerce, critical infrastructure, and the population in general.
Automating Open Source Intelligence: Algorithms for OSINT
- Authors: Layton, Robert , Watters, Paul
- Date: 2015
- Type: Text , Book
- Full Text: false
- Reviewed:
- Description: Algorithms for Automating Open Source Intelligence (OSINT) presents information on the gathering of information and extraction of actionable intelligence from openly available sources, including news broadcasts, public repositories, and more recently, social media. As OSINT has applications in crime fighting, state-based intelligence, and social research, this book provides recent advances in text mining, web crawling, and other algorithms that have led to advances in methods that can largely automate this process.
Ethical considerations when using online datasets for research purposes
- Authors: Kopp, Christian , Layton, Robert , Gondal, Iqbal , Sillitoe, Jim
- Date: 2015
- Type: Text , Book chapter
- Relation: Automating Open Source Intelligence: Algorithms for OSINT p. 131-157
- Full Text: false
- Reviewed:
- Description: The Internet has become an important community communications platform, supporting a range of programs and virtual environments. While there are many ways in which people choose to develop personal interactions over the Internet, one of the most popular manifestations is the creation and maintenance of social relationships using social and dating websites. In this chapter, the collection and use of data from such sites is assessed from an ethical frame, and key concepts such as informed consent, information, comprehension, and voluntariness are outlined.
Mining malware to detect variants
- Authors: Azab, Ahmad , Layton, Robert , Alazab, Mamoun , Oliver, Jonathan
- Date: 2015
- Type: Text , Conference paper
- Relation: 5th Cybercrime and Trustworthy Computing Conference, CTC 2014; Auckland, New Zealand; 24th-25th November 2014 p. 44-53
- Full Text: false
- Reviewed:
- Description: Cybercrime continues to be a growing challenge, and malware is one of the most serious security threats on the Internet today, having existed since its very early days. Cyber criminals continue to develop and advance their malicious attacks. Unfortunately, existing techniques for detecting malware and analysing code samples are insufficient and have significant limitations. For example, most malware detection studies focus only on detection and neglect variants of the code. Investigating malware variants allows antivirus products and governments to more easily detect new attacks, perform attribution, predict similar attacks in the future, and conduct further analysis. The focus of this paper is performing similarity measures between different malware binaries of the same variant, utilizing data mining concepts in conjunction with hashing algorithms. In this paper, we investigate and evaluate the Trend Locality Sensitive Hashing (TLSH) algorithm for grouping binaries that belong to the same variant, utilizing the k-NN algorithm. Two Zeus variants, TSPY-ZBOT and MAL-ZBOT, were tested to assess the effectiveness of the proposed approach. We compare TLSH to related hashing methods (SSDEEP, SDHASH and NILSIMSA) that are currently used for this purpose. Experimental evaluation demonstrates that our method can effectively detect variants of malware and is resilient to common obfuscations used by cyber criminals. Our results show that TLSH and SDHASH provide the highest accuracy, scoring F-measures of 0.989 and 0.999 respectively. © 2014 IEEE.
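The grouping step can be illustrated with a small k-NN sketch. The real work uses TLSH digests and the TLSH distance score; the `distance` function below is a hypothetical stand-in (Hamming distance over equal-length strings) so the example stays self-contained, and the digests and labels are invented.

```python
from collections import Counter

def distance(h1, h2):
    """Stand-in for a TLSH-style distance between two digests:
    Hamming distance over equal-length hex strings."""
    return sum(a != b for a, b in zip(h1, h2))

def knn_label(query, labelled_hashes, k=3):
    """Assign the query digest the majority label among its
    k nearest neighbours by digest distance."""
    nearest = sorted(labelled_hashes, key=lambda lh: distance(query, lh[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented digests labelled with the two Zeus variants from the abstract.
samples = [
    ("aaaa00", "TSPY-ZBOT"),
    ("aaaa01", "TSPY-ZBOT"),
    ("ffff00", "MAL-ZBOT"),
    ("ffff11", "MAL-ZBOT"),
]
```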
Authorship analysis of the zeus botnet source code
- Authors: Layton, Robert , Azab, Ahmad
- Date: 2014
- Type: Text , Conference proceedings
- Full Text: false
- Description: Authorship analysis has been used successfully to analyse the provenance of source code files in previous studies. The source code for Zeus, one of the most damaging and effective botnets to date, was leaked in 2011. In this research, we analyse the source code through the lens of authorship clustering, aiming to estimate how many people wrote this malware, and what their roles were. The research provides insight into the structure that went into creating Zeus and its evolution over time. The work has the potential to be used to link the malware with other malware written by the same authors, helping investigations, classification, deterrence and detection. © 2014 IEEE.
API design for machine learning software: experiences from the scikit-learn project
- Authors: Buitinck, Lars , Louppe, Gilles , Blondel, Mathieu , Pedregosa, Fabian , Mueller, Andreas , Grisel, Olivier , Niculae, Vlad , Prettenhofer, Peter , Gramfort, Alexandre , Grobler, Jaques , Layton, Robert , Vanderplas, Jake , Joly, Arnaud , Holt, Brian , Varoquaux, Gael
- Date: 2013
- Type: Text , Conference paper
- Relation: Workshop on Languages for Data Mining and Machine Learning, LML 2013
- Full Text: false
- Reviewed:
- Description: Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
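The shared interface this paper describes (hyperparameters set in the constructor, `fit` learning from data and returning `self`, `predict` applied to new samples, learned attributes suffixed with an underscore) can be sketched with a toy estimator. `MajorityClassifier` is invented for illustration and is not part of scikit-learn.

```python
from collections import Counter

class MajorityClassifier:
    """A toy estimator following scikit-learn's API conventions."""

    def __init__(self, default=None):
        # Hyperparameters are fixed at construction, never learned in __init__.
        self.default = default

    def fit(self, X, y):
        # Learned attributes conventionally end with an underscore.
        self.majority_ = Counter(y).most_common(1)[0][0] if y else self.default
        return self  # returning self enables chaining: clf.fit(X, y).predict(X)

    def predict(self, X):
        # One prediction per input sample.
        return [self.majority_ for _ in X]

clf = MajorityClassifier().fit([[0], [1], [2]], ["spam", "spam", "ham"])
```

Because every estimator exposes the same `fit`/`predict` contract, such objects compose freely in pipelines and model-selection tools, which is the reusability argument the abstract makes.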
Crime toolkits: The productisation of cybercrime
- Authors: Alazab, Ammar , Abawajy, Jemal , Hobbs, Michael , Layton, Robert , Khraisat, Ansam
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2013 p. 1626-1632
- Full Text: false
- Reviewed:
ICANN or ICANT: Is WHOIS an Enabler of Cybercrime?
- Authors: Watters, Paul , Herps, Aaron , Layton, Robert , McCombie, Stephen
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 44-49
- Full Text: false
- Reviewed:
- Description: WHOIS acts as a registry for organisations or individuals who 'own' or take responsibility for domains. For any registry to be functional, its integrity needs to be assured. Unfortunately, WHOIS data does not appear to meet basic integrity requirements in many cases, reducing the effectiveness of law enforcement and rightsholders in requesting takedowns for phishing kits, zombie hosts that are part of a botnet, or infringing content. In this paper, we illustrate the problem using a case study from trademark protection, where investigators attempt to trace fake goods being advertised on Facebook. The results indicate that ICANN needs to at least introduce minimum verification standards for WHOIS records vis-à-vis integrity, and optimally, develop a system for rapid takedowns in the event that a domain is being misused.
Identifying Faked Hotel Reviews Using Authorship Analysis
- Authors: Layton, Robert , Watters, Paul , Ureche, Oana
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 1-6
- Full Text: false
- Reviewed:
- Description: The use of online review sites has grown significantly, allowing communities to share information on products or services. These online review sites are marketed as being independent and trustworthy, but have been criticised for not ensuring the integrity of the reviews. One major concern is review fraud, where a person (such as a marketer) is paid to write favourable reviews for one product or poor reviews for a competitor. In this research we show a method for determining if two reviews share an author, which can be used to identify if a review is legitimate. Our results indicate a high quality of the method, with an F1-score of over 0.66 on testing data with 40 authors, most of whom have only one or two documents. This type of analysis can be used to investigate cases of potential hotel review fraud.
Indirect information linkage for OSINT through authorship analysis of aliases
- Authors: Layton, Robert , Perez, Charles , Birregah, Babiga , Watters, Paul , Lemercier, Marc
- Date: 2013
- Type: Text , Conference paper
- Relation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining Vol. 7867 LNAI, p. 36-46
- Full Text: false
- Reviewed:
- Description: In this paper we examine the problem of automatically linking online accounts for open source intelligence gathering. We specifically aim to determine if two social media accounts are shared by the same author, without the use of direct linking evidence. We profile the accounts using authorship analysis and find the best matching guess. We apply this to a series of Twitter accounts identified as malicious by a methodology named SPOT and find several pairs of accounts that belong to the same author, despite no direct evidence linking the two. Overall, our results show that linking aliases is possible with an accuracy of 84%, and using our automated threshold method improves our accuracy to over 90% by removing incorrectly discovered matches. © Springer-Verlag 2013.
Malicious Spam Emails Developments and Authorship Attribution
- Authors: Alazab, Mamoun , Layton, Robert , Broadhurst, Roderic , Bouhours, Brigitte
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 58-68
- Full Text: false
- Reviewed:
- Description: The Internet is a decentralized structure that offers speedy communication, has a global reach and provides anonymity, a characteristic invaluable for committing illegal activities. In parallel with the spread of the Internet, cybercrime has rapidly evolved from a relatively low volume crime to a common high volume crime. A typical example of such a crime is the spreading of spam emails, where the content of the email tries to entice the recipient to click a URL linking to a malicious Web site or to download a malicious attachment. Analysts attempting to provide intelligence on spam activities quickly find that the volume of spam circulating daily is overwhelming; therefore, any intelligence gathered is representative of only a small sample, not of the global picture. While past studies have looked at automating some of these analyses using topic-based models, i.e. separating email clusters into groups with similar topics, our preliminary research investigates the usefulness of applying authorship-based models for this purpose. In the first phase, we clustered a set of spam emails using an authorship-based clustering algorithm. In the second phase, we analysed those clusters using a set of linguistic, structural and syntactic features. These analyses reveal that emails within each cluster were likely written by the same author, but that it is unlikely we have managed to group together all spam produced by each group. This problem of high purity with low recall has been faced in past authorship research. While it is also a limitation of our research, the clusters themselves are still useful for the purposes of automating analysis, because they reduce the work needing to be performed. Our second phase revealed useful information on the group that can be utilized in future research for further analysis of such groups, for example, identifying further linkages behind spam campaigns.
REPLOT: REtrieving profile links on Twitter for suspicious networks detection
- Authors: Perez, Charles , Birregah, Babiga , Layton, Robert , Lemercier, Marc , Watters, Paul
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013 p. 1307-1314
- Full Text: false
- Reviewed:
- Description: In the last few decades social networking sites have encountered their first large-scale security issues. The high number of users associated with the presence of sensitive data (personal or professional) is certainly an unprecedented opportunity for malicious activities. As a result, one observes that malicious users are progressively turning their attention from traditional e-mail to online social networks to carry out their attacks. Moreover, it is now observed that attacks are not only performed by individual profiles, but that on a larger scale, a set of profiles can act in coordination in making such attacks. The latter are referred to as malicious social campaigns. In this paper, we present a novel approach that combines authorship attribution techniques with a behavioural analysis for detecting and characterizing social campaigns. The proposed approach is performed in three steps: first, suspicious profiles are identified from a behavioural analysis; second, connections between suspicious profiles are retrieved using a combination of authorship attribution and temporal similarity; third, a clustering algorithm is performed to identify and characterise the suspicious campaigns obtained. We provide a real-life application of the methodology on a sample of 1,000 suspicious Twitter profiles tracked over a period of forty days. Our results show that a large set of suspicious profiles behaves in coordination (70%) and propagates mainly, but not only, trustworthy URLs on the online social network. Among the three largest detected campaigns, we have highlighted that one represents an important security issue for the platform by promoting a significant set of malicious URLs. Copyright 2013 ACM.
Skype Traffic Classification Using Cost Sensitive Algorithms
- Authors: Azab, Ahmad , Layton, Robert , Alazab, Mamoun , Watters, Paul
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 14-21
- Full Text: false
- Reviewed:
- Description: Voice over IP (VoIP) technologies such as Skype are becoming increasingly popular and widely used in different organisations, and therefore identifying the usage of this service at the network level becomes very important. Reasons for this include applying Quality of Service (QoS), network planning, prohibiting its use in some networks and lawful interception of communications. Researchers have addressed VoIP traffic classification from different viewpoints, such as classifier accuracy, building time, classification time and online classification. This previous research tested its models using the same version of a VoIP product used for training, giving generalizability only to that version of the product. This means that as new VoIP versions are released, these classifiers become obsolete. In this paper, we address whether this approach is applicable to detecting new, untrained versions of Skype. We suggest that using cost-sensitive classifiers can help to improve the accuracy of detecting untrained versions, compared with other algorithms. Our experiment demonstrates promising preliminary results in detecting Skype version 4 by building a cost-sensitive classifier on Skype version 3, achieving an F-measure score of 0.57. This is a drastic improvement over not using cost sensitivity, which scores an F-measure of 0. This approach may be enhanced to improve the detection results and extended to improve detection for other applications that change protocols from version to version.
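The cost-sensitive idea can be illustrated with the standard expected-cost decision rule: predict the positive class whenever a missed detection would cost more in expectation than a false alarm. The cost values below are assumptions for illustration; the paper does not publish its cost matrix here.

```python
def cost_sensitive_decision(p_positive, cost_fn=5.0, cost_fp=1.0):
    """Predict positive (e.g. 'Skype traffic') when the expected cost
    of a false negative outweighs that of a false positive. Raising
    the false-negative cost lowers the decision threshold below 0.5,
    which is how cost sensitivity catches under-represented cases."""
    threshold = cost_fp / (cost_fp + cost_fn)
    return p_positive >= threshold
```

With equal costs this reduces to the usual 0.5 threshold; skewing the costs toward misses makes the classifier flag borderline traffic, at the price of more false alarms.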
Authorship attribution of IRC messages using inverse author frequency
- Authors: Layton, Robert , McCombie, Stephen , Watters, Paul
- Date: 2012
- Type: Text , Conference proceedings
- Full Text: false
- Description: Internet Relay Chat (IRC) is a useful and relatively simple protocol for text based chat online, used in a variety of areas online such as for discussion and technical support. IRC is also used for cybercrime, with online rooms selling stolen credit card details, botnet access and malware. The reasons for the use of IRC in cybercrime include the widespread adoption and ease of use, but also focus around the anonymity granted by the protocol, allowing users to hide behind aliases that can be changed regularly. In this research, we apply authorship analysis techniques to be able to attribute chat messages to known aliases. A preliminary experiment shows that this application is very difficult, due to the short messages and repeated information. To improve the accuracy, we apply inverse-author-frequency (iaf) weighting, which gives higher weights to features used by fewer authors. This research is the first time that iaf has been applied to character n-gram models, previously being applied to word based models of authorship. We find that this improves the accuracy significantly for the RLP method and provides a platform for successful applications of authorship analysis in the future. Overall, the method achieves accuracies of over 55% in a very difficult application domain. © 2012 IEEE.
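The inverse-author-frequency weighting over character n-grams described above can be sketched as follows. The chat messages are invented for illustration, and the RLP method itself is not reproduced.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Overlapping character n-grams of a message."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def iaf_weights(messages_by_author, n=3):
    """Inverse author frequency: n-grams used by fewer authors get a
    higher weight, down-weighting chat boilerplate shared by everyone."""
    n_authors = len(messages_by_author)
    author_freq = Counter()
    for msgs in messages_by_author.values():
        grams = set()
        for m in msgs:
            grams.update(char_ngrams(m, n))  # count each n-gram once per author
        author_freq.update(grams)
    return {g: math.log(n_authors / af) for g, af in author_freq.items()}

# Hypothetical IRC logs keyed by alias, for illustration only.
chats = {
    "alice": ["selling cc dumps"],
    "bob": ["selling botnet access"],
    "carol": ["anyone around?"],
}
iaf = iaf_weights(chats)
```

The structure mirrors inverse document frequency, but counts authors rather than documents, which is what makes it applicable to per-alias attribution.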
Characterising and predicting cyber attacks using the Cyber Attacker Model Profile (CAMP)
- Authors: Watters, Paul , McCombie, Stephen , Layton, Robert , Pieprzyk, Josef
- Date: 2012
- Type: Text , Journal article
- Relation: Journal of Money Laundering Control Vol. 15, no. 4 (2012), p. 430-441
- Full Text: false
- Reviewed:
- Description: Purpose – Ethnographic studies of cyber attacks typically aim to explain a particular profile of attackers in qualitative terms. The purpose of this paper is to formalise some of the approaches to build a Cyber Attacker Model Profile (CAMP) that can be used to characterise and predict cyber attacks. Design/methodology/approach – The paper builds a model using social and economic independent or predictive variables from several eastern European countries and benchmarks indicators of cybercrime within the Australian financial services system. Findings – The paper found a very strong link between perceived corruption and GDP in two distinct groups of countries – corruption in Russia was closely linked to the GDP of Belarus, Moldova and Russia, while corruption in Lithuania was linked to GDP in Estonia, Latvia, Lithuania and Ukraine. At the same time corruption in Russia and Ukraine were also closely linked. These results support previous research that indicates a strong link between the legitimate economy and the black economy in many countries of Eastern Europe and the Baltic states. The results of the regression analysis suggest that a highly skilled workforce which is mobile and working in an environment of high perceived corruption in the target countries is related to increases in cybercrime even within Australia. It is important to note that the data used for the dependent and independent variables were gathered over a seven-year period, which included large economic shocks such as the global financial crisis. Originality/value – This is the first paper to use a modelling approach to directly show the relationship between various social, economic and demographic factors in the Baltic states and Eastern Europe, and the level of card skimming and card not present fraud in Australia. Acknowledgements: Paul A. Watters and Robert Layton are funded by IBM, Westpac, the State Government of Victoria and the Australian Federal Police.
Characterising network traffic for Skype forensics
- Authors: Azab, Ahmad , Watters, Paul , Layton, Robert
- Date: 2012
- Type: Text , Conference proceedings
- Full Text: false
- Description: Voice over IP (VoIP) is increasingly replacing fixed line telephone systems globally due to lower cost, call quality improvements over digital lines and ease of availability. At the same time, criminals have also transitioned to using this environment, creating challenges for law enforcement, since interception of VoIP traffic is more difficult than in a traditional telephony environment. One key problem for proprietary VoIP algorithms like Skype is being able to reliably identify and characterize network traffic. In this paper, the latest Skype version and its components are analyzed in terms of network traffic behavior for the login, call establishment, call answering and change status phases. Network conditions tested included blocking different port numbers, inbound connections and outbound connections. The results provide a clearer view of the difficulties in characterizing Skype traffic in forensic contexts. We also found differences from previous investigations into older versions of Skype. © 2012 IEEE.
Identifying cyber predators through forensic authorship analysis of chat logs
- Authors: Amuchi, Faith , Al-Nemrat, Ameer , Alazab, Mamoun , Layton, Robert
- Date: 2012
- Type: Text , Conference proceedings
- Full Text: false
- Description: Online grooming is a growing phenomenon within online environments. One of the major problems encountered in qualitative internet research of chat communication is the issue of anonymity, which is being exploited and greatly enjoyed by chatters. An important question that has been asked in the literature is 'How can a researcher be sure to analyse the communication of children and adolescents and not the chat communication of adults who pretend to be under 18?'. Our reply to this question would be the field of authorship analysis. Authorship analysis offers a way to unmask the anonymity of cyber predators. Stylometry, as used in this chat log analysis, is a type of authorship analysis that is not based on an author's handwriting but includes contextual clues from the content of their writings. This research paper will analyse the application of different authorship attribution techniques to chat logs from a forensic perspective. © 2012 IEEE.
Application of SVM in citation information extraction
- Authors: Liang, Jiguang , Layton, Robert , Wang, Wei
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: Support Vector Machines are an effective binary classification algorithm. To enhance the utilization of text structural features for information extraction, which are greatly restricted by the Hidden Markov Model (HMM), this paper proposes a support vector machine multi-class classifier based on Markov properties to extract information from a citation database. The proposed model extracts symbol characteristics as features and composes a binary tree of the transition probabilities. Experiments show that the proposed method outperforms HMM and basic SVM methods. © 2011 IEEE.