Improving authorship attribution in Twitter through topic-based sampling
- Authors: Pan, Luoxi , Gondal, Iqbal , Layton, Robert
- Date: 2017
- Type: Text , Conference proceedings
- Relation: 30th Australasian Joint Conference on Artificial Intelligence, AI 2017: Advances in Artificial Intelligence; Melbourne, Australia; 19th-20th August 2017; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 10400 LNAI, pp. 250-261
- Full Text: false
- Reviewed:
- Description: Aliases are used as a means of anonymity on the Internet in environments such as IRC (Internet Relay Chat), forums and micro-blogging websites such as Twitter. While there are genuine reasons for the use of aliases, such as journalists operating in politically oppressive countries, they are increasingly being used by cybercriminals and extremist organisations. In recent years, research on authorship attribution of Twitter messages, including authorship analysis of aliases, has increased. Previous studies have shown that anti-aliasing of randomly generated sub-aliases yields high accuracies when linking the sub-aliases, but becomes much less accurate when topic-based sub-aliases are used. N-gram methods have previously been shown to perform better than other methods in this situation. This paper investigates the effect of topic-based sampling on authorship attribution accuracy for the popular micro-blogging website Twitter. Features are extracted using character n-grams, which accurately capture differences in authorship style, and are analysed with support vector machines in a one-versus-all configuration. The predictive performance of the algorithm is then evaluated using two different sampling methodologies: authors sampled through a context-sensitive topic-based search and authors sampled randomly. Topic-based sampling of authors is found to produce more accurate authorship predictions, and this paper presents several theories as to why this might be the case. © Springer International Publishing AG 2017.
Online romance scam: Expensive e-living for romantic happiness
- Authors: Kopp, Christian , Sillitoe, James , Gondal, Iqbal , Layton, Robert
- Date: 2016
- Type: Text , Conference proceedings
- Relation: Proceedings of the 29th Bled eConference: Digital Economy (BLED 2016), Slovenia, pp. 175-189
- Full Text:
- Description: The Online Romance Scam is a very successful scam which causes considerable financial and emotional damage to its victims. It is based on building a relationship which establishes a deep trust that causes victims to voluntarily transfer funds to the scammer. The aim of this research is to explore online dating scams as a type of e-Living which initially creates happiness for the victim in a virtual romantic relationship, but tragically then causes the victim to be separated from his or her savings. Using narrative research methodology, this research will establish a model of the romance scam structure and its variations regarding human romantic attitudes, and will develop a theory which explains how the victim is moved through the phases of the scam. Findings of this research will contribute to the knowledge of the Online Romance Scam as e-Crime and provide information about the structure and the development of the modus operandi which can be used to identify an online relationship as a scam at an early phase in order to prevent significant harm to the victim.
Authorship analysis of the zeus botnet source code
- Authors: Layton, Robert , Azab, Ahmad
- Date: 2014
- Type: Text , Conference proceedings
- Full Text: false
- Description: Authorship analysis has been used successfully to analyse the provenance of source code files in previous studies. The source code for Zeus, one of the most damaging and effective botnets to date, was leaked in 2011. In this research, we analyse the source code through the lens of authorship clustering, aiming to estimate how many people wrote this malware and what their roles are. The research provides insight into the structure that went into creating Zeus and its evolution over time. The work has the potential to be used to link the malware with other malware written by the same authors, helping investigations, classification, deterrence and detection. © 2014 IEEE.
Local n-grams for author identification: Notebook for PAN at CLEF 2013
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2013
- Type: Text , Conference proceedings
- Relation: CEUR Workshop Proceedings
- Full Text:
- Description: Our approach to the author identification task uses existing authorship attribution methods based on local n-grams (LNG) and combines them in a weighted ensemble. This approach came third in this year's competition, using a relatively simple weighting scheme based on training-set accuracy. LNG models create profiles consisting of a list of the character n-grams that best represent a particular author's writing. The weighted ensemble improved the accuracy of the method without reducing the speed of the algorithm; the submitted solution was not only near the top of the leaderboard in terms of accuracy, but was also one of the faster algorithms submitted.
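As a rough illustration of a profile-based character n-gram method of this kind, the Python sketch below builds a profile from an author's most frequent character n-grams and attributes a document to the nearest profile using a dissimilarity in the style of the Common N-Grams (CNG) method. This is an assumption: the abstract does not give the exact LNG profile construction or distance, and the function names, n = 3 and the profile size of 500 are hypothetical choices. A small helper shows how ensemble members' predictions could be combined with weights such as training-set accuracy.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count the overlapping character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def build_profile(text: str, n: int = 3, size: int = 500) -> dict:
    """Hypothetical profile: the `size` most frequent n-grams, as relative frequencies."""
    counts = char_ngrams(text, n)
    total = sum(counts.values()) or 1
    return {gram: count / total for gram, count in counts.most_common(size)}


def cng_dissimilarity(p1: dict, p2: dict) -> float:
    """CNG-style dissimilarity: sum of squared relative differences over both profiles."""
    score = 0.0
    for gram in set(p1) | set(p2):
        f1, f2 = p1.get(gram, 0.0), p2.get(gram, 0.0)
        # f1 + f2 > 0 because every gram comes from at least one profile
        score += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return score


def attribute(document: str, author_texts: dict, n: int = 3, size: int = 500) -> str:
    """Return the candidate author whose profile is least dissimilar to the document's."""
    doc_profile = build_profile(document, n, size)
    return min(
        author_texts,
        key=lambda author: cng_dissimilarity(
            doc_profile, build_profile(author_texts[author], n, size)
        ),
    )


def weighted_vote(predictions: list, weights: list) -> str:
    """Combine ensemble members' predicted authors, each weighted (e.g. by training accuracy)."""
    totals = Counter()
    for author, weight in zip(predictions, weights):
        totals[author] += weight
    return totals.most_common(1)[0][0]


if __name__ == "__main__":
    authors = {
        "alice": "the cat sat on the mat. the cat sat again.",
        "bob": "zzz quick brown fox jumps! zzz fox!",
    }
    print(attribute("the cat sat on a mat", authors))  # alice
```

In an ensemble, several such models (for example with different n or profile sizes) would each call `attribute`, and `weighted_vote` would merge their answers.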
Authorship attribution of IRC messages using inverse author frequency
- Authors: Layton, Robert , McCombie, Stephen , Watters, Paul
- Date: 2012
- Type: Text , Conference proceedings
- Full Text: false
- Description: Internet Relay Chat (IRC) is a useful and relatively simple protocol for text-based chat online, used in a variety of areas such as discussion and technical support. IRC is also used for cybercrime, with online rooms selling stolen credit card details, botnet access and malware. The reasons for the use of IRC in cybercrime include its widespread adoption and ease of use, but also centre on the anonymity granted by the protocol, which allows users to hide behind aliases that can be changed regularly. In this research, we apply authorship analysis techniques to attribute chat messages to known aliases. A preliminary experiment shows that this application is very difficult, due to the short messages and repeated information. To improve the accuracy, we apply inverse-author-frequency (iaf) weighting, which gives higher weights to features used by fewer authors. This research is the first time that iaf has been applied to character n-gram models; it had previously been applied to word-based models of authorship. We find that this improves the accuracy significantly for the RLP method and provides a platform for successful applications of authorship analysis in the future. Overall, the method achieves accuracies of over 55% in a very difficult application domain. © 2012 IEEE.
- Description: 2003011051
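The iaf idea can be sketched by analogy with inverse document frequency: an n-gram's weight falls as the number of authors who use it rises. The formula used below, iaf(g) = log(N / af(g)) with N candidate authors and af(g) the number of authors whose text contains g, is an assumption for illustration; the abstract does not state the exact formula or how the weights are combined with RLP profiles, and the function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count the overlapping character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def inverse_author_frequency(author_texts: dict, n: int = 3) -> dict:
    """Assumed iaf(g) = log(N / af(g)); af(g) = number of authors whose text contains g."""
    n_authors = len(author_texts)
    author_freq = Counter()
    for text in author_texts.values():
        # set(): each author counts at most once per n-gram
        author_freq.update(set(char_ngrams(text, n)))
    return {gram: math.log(n_authors / af) for gram, af in author_freq.items()}


def iaf_weighted_profile(text: str, iaf: dict, n: int = 3) -> dict:
    """Relative n-gram frequencies scaled by iaf.

    N-grams never seen in the training texts get weight 0 here (an assumption).
    """
    counts = char_ngrams(text, n)
    total = sum(counts.values()) or 1
    return {gram: (count / total) * iaf.get(gram, 0.0) for gram, count in counts.items()}


if __name__ == "__main__":
    texts = {"a": "lol ok brb", "b": "ok thx", "c": "ok lol"}
    weights = inverse_author_frequency(texts, n=2)
    print(weights["ok"])  # 0.0: every author uses "ok"
    print(iaf_weighted_profile("lol ok", weights, n=2))
```

An n-gram shared by every author, such as "ok" in the toy data, receives zero weight and so cannot distinguish between aliases, which matches the intent described in the abstract.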
Characterising network traffic for Skype forensics
- Authors: Azab, Ahmad , Watters, Paul , Layton, Robert
- Date: 2012
- Type: Text , Conference proceedings
- Full Text: false
- Description: Voice over IP (VoIP) is increasingly replacing fixed-line telephone systems globally due to lower cost, call quality improvements over digital lines and ease of availability. At the same time, criminals have also moved to this environment, creating challenges for law enforcement, since VoIP traffic is more difficult to intercept than traditional telephony. One key problem for proprietary VoIP applications such as Skype is reliably identifying and characterizing their network traffic. In this paper, the latest Skype version and its components are analyzed in terms of network traffic behavior during the login, call establishment, call answering and status change phases. The network conditions tested included blocking different port numbers, inbound connections and outbound connections. The results provide a clearer view of the difficulties in characterizing Skype traffic in forensic contexts. We also found several changes in behavior compared with previous investigations of older versions of Skype. © 2012 IEEE.
- Description: 2003011053
Identifying cyber predators through forensic authorship analysis of chat logs
- Authors: Amuchi, Faith , Al-Nemrat, Ameer , Alazab, Mamoun , Layton, Robert
- Date: 2012
- Type: Text , Conference proceedings
- Full Text: false
- Description: Online Grooming is a growing phenomenon within online environments. One of the major problems encountered in qualitative Internet research of chat communication is anonymity, which is exploited and greatly enjoyed by chatters. An important question that has been asked in the literature is 'How can a researcher be sure to analyse the communication of children and adolescents and not the chat communication of adults who pretend to be under 18?'. Our answer to this question is the field of Authorship Analysis, which offers a way to unmask the anonymity of cyber predators. Stylometry, as used in this chat log analysis, is a type of Authorship Analysis that is not based on an author's handwriting but on contextual clues from the content of their writing. This paper analyses the application of different authorship attribution techniques to chat logs from a forensic perspective. © 2012 IEEE.
- Description: 2003011054
Towards an implementation of information flow security using semantic web technologies
- Authors: Ureche, Oana , Layton, Robert , Watters, Paul
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Controlling the flow of sensitive data has been widely acknowledged as a critical aspect of securing web information systems. A common limitation of previous approaches to implementing information flow control is that they propose new scripting languages. This makes them infeasible to apply to existing systems written in traditional programming languages, as these systems would need to be redeveloped in the proposed scripting language. This paper proposes a methodology that offers a common interlingua, through the use of Semantic Web technologies, for securing web information systems independently of their programming language. © 2012 IEEE.
- Description: 2003011056
Unsupervised authorship analysis of phishing webpages
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Authorship analysis of phishing websites enables the investigation of phishing attacks beyond basic analysis. In authorship analysis, salient features from documents are used to determine properties of the author, such as which of a set of candidate authors wrote a given document. In unsupervised authorship analysis, the aim is to group documents such that all documents by one author are grouped together. Applying this to cyber-attacks shows the size and scope of attacks from specific groups, which in turn allows investigators to focus their attention on specific attacking groups rather than trying to profile multiple independent attackers. In this paper, we analyse phishing websites using the current state-of-the-art unsupervised authorship analysis method, called NUANCE. The results, evaluated using expert knowledge and external information, indicate that the method produces clusters which correlate strongly with authorship, and show an improvement over a previous approach with known flaws. © 2012 IEEE.
- Description: 2003010678
Application of SVM in citation information extraction
- Authors: Liang, Jiguang , Layton, Robert , Wang, Wei
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: Support Vector Machines (SVMs) are an effective binary classification algorithm. The Hidden Markov Model (HMM) greatly restricts the use of textual structural features for information extraction; to make better use of these features, this paper proposes a support vector machine multi-class classification based on Markov properties to extract information from a citation database. The proposed model extracts symbol characteristics as features and composes a binary tree of the transition probabilities. Experiments show that the proposed method outperforms HMM and basic SVM methods. © 2011 IEEE.
Fake file detection in P2P networks by consensus and reputation
- Authors: Watters, Paul , Layton, Robert
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: Previous research [1] has indicated that reputation scores can be used as the basis for trust computation in P2P networks. In this paper, we use reputation scores calculated from P2P search engine rating sites to determine whether or not a torrent is likely to be linked to a fake file. Our results indicate clear separability between fake and genuine files, assuming the integrity of the "community" ratings provided by specific subcultural groups [2]. Suggestions for more sophisticated reputation-based scoring are also provided. © 2011 Crown.
Malware detection based on structural and behavioural features of API calls
- Authors: Alazab, Mamoun , Layton, Robert , Venkatraman, Sitalakshmi , Watters, Paul
- Date: 2010
- Type: Text , Conference proceedings
- Full Text: false
- Description: In this paper, we propose a five-step approach to detect obfuscated malware by investigating the structural and behavioural features of API calls. We have developed a fully automated system to disassemble executables and effectively extract API call features from them. Using n-gram statistical analysis of binary content, we are able to classify whether an executable file is malicious or benign. Our experimental results with a dataset of 242 malware samples and 72 benign files show a promising accuracy of 96.5% for the unigram model. We also provide a preliminary analysis of our approach using a support vector machine (SVM), varying n from 1 to 5 and analysing performance in terms of accuracy, false positives and false negatives. By applying SVM, we propose to train the classifier and derive an optimum n-gram model for detecting both known and unknown malware efficiently.
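The n-gram step can be sketched as follows, assuming each executable has already been reduced to an ordered list of API call names (the disassembly and extraction steps are not shown). The call traces and function names are hypothetical; the Windows API names are used only as plausible labels. The resulting fixed-length frequency vectors are the kind of input an SVM would be trained on, for example with a library such as scikit-learn, which this sketch does not use.

```python
from collections import Counter


def api_ngrams(calls: list, n: int) -> Counter:
    """Count n-grams (as tuples) over an ordered sequence of API call names."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def build_vocabulary(sequences: list, n: int) -> dict:
    """Map every n-gram seen in the training sequences to a fixed feature index."""
    grams = sorted({gram for calls in sequences for gram in api_ngrams(calls, n)})
    return {gram: index for index, gram in enumerate(grams)}


def vectorise(calls: list, vocab: dict, n: int) -> list:
    """Relative-frequency feature vector; n-grams outside the vocabulary are dropped."""
    vector = [0.0] * len(vocab)
    counts = api_ngrams(calls, n)
    total = sum(counts.values()) or 1
    for gram, count in counts.items():
        if gram in vocab:
            vector[vocab[gram]] = count / total
    return vector


if __name__ == "__main__":
    # Hypothetical call traces; a real pipeline would extract these from disassembly.
    suspicious = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc",
                  "WriteProcessMemory", "CreateRemoteThread"]
    benign = ["CreateFileW", "ReadFile", "CloseHandle"]
    for n in range(1, 6):  # the abstract reports varying n from 1 to 5
        vocab = build_vocabulary([suspicious, benign], n)
        print(n, len(vocab), vectorise(benign, vocab, n))
```

With n = 1 this reduces to the unigram model mentioned in the abstract; larger n captures short orderings of calls at the cost of a sparser vocabulary, and sequences shorter than n yield an all-zero vector.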
The seven scam types: Mapping the terrain of cybercrime
- Authors: Stabek, Amber , Watters, Paul , Layton, Robert
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: The threat of cybercrime is a growing danger to the economy. Industries and businesses are targeted by cyber-criminals, along with members of the general public. Since cybercrime is often a symptom of more complex criminological regimes such as money laundering, trafficking and terrorism, the true damage caused to society is unknown. Dissimilarities in reporting procedures and non-uniform cybercrime classifications lead international reporting bodies to produce incompatible results, which makes valid comparisons difficult. A cybercrime classification framework has been identified as necessary for the development of an inter-jurisdictional, transnational and global approach to identifying, intercepting and prosecuting cyber-criminals. This paper outlines a cybercrime classification framework which has been applied to the incidence of scams. Content analysis was performed on over 250 scam descriptions drawn from more than 35 scamming categories, and over 80 static features were derived. Using hierarchical cluster analysis and discriminant function analysis, the over 35 ambiguous categories were reduced to 7 scam types, and the top four scamming functions, identified as scamming business processes, were revealed. The results of this research have significant ramifications for the current state of scam and cybercrime classification, research and analysis, and offer significant insight into the business processes and applications adopted by scammers and cyber-criminals. © 2010 IEEE.