Mining malware to detect variants
- Authors: Azab, Ahmad , Layton, Robert , Alazab, Mamoun , Oliver, Jonathan
- Date: 2015
- Type: Text , Conference paper
- Relation: 5th Cybercrime and Trustworthy Computing Conference, CTC 2014; Auckland, New Zealand; 24th-25th November 2014 p. 44-53
- Full Text: false
- Reviewed:
- Description: Cybercrime continues to be a growing challenge, and malware, which has existed since the very early days of the Internet, is one of the most serious security threats online today. Cyber criminals continue to develop and advance their malicious attacks. Unfortunately, existing techniques for detecting malware and analysing code samples are insufficient and have significant limitations. For example, most malware detection studies have focused only on detection and neglected the variants of the code. Investigating malware variants allows antivirus products and governments to detect these new attacks more easily, attribute them, predict similar attacks in the future, and carry out further analysis. This paper focuses on measuring the similarity between different malware binaries of the same variant, using data mining concepts in conjunction with hashing algorithms. We investigate and evaluate the use of the Trend Locality Sensitive Hashing (TLSH) algorithm, together with the k-NN algorithm, to group binaries that belong to the same variant. Two Zeus variants, TSPY-ZBOT and MAL-ZBOT, were tested to assess the effectiveness of the proposed approach. We compare TLSH to related hashing methods (SSDEEP, SDHASH and NILSIMSA) that are currently used for this purpose. Experimental evaluation demonstrates that our method can effectively detect variants of malware and is resilient to common obfuscations used by cyber criminals. Our results show that TLSH and SDHASH provide the highest accuracy, with F-measures of 0.989 and 0.999 respectively. © 2014 IEEE.
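The grouping step described in this abstract (nearest-neighbour voting over digest distances, scored with an F-measure) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `hamming_distance` is a toy stand-in for a real similarity-digest distance such as the TLSH score, and the digest strings in `labelled` are invented for the example.

```python
from collections import Counter

def hamming_distance(a: str, b: str) -> int:
    """Toy stand-in for a similarity-digest distance (NOT the TLSH score)."""
    if len(a) != len(b):
        raise ValueError("digests must have equal length")
    return sum(x != y for x, y in zip(a, b))

def knn_predict(query, labelled, k=3, distance=hamming_distance):
    """Return the majority label among the k labelled digests nearest to query."""
    nearest = sorted(labelled, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def f_measure(true_labels, predicted, positive):
    """Harmonic mean of precision and recall for the given positive label."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical digests; real ones would come from a hashing library.
labelled = [
    ("AAAA1111", "TSPY-ZBOT"), ("AAAA1112", "TSPY-ZBOT"), ("AAAB1111", "TSPY-ZBOT"),
    ("FFFF9999", "MAL-ZBOT"), ("FFFF9998", "MAL-ZBOT"), ("FFFE9999", "MAL-ZBOT"),
]

queries = ["AAAA1113", "FFFF9990"]
truth = ["TSPY-ZBOT", "MAL-ZBOT"]
predictions = [knn_predict(q, labelled) for q in queries]
score = f_measure(truth, predictions, positive="TSPY-ZBOT")
```

Swapping `distance` for a function that wraps a real digest comparison leaves the rest of the sketch unchanged, which is why the distance is passed in as a parameter.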
Malicious Spam Emails Developments and Authorship Attribution
- Authors: Alazab, Mamoun , Layton, Robert , Broadhurst, Roderic , Bouhours, Brigitte
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 58-68
- Full Text: false
- Reviewed:
- Description: The Internet is a decentralized structure that offers speedy communication, has a global reach and provides anonymity, a characteristic invaluable for committing illegal activities. In parallel with the spread of the Internet, cybercrime has rapidly evolved from a relatively low volume crime to a common high volume crime. A typical example of such a crime is the spreading of spam emails, where the content of the email tries to entice the recipient to click a URL linking to a malicious Web site or to download a malicious attachment. Analysts attempting to provide intelligence on spam activities quickly find that the volume of spam circulating daily is overwhelming; therefore, any intelligence gathered is representative of only a small sample, not of the global picture. While past studies have looked at automating some of these analyses using topic-based models, i.e. separating email clusters into groups with similar topics, our preliminary research investigates the usefulness of applying authorship-based models for this purpose. In the first phase, we clustered a set of spam emails using an authorship-based clustering algorithm. In the second phase, we analysed those clusters using a set of linguistic, structural and syntactic features. These analyses reveal that emails within each cluster were likely written by the same author, but that it is unlikely we have managed to group together all spam produced by each group. This problem of high purity with low recall has been faced in past authorship research. While it is also a limitation of our research, the clusters themselves are still useful for automating analysis, because they reduce the work that needs to be performed. Our second phase revealed useful information on the group that can be utilized in future research for further analysis of such groups, for example, identifying further linkages behind spam campaigns.
Skype Traffic Classification Using Cost Sensitive Algorithms
- Authors: Azab, Ahmad , Layton, Robert , Alazab, Mamoun , Watters, Paul
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 14-21
- Full Text: false
- Reviewed:
- Description: Voice over IP (VoIP) technologies such as Skype are becoming increasingly popular and widely used in different organisations, and therefore identifying the usage of this service at the network level becomes very important. Reasons for this include applying Quality of Service (QoS), network planning, prohibiting its use in some networks and lawful interception of communications. Researchers have addressed VoIP traffic classification from different viewpoints, such as classifier accuracy, building time, classification time and online classification. These earlier studies tested their models on the same version of the VoIP product that was used for training, so their results generalize only to that version of the product. This means that as new VoIP versions are released, these classifiers become obsolete. In this paper, we examine whether this approach is applicable to detecting new, untrained versions of Skype. We suggest that cost-sensitive classifiers can help to improve the accuracy of detecting untrained versions, and we test them against other algorithms. Our experiment demonstrates promising preliminary results in detecting Skype version 4 with a cost-sensitive classifier built on Skype version 3, achieving an F-measure of 0.57. This is a drastic improvement over not using cost sensitivity, which scores an F-measure of 0. This approach may be enhanced to improve the detection results and extended to improve detection for other applications that change protocols from version to version.
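One common way to make a classifier cost-sensitive is to choose the label with the lowest expected misclassification cost instead of the most probable label. The sketch below illustrates that idea only; it is not claimed to be the classifier used in the paper. The probabilities, labels and cost values are invented, and `cost_sensitive_predict` is a hypothetical helper name.

```python
def cost_sensitive_predict(p_positive, cost_fn=1.0, cost_fp=1.0):
    """Predict 1 (e.g. Skype) when missing a positive is expected to cost more.

    Predicting 0 risks a false negative with expected cost p * cost_fn;
    predicting 1 risks a false positive with expected cost (1 - p) * cost_fp.
    """
    return 1 if p_positive * cost_fn > (1 - p_positive) * cost_fp else 0

def f_measure(true_labels, predicted):
    """F-measure for the positive class (label 1)."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical scores from a model trained on an older protocol version:
# it is under-confident on the new version, so no score reaches 0.5.
probabilities = [0.40, 0.35, 0.10, 0.20, 0.05, 0.30]
truth = [1, 1, 0, 1, 0, 0]

# Equal costs reduce to a 0.5 threshold: nothing is flagged as positive.
plain = [cost_sensitive_predict(p) for p in probabilities]

# Making a false negative three times as costly lowers the threshold to 0.25.
weighted = [cost_sensitive_predict(p, cost_fn=3.0, cost_fp=1.0) for p in probabilities]

plain_f = f_measure(truth, plain)
weighted_f = f_measure(truth, weighted)
```

With equal costs every flow is labelled negative and the F-measure is 0; weighting false negatives more heavily recovers some positives at the price of a false positive, which mirrors the kind of trade-off the abstract reports.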
Identifying cyber predators through forensic authorship analysis of chat logs
- Authors: Amuchi, Faith , Al-Nemrat, Ameer , Alazab, Mamoun , Layton, Robert
- Date: 2012
- Type: Text , Conference proceedings
- Full Text: false
- Description: Online Grooming is a growing phenomenon within online environments. One of the major problems encountered in qualitative internet research of chat communication is the issue of anonymity, which is being exploited and greatly enjoyed by chatters. An important question that has been asked in the literature is 'How can a researcher be sure to analyse the communication of children and adolescents and not the chat communication of adults who pretend to be under 18?'. Our reply to this question would be the field of Authorship Analysis. Authorship Analysis offers a way to unmask the anonymity of cyber predators. Stylometry, as used in this chat log analysis, is a type of Authorship Analysis that is not based on an author's handwriting but includes contextual clues from the content of their writings. This research paper will analyse the application of different authorship attribution techniques to chat logs from a forensic perspective. © 2012 IEEE.
Malware detection based on structural and behavioural features of API calls
- Authors: Alazab, Mamoun , Layton, Robert , Venkatraman, Sitalakshmi , Watters, Paul
- Date: 2010
- Type: Text , Conference proceedings
- Full Text: false
- Description: In this paper, we propose a five-step approach to detect obfuscated malware by investigating the structural and behavioural features of API calls. We have developed a fully automated system to disassemble executables and extract API call features from them effectively. Using n-gram statistical analysis of binary content, we are able to classify whether an executable file is malicious or benign. Our experimental results with a dataset of 242 malware samples and 72 benign files have shown a promising accuracy of 96.5% for the unigram model. We also provide a preliminary analysis of our approach using a support vector machine (SVM): by varying n from 1 to 5, we have analysed performance measures that include accuracy, false positives and false negatives. By applying SVM, we propose to train the classifier and derive an optimum n-gram model for detecting both known and unknown malware efficiently.
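The n-gram feature extraction mentioned in this abstract can be illustrated with a short sketch. It covers only the counting step for n from 1 to 5 over raw bytes, under the assumption that features are relative n-gram frequencies; the disassembly, API call extraction and SVM training stages are not shown, and the function names and sample bytes are hypothetical.

```python
from collections import Counter

def ngram_counts(data: bytes, n: int) -> Counter:
    """Count every contiguous n-byte sequence in data."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def ngram_frequencies(data: bytes, n: int) -> dict:
    """Relative frequency of each n-gram; an empty dict if data is shorter than n."""
    counts = ngram_counts(data, n)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()} if total else {}

# Hypothetical byte string standing in for the content of an executable.
sample = b"\x55\x8b\xec\x55\x8b\xec\x90"

# One feature dictionary per n, mirroring the n = 1..5 sweep in the abstract.
features_by_n = {n: ngram_frequencies(sample, n) for n in range(1, 6)}
```

Each dictionary in `features_by_n` could then be turned into a fixed-length vector over a shared n-gram vocabulary before being passed to a classifier.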