Local n-grams for author identification: Notebook for PAN at CLEF 2013 C3 - CEUR Workshop Proceedings
- Layton, Robert, Watters, Paul, Dazeley, Richard
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2013
- Type: Text , Conference proceedings
- Full Text:
- Description: Our approach to the author identification task uses existing authorship attribution methods using local n-grams (LNG) and performs a weighted ensemble. This approach came in third for this year's competition, using a relatively simple scheme of weights by training set accuracy. LNG models create profiles, consisting of a list of character n-grams that best represent a particular author's writing. The use of a weighted ensemble improved upon the accuracy of the method without reducing the speed of the algorithm; the submitted solution was not only near the top of the leaderboard in terms of accuracy, but it was also one of the faster algorithms submitted.
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2013
- Type: Text , Conference proceedings
- Full Text:
- Description: Our approach to the author identification task uses existing authorship attribution methods using local n-grams (LNG) and performs a weighted ensemble. This approach came in third for this year's competition, using a relatively simple scheme of weights by training set accuracy. LNG models create profiles, consisting of a list of character n-grams that best represent a particular author's writing. The use of a weighted ensemble improved upon the accuracy of the method without reducing the speed of the algorithm; the submitted solution was not only near the top of the leaderboard in terms of accuracy, but it was also one of the faster algorithms submitted.
Towards an implementation of information flow security using semantic web technologies
- Ureche, Oana, Layton, Robert, Watters, Paul
- Authors: Ureche, Oana , Layton, Robert , Watters, Paul
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Controlling the flow of sensitive data has been widely acknowledged as a critical aspect for securing web information systems. A common limitation of previous approaches for the implementation of the information flow control is their proposal of new scripting languages. This makes them infeasible to be applied to existing systems written in traditional programming languages as these systems need to be redeveloped in the proposed scripting language. This paper proposes a methodology that offers a common interlinqua through the use of Semantic Web technologies for securing web information systems independently of their programming language. © 2012 IEEE.
- Description: 2003011056
- Authors: Ureche, Oana , Layton, Robert , Watters, Paul
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Controlling the flow of sensitive data has been widely acknowledged as a critical aspect for securing web information systems. A common limitation of previous approaches for the implementation of the information flow control is their proposal of new scripting languages. This makes them infeasible to be applied to existing systems written in traditional programming languages as these systems need to be redeveloped in the proposed scripting language. This paper proposes a methodology that offers a common interlinqua through the use of Semantic Web technologies for securing web information systems independently of their programming language. © 2012 IEEE.
- Description: 2003011056
Unsupervised authorship analysis of phishing webpages
- Layton, Robert, Watters, Paul, Dazeley, Richard
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Authorship analysis on phishing websites enables the investigation of phishing attacks, beyond basic analysis. In authorship analysis, salient features from documents are used to determine properties about the author, such as which of a set of candidate authors wrote a given document. In unsupervised authorship analysis, the aim is to group documents such that all documents by one author are grouped together. Applying this to cyber-attacks shows the size and scope of attacks from specific groups. This in turn allows investigators to focus their attention on specific attacking groups rather than trying to profile multiple independent attackers. In this paper, we analyse phishing websites using the current state of the art unsupervised authorship analysis method, called NUANCE. The results indicate that the application produces clusters which correlate strongly to authorship, evaluated using expert knowledge and external information as well as showing an improvement over a previous approach with known flaws. © 2012 IEEE.
- Description: 2003010678
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Authorship analysis on phishing websites enables the investigation of phishing attacks, beyond basic analysis. In authorship analysis, salient features from documents are used to determine properties about the author, such as which of a set of candidate authors wrote a given document. In unsupervised authorship analysis, the aim is to group documents such that all documents by one author are grouped together. Applying this to cyber-attacks shows the size and scope of attacks from specific groups. This in turn allows investigators to focus their attention on specific attacking groups rather than trying to profile multiple independent attackers. In this paper, we analyse phishing websites using the current state of the art unsupervised authorship analysis method, called NUANCE. The results indicate that the application produces clusters which correlate strongly to authorship, evaluated using expert knowledge and external information as well as showing an improvement over a previous approach with known flaws. © 2012 IEEE.
- Description: 2003010678
Zero-day malware detection based on supervised learning algorithms of API call signatures
- Alazab, Mamoun, Venkatraman, Sitalakshmi, Watters, Paul, Alazab, Moutaz
- Authors: Alazab, Mamoun , Venkatraman, Sitalakshmi , Watters, Paul , Alazab, Moutaz
- Date: 2011
- Type: Text , Conference proceedings
- Full Text:
- Description: Zero-day or unknown malware are created using code obfuscation techniques that can modify the parent code to produce offspring copies which have the same functionality but with different signatures. Current techniques reported in literature lack the capability of detecting zero-day malware with the required accuracy and efficiency. In this paper, we have proposed and evaluated a novel method of employing several data mining techniques to detect and classify zero-day malware with high levels of accuracy and efficiency based on the frequency of Windows API calls. This paper describes the methodology employed for the collection of large data sets to train the classifiers, and analyses the performance results of the various data mining algorithms adopted for the study using a fully automated tool developed in this research to conduct the various experimental investigations and evaluation. Through the performance results of these algorithms from our experimental analysis, we are able to evaluate and discuss the advantages of one data mining algorithm over the other for accurately detecting zero-day malware successfully. The data mining framework employed in this research learns through analysing the behavior of existing malicious and benign codes in large datasets. We have employed robust classifiers, namely Naïve Bayes (NB) Algorithm, k-Nearest Neighbor (kNN) Algorithm, Sequential Minimal Optimization (SMO) Algorithm with 4 differents kernels (SMO - Normalized PolyKernel, SMO - PolyKernel, SMO - Puk, and SMO- Radial Basis Function (RBF)), Backpropagation Neural Networks Algorithm, and J48 decision tree and have evaluated their performance. Overall, the automated data mining system implemented for this study has achieved high true positive (TP) rate of more than 98.5%, and low false positive (FP) rate of less than 0.025, which has not been achieved in literature so far. This is much higher than the required commercial acceptance level indicating that our novel technique is a major leap forward in detecting zero-day malware. This paper also offers future directions for researchers in exploring different aspects of obfuscations that are affecting the IT world today. © 2011, Australian Computer Society, Inc.
- Description: 2003009506
- Authors: Alazab, Mamoun , Venkatraman, Sitalakshmi , Watters, Paul , Alazab, Moutaz
- Date: 2011
- Type: Text , Conference proceedings
- Full Text:
- Description: Zero-day or unknown malware are created using code obfuscation techniques that can modify the parent code to produce offspring copies which have the same functionality but with different signatures. Current techniques reported in literature lack the capability of detecting zero-day malware with the required accuracy and efficiency. In this paper, we have proposed and evaluated a novel method of employing several data mining techniques to detect and classify zero-day malware with high levels of accuracy and efficiency based on the frequency of Windows API calls. This paper describes the methodology employed for the collection of large data sets to train the classifiers, and analyses the performance results of the various data mining algorithms adopted for the study using a fully automated tool developed in this research to conduct the various experimental investigations and evaluation. Through the performance results of these algorithms from our experimental analysis, we are able to evaluate and discuss the advantages of one data mining algorithm over the other for accurately detecting zero-day malware successfully. The data mining framework employed in this research learns through analysing the behavior of existing malicious and benign codes in large datasets. We have employed robust classifiers, namely Naïve Bayes (NB) Algorithm, k-Nearest Neighbor (kNN) Algorithm, Sequential Minimal Optimization (SMO) Algorithm with 4 differents kernels (SMO - Normalized PolyKernel, SMO - PolyKernel, SMO - Puk, and SMO- Radial Basis Function (RBF)), Backpropagation Neural Networks Algorithm, and J48 decision tree and have evaluated their performance. Overall, the automated data mining system implemented for this study has achieved high true positive (TP) rate of more than 98.5%, and low false positive (FP) rate of less than 0.025, which has not been achieved in literature so far. This is much higher than the required commercial acceptance level indicating that our novel technique is a major leap forward in detecting zero-day malware. This paper also offers future directions for researchers in exploring different aspects of obfuscations that are affecting the IT world today. © 2011, Australian Computer Society, Inc.
- Description: 2003009506
The seven scam types: Mapping the terrain of cybercrime
- Stabek, Amber, Watters, Paul, Layton, Robert
- Authors: Stabek, Amber , Watters, Paul , Layton, Robert
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Threat of cybercrime is a growing danger to the economy. Industries and businesses are targeted by cyber-criminals along with members of the general public. Since cybercrime is often a symptom of more complex criminological regimes such as laundering, trafficking and terrorism, the true damage caused to society is unknown. Dissimilarities in reporting procedures and non-uniform cybercrime classifications lead international reporting bodies to produce incompatible results which cause difficulties in making valid comparisons. A cybercrime classification framework has been identified as necessary for the development of an inter-jurisdictional, transnational, and global approach to identify, intercept, and prosecute cyber-criminals. Outlined in this paper is a cybercrime classification framework which has been applied to the incidence of scams. Content analysis was performed on over 250 scam descriptions stemming from in excess of 35 scamming categories and over 80 static features derived. Using hierarchical cluster and discriminant function analysis, the sample was reduced from over 35 ambiguous categories into 7 scam types and the top four scamming functions - identified as scamming business processes, revealed. The results of this research bear significant ramifications to the current state of scam and cybercrime classification, research and analysis, as well as offer significant insight into the business processes and applications adopted by scammers and cyber-criminals. © 2010 IEEE.
- Authors: Stabek, Amber , Watters, Paul , Layton, Robert
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Threat of cybercrime is a growing danger to the economy. Industries and businesses are targeted by cyber-criminals along with members of the general public. Since cybercrime is often a symptom of more complex criminological regimes such as laundering, trafficking and terrorism, the true damage caused to society is unknown. Dissimilarities in reporting procedures and non-uniform cybercrime classifications lead international reporting bodies to produce incompatible results which cause difficulties in making valid comparisons. A cybercrime classification framework has been identified as necessary for the development of an inter-jurisdictional, transnational, and global approach to identify, intercept, and prosecute cyber-criminals. Outlined in this paper is a cybercrime classification framework which has been applied to the incidence of scams. Content analysis was performed on over 250 scam descriptions stemming from in excess of 35 scamming categories and over 80 static features derived. Using hierarchical cluster and discriminant function analysis, the sample was reduced from over 35 ambiguous categories into 7 scam types and the top four scamming functions - identified as scamming business processes, revealed. The results of this research bear significant ramifications to the current state of scam and cybercrime classification, research and analysis, as well as offer significant insight into the business processes and applications adopted by scammers and cyber-criminals. © 2010 IEEE.
Towards understanding malware behaviour by the extraction of API calls
- Alazab, Mamoun, Venkatraman, Sitalakshmi, Watters, Paul
- Authors: Alazab, Mamoun , Venkatraman, Sitalakshmi , Watters, Paul
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: One of the recent trends adopted by malware authors is to use packers or software tools that instigate code obfuscation in order to evade detection by antivirus scanners. With evasion techniques such as polymorphism and metamorphism malware is able to fool current detection techniques. Thus, security researchers and the anti-virus industry are facing a herculean task in extracting payloads hidden within packed executables. It is a common practice to use manual unpacking or static unpacking using some software tools and analyse the application programming interface (API) calls for malware detection. However, extracting these features from the unpacked executables for reverse obfuscation is labour intensive and requires deep knowledge of low-level programming that includes kernel and assembly language. This paper presents an automated method of extracting API call features and analysing them in order to understand their use for malicious purpose. While some research has been conducted in arriving at file birthmarks using API call features and the like, there is a scarcity of work that relates to features in malcodes. To address this gap, we attempt to automatically analyse and classify the behavior of API function calls based on the malicious intent hidden within any packed program. This paper uses four-step methodology for developing a fully automated system to arrive at six main categories of suspicious behavior of API call features. © 2010 IEEE.
- Authors: Alazab, Mamoun , Venkatraman, Sitalakshmi , Watters, Paul
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: One of the recent trends adopted by malware authors is to use packers or software tools that instigate code obfuscation in order to evade detection by antivirus scanners. With evasion techniques such as polymorphism and metamorphism malware is able to fool current detection techniques. Thus, security researchers and the anti-virus industry are facing a herculean task in extracting payloads hidden within packed executables. It is a common practice to use manual unpacking or static unpacking using some software tools and analyse the application programming interface (API) calls for malware detection. However, extracting these features from the unpacked executables for reverse obfuscation is labour intensive and requires deep knowledge of low-level programming that includes kernel and assembly language. This paper presents an automated method of extracting API call features and analysing them in order to understand their use for malicious purpose. While some research has been conducted in arriving at file birthmarks using API call features and the like, there is a scarcity of work that relates to features in malcodes. To address this gap, we attempt to automatically analyse and classify the behavior of API function calls based on the malicious intent hidden within any packed program. This paper uses four-step methodology for developing a fully automated system to arrive at six main categories of suspicious behavior of API call features. © 2010 IEEE.
Windows rootkits: Attacks and countermeasures
- Lobo, Desmond, Watters, Paul, Wu, Xin, Sun, Li
- Authors: Lobo, Desmond , Watters, Paul , Wu, Xin , Sun, Li
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Windows XP is the dominant operating system in the world today and rootkits have been a major concern for XP users. This paper provides an in-depth analysis of the rootkits that target that operating system, while focusing on those that use various hooking techniques to hide malware on a machine. We identify some of the weaknesses in the Windows XP architecture that rootkits exploit and then evaluate some of the anti-rootkit security features that Microsoft has unveiled in Vista and 7. To reduce the number of rootkit infections in the future, we suggest that Microsoft should take full advantage of Intel's four distinct privilege levels. © 2010 IEEE.
- Authors: Lobo, Desmond , Watters, Paul , Wu, Xin , Sun, Li
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Windows XP is the dominant operating system in the world today and rootkits have been a major concern for XP users. This paper provides an in-depth analysis of the rootkits that target that operating system, while focusing on those that use various hooking techniques to hide malware on a machine. We identify some of the weaknesses in the Windows XP architecture that rootkits exploit and then evaluate some of the anti-rootkit security features that Microsoft has unveiled in Vista and 7. To reduce the number of rootkit infections in the future, we suggest that Microsoft should take full advantage of Intel's four distinct privilege levels. © 2010 IEEE.
- «
- ‹
- 1
- ›
- »