Spam email categorization with NLP and using federated deep learning
- Authors: Ul Haq, Ikram; Black, Paul; Gondal, Iqbal; Kamruzzaman, Joarder; Watters, Paul; Kayes, A.
- Date: 2022
- Type: Text , Conference paper
- Relation: 18th International Conference on Advanced Data Mining and Applications (ADMA 2022), Brisbane, Australia, 28-30 November 2022; Advanced Data Mining and Applications, Vol. 13726 LNAI, p. 15-27
- Full Text: false
- Reviewed:
- Description: Emails are the most popular and efficient communication method, which makes them vulnerable to misuse. Federated learning (FL) provides a decentralized machine learning (ML) model in which a central server coordinates clients that collaboratively train a shared ML model. This paper proposes a Federated Phishing Filtering (FPF) technique based on federated learning, natural language processing (NLP), and deep learning. FL fuses trained ML models from multiple sites for collective learning. This approach improves ML performance by utilizing large collective training data sets across the corporate client base, resulting in higher phishing email detection accuracy. The FPF technique preserves email privacy by performing feature extraction locally on client email servers, so the contents of emails need not be transmitted across the network or stored on third-party servers. We apply FL and NLP to email phishing detection; the technique provides four training modes that perform FL without sharing email content. Our research categorizes emails as benign, spam, or phishing. Empirical evaluations with publicly available datasets show that accuracy is improved by the use of our federated deep learning model. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
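The federated setup this abstract describes (a central server fusing client-trained models without moving raw email off the client) can be sketched under the common federated-averaging assumption; the paper's exact aggregation rule is not given here, and the weights below are hypothetical:

```python
# Minimal federated-averaging sketch (illustrative only, not the paper's FPF code).
# Each client trains locally and sends only model weights; the server computes a
# size-weighted average, so no email content ever leaves the client servers.

def fed_avg(client_weights, client_sizes):
    """Weighted average of per-client weight vectors (lists of floats)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical clients: the client with more training emails dominates.
w_a, w_b = [1.0, 0.0], [0.0, 1.0]
global_w = fed_avg([w_a, w_b], client_sizes=[3, 1])
print(global_w)  # [0.75, 0.25]
```

In a full FL round the server would broadcast `global_w` back to the clients for the next local training pass; only the aggregation step is shown here.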
API based discrimination of ransomware and benign cryptographic programs
- Authors: Black, Paul; Sohail, Ammar; Gondal, Iqbal; Kamruzzaman, Joarder; Vamplew, Peter; Watters, Paul
- Date: 2020
- Type: Text , Conference paper
- Relation: 27th International Conference on Neural Information Processing, ICONIP 2020, Bangkok, 18 to 22 November 2020, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 12533 LNCS, p. 177-188
- Full Text: false
- Reviewed:
- Description: Ransomware is a widespread class of malware that encrypts files in a victim’s computer and extorts victims into paying a fee to regain access to their data. Previous research has proposed methods for ransomware detection using machine learning techniques. However, this research has not examined the precision of ransomware detection. While existing techniques show an overall high accuracy in detecting novel ransomware samples, previous research does not investigate the discrimination of novel ransomware from benign cryptographic programs. This is a critical, practical limitation of current research; machine learning based techniques would be limited in their practical benefit if they generated too many false positives (at best) or deleted/quarantined critical data (at worst). We examine the ability of machine learning techniques based on Application Programming Interface (API) profile features to discriminate novel ransomware from benign cryptographic programs. This research presents a ransomware detection technique that provides improved detection accuracy and precision compared to other API profile based ransomware detection techniques, while using significantly simpler features than previous dynamic ransomware detection research. © 2020, Springer Nature Switzerland AG.
ICANN or ICANT: Is WHOIS an Enabler of Cybercrime?
- Authors: Watters, Paul; Herps, Aaron; Layton, Robert; McCombie, Stephen
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 44-49
- Full Text: false
- Reviewed:
- Description: WHOIS acts as a registry for organisations or individuals who 'own' or take responsibility for domains. For any registry to be functional, its integrity needs to be assured. Unfortunately, WHOIS data does not appear to meet basic integrity requirements in many cases, reducing the effectiveness of law enforcement and rightsholders in requesting takedowns for phishing kits, zombie hosts that are part of a botnet, or infringing content. In this paper, we illustrate the problem using a case study from trademark protection, where investigators attempt to trace fake goods being advertised on Facebook. The results indicate that ICANN needs to at least introduce minimum verification standards for WHOIS records vis-à-vis integrity, and optimally, develop a system for rapid takedowns in the event that a domain is being misused.
Identifying Faked Hotel Reviews Using Authorship Analysis
- Authors: Layton, Robert; Watters, Paul; Ureche, Oana
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 1-6
- Full Text: false
- Reviewed:
- Description: The use of online review sites has grown significantly, allowing communities to share information on products or services. These online review sites are marketed as being independent and trustworthy, but have been criticised for not ensuring the integrity of the reviews. One major concern is review fraud, where a person (such as a marketer) is paid to write favourable reviews for one product or poor reviews for a competitor. In this research we show a method for determining whether two reviews share an author, which can be used to identify whether a review is legitimate. Our results indicate that the method performs well, with an F1 score of over 0.66 on testing data with 40 authors, most of whom have only one or two documents. This type of analysis can be used to investigate cases of potential hotel review fraud.
Indirect information linkage for OSINT through authorship analysis of aliases
- Authors: Layton, Robert; Perez, Charles; Birregah, Babiga; Watters, Paul; Lemercier, Marc
- Date: 2013
- Type: Text , Conference paper
- Relation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining Vol. 7867 LNAI, p. 36-46
- Full Text: false
- Reviewed:
- Description: In this paper we examine the problem of automatically linking online accounts for open source intelligence gathering. We specifically aim to determine if two social media accounts are shared by the same author, without the use of direct linking evidence. We profile the accounts using authorship analysis and find the best matching guess. We apply this to a series of Twitter accounts identified as malicious by a methodology named SPOT and find several pairs of accounts that belong to the same author, despite no direct evidence linking the two. Overall, our results show that linking aliases is possible with an accuracy of 84%, and using our automated threshold method improves our accuracy to over 90% by removing incorrectly discovered matches. © Springer-Verlag 2013.
Measuring Surveillance in Online Advertising: A Big Data Approach
- Authors: Herps, Aaron; Watters, Paul; Pineda-Villavicencio, Guillermo
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 30-35
- Full Text: false
- Reviewed:
- Description: There is an increasing public and policy awareness that tracking cookies are being used to support behavioral advertising, but the extent to which tracking is occurring is not clear. The extent of tracking could have implications for the enforceability of legislative responses to the sharing of personal data, including the Privacy Act 1988 (Cth). In this paper, we develop a methodology for determining the prevalence of tracking cookies, and report the results for a sample of the 50 most visited sites by Australians. We find that the use of tracking cookies is endemic, but that distinct clusters of tracking can be identified across categories including search, pornography and social networking. The implications of the work in relation to privacy are discussed.
REPLOT: REtrieving profile links on Twitter for suspicious networks detection
- Authors: Perez, Charles; Birregah, Babiga; Layton, Robert; Lemercier, Marc; Watters, Paul
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013 p. 1307-1314
- Full Text: false
- Reviewed:
- Description: In recent years, social networking sites have encountered their first large-scale security issues. The high number of users, combined with the presence of sensitive data (personal or professional), is certainly an unprecedented opportunity for malicious activities. As a result, malicious users are progressively turning their attention from traditional e-mail to online social networks to carry out their attacks. Moreover, attacks are not only performed by individual profiles; on a larger scale, a set of profiles can act in coordination in making such attacks. The latter are referred to as malicious social campaigns. In this paper, we present a novel approach that combines authorship attribution techniques with behavioural analysis for detecting and characterizing social campaigns. The proposed approach is performed in three steps: first, suspicious profiles are identified from a behavioural analysis; second, connections between suspicious profiles are retrieved using a combination of authorship attribution and temporal similarity; third, a clustering algorithm is applied to identify and characterise the suspicious campaigns obtained. We provide a real-life application of the methodology on a sample of 1,000 suspicious Twitter profiles tracked over a period of forty days. Our results show that a large proportion (70%) of suspicious profiles behave in coordination and propagate mainly, but not only, trustworthy URLs on the online social network. Among the three largest detected campaigns, we have highlighted one that represents an important security issue for the platform by promoting a significant set of malicious URLs. Copyright 2013 ACM.
Skype Traffic Classification Using Cost Sensitive Algorithms
- Authors: Azab, Azab; Layton, Robert; Alazab, Mamoun; Watters, Paul
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 14-21
- Full Text: false
- Reviewed:
- Description: Voice over IP (VoIP) technologies such as Skype are becoming increasingly popular and widely used in different organisations, so identifying the use of this service at the network level is very important. Reasons include applying Quality of Service (QoS), network planning, prohibiting its use in some networks, and lawful interception of communications. Researchers have addressed VoIP traffic classification from different viewpoints, such as classifier accuracy, building time, classification time and online classification. However, previous research tested models on the same version of a VoIP product that was used for training, giving generalizability only to that version; as new VoIP versions are released, these classifiers become obsolete. In this paper, we examine whether this approach can detect new, untrained versions of Skype. We suggest that cost-sensitive classifiers can improve the accuracy of detecting untrained versions, and test them against other algorithms. Our experiment demonstrates promising preliminary results: a cost-sensitive classifier built on Skype version 3 detects Skype version 4 with an F-measure of 0.57, a drastic improvement over the non-cost-sensitive baseline, which scores an F-measure of 0. This approach may be enhanced to improve detection results and extended to other applications that change protocols from version to version.
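The cost-sensitive idea named in this abstract amounts to shifting a classifier's decision threshold in proportion to the misclassification costs, trading precision for recall on unseen protocol versions. A minimal sketch with hypothetical scores (not the paper's classifier, features, or data):

```python
# Cost-sensitive decision rule (illustrative only). A cost-insensitive rule
# predicts "Skype" when p >= 0.5; raising the cost of missing Skype traffic
# (false negatives) lowers that threshold to cost_fp / (cost_fn + cost_fp).

def cost_sensitive_predict(p_skype, cost_fn=4.0, cost_fp=1.0):
    """Predict 1 (Skype) when the expected cost of predicting 'not Skype'
    exceeds the expected cost of predicting 'Skype'."""
    return 1 if p_skype * cost_fn > (1.0 - p_skype) * cost_fp else 0

def f_measure(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical scores for an untrained Skype version: the model is unsure.
probs  = [0.3, 0.4, 0.6, 0.2]
labels = [1,   1,   1,   0]
plain  = [cost_sensitive_predict(p, cost_fn=1.0) for p in probs]  # threshold 0.5
costly = [cost_sensitive_predict(p) for p in probs]               # threshold 0.2
print(f_measure(labels, plain), f_measure(labels, costly))
```

With equal costs the unsure positives are missed and the F-measure suffers; weighting false negatives more heavily recovers them, mirroring the direction of improvement the abstract reports.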
Detecting illicit drugs on social media using Automated Social Media Intelligence Analysis (ASMIA)
- Authors: Watters, Paul; Phair, Nigel
- Date: 2012
- Type: Text , Conference paper
- Relation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 7672 LNCS, p. 66-76
- Full Text:
- Reviewed:
- Description: While social media is a new and exciting technology, it has the potential to be misused by organized crime groups and individuals involved in the illicit drugs trade. In particular, social media provides a means to create new marketing and distribution opportunities to a global marketplace, often exploiting jurisdictional gaps between buyer and seller. The sheer volume of postings presents investigational barriers, but the platform is amenable to the partial automation of open source intelligence. This paper presents a new methodology for automating social media data, and presents two pilot studies into its use for detecting marketing and distribution of illicit drugs targeted at Australians. Key technical challenges are identified, and the policy implications of the ease of access to illicit drugs are discussed. © 2012 Springer-Verlag.
Cybercrime : The case of obfuscated malware
- Authors: Alazab, Mamoun; Venkatraman, Sitalakshmi; Watters, Paul; Alazab, Moutaz; Alazab, Ammar
- Date: 2011
- Type: Text , Conference paper
- Relation: Joint 7th International Conference on Global Security, Safety and Sustainability, ICGS3 2011, and the 4th Conference on e-Democracy Vol. 99 LNICST, p. 204-211
- Full Text: false
- Reviewed:
- Description: Cybercrime has developed rapidly in recent years, and malware, which has existed since the earliest days of computing, remains one of the major security threats to computers. There is a lack of understanding of such malware threats and of the mechanisms that can be used to implement security prevention and to detect them. The main contribution of this paper is a step towards addressing this by investigating the different techniques adopted by obfuscated malware, which is increasingly widespread and sophisticated, with zero-day exploits. In particular, by adopting certain effective detection methods, our investigations show how cybercriminals make use of file system vulnerabilities to inject hidden malware into the system. The paper also describes recent trends in Zeus botnets and the importance of employing anomaly detection to address the new generation of Zeus malware. © 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering.
A new procedure to help system/network administrators identify multiple rootkit infections
- Authors: Lobo, Desmond; Watters, Paul; Wu, Xinwen
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 2nd International Conference on Communication Software and Networks, ICCSN 2010, Singapore : 26th-28th February 2010 p. 124-128
- Full Text:
- Description: Rootkits refer to software that is used to hide the presence of malware from system/network administrators and permit an attacker to take control of a computer. In our previous work, we designed a system that would categorize rootkits based on the hooks that had been created. Focusing on rootkits that use inline function hooking techniques, we showed that our system could successfully categorize a sample of rootkits using unsupervised EM clustering. In this paper, we extend our previous work by outlining a new procedure to help system/network administrators identify the rootkits that have infected their machines. Using a logistic regression model for profiling families of rootkits, we were able to identify at least one of the rootkits that had infected each of the systems that we tested. © 2010 IEEE.
Authorship attribution for Twitter in 140 characters or less
- Authors: Layton, Robert; Watters, Paul; Dazeley, Richard
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at - 2nd Cybercrime and Trustworthy Computing Workshop, CTC 2010 p. 1-8
- Full Text:
- Reviewed:
- Description: Authorship attribution is a growing field, moving from its beginnings in linguistics to recent advances in text mining. With this change came an increase in the capability of authorship attribution methods, both in their accuracy and in their ability to handle more difficult problems. Research into authorship attribution in the 19th century considered it difficult to determine the authorship of a document of fewer than 1000 words. By the 1990s this value had decreased to less than 500 words, and in the early 21st century it was considered possible to determine the authorship of a document of 250 words. The need for this ever-decreasing limit is exemplified by the trend towards many shorter communications rather than fewer longer communications, such as the move from traditional multi-page handwritten letters to shorter, more focused emails. This trend has also been shown in online crime, where many attacks such as phishing or bullying are performed using very concise language. Cybercrime messages have long been hosted on Internet Relay Chats (IRCs), which allow members to hide behind screen names and connect anonymously. More recently, Twitter and other short-message web services have been used as a hosting ground for online crimes. This paper presents evaluations of current techniques and identifies new preprocessing methods that enable authorship to be determined, at rates significantly better than chance, for documents of 140 characters or less, a format popularised by the micro-blogging website Twitter. We show that the SCAP methodology performs extremely well on Twitter messages and, even with restrictions on the types of information allowed, such as the recipient of directed messages, still performs significantly better than chance. Further, we show that 120 tweets per user is an important threshold, beyond which adding more tweets per user gives a small but non-significant increase in accuracy. © 2010 IEEE.
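The SCAP methodology mentioned above profiles each author by their most frequent character n-grams and attributes a message to the author whose profile overlaps it most (Simplified Profile Intersection). A hedged sketch on invented toy data, not the authors' implementation:

```python
# SCAP-style attribution sketch (illustrative only; parameters n and L are
# assumptions, and the "tweets" below are invented).
from collections import Counter

def scap_profile(text, n=3, L=2000):
    """Profile = the set of the L most frequent character n-grams of the text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {g for g, _ in grams.most_common(L)}

def attribute(message, author_texts, n=3, L=2000):
    """Return the author whose profile shares the most n-grams with the message."""
    msg_profile = scap_profile(message, n, L)
    return max(author_texts,
               key=lambda a: len(msg_profile & scap_profile(author_texts[a], n, L)))

# Hypothetical authors built from concatenated past tweets.
authors = {
    "alice": "loving the new coffee place downtown!! so so good",
    "bob": "deployed the patch; regression tests all green, shipping tonight",
}
print(attribute("regression tests failing again, reverting the patch", authors))
# -> bob
```

With only 140 characters per message, the profile intersection is small, which is why the paper's preprocessing choices and the ~120-tweets-per-user threshold matter.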
Automatically determining phishing campaigns using the USCAP methodology
- Authors: Layton, Robert; Watters, Paul; Dazeley, Richard
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at General Members Meeting and eCrime Researchers Summit, eCrime 2010 p. 1-8
- Full Text:
- Reviewed:
- Description: Phishing fraudsters attempt to create an environment which looks and feels like a legitimate institution, while at the same time attempting to bypass the filters and suspicions of their targets. This is a difficult compromise for the phishers and presents a weakness in the process of conducting this fraud. In this research, a methodology is presented that examines the differences between phishing websites from an authorship analysis perspective and is able to determine the different phishing campaigns undertaken by phishing groups. The methodology is named USCAP, for Unsupervised SCAP; it builds on the SCAP methodology from supervised authorship analysis and extends it to unsupervised learning problems. The phishing website source code is examined to generate a model that gives the size and scope of each of the recognized phishing campaigns. The USCAP methodology is the first to cluster phishing websites by campaign in an automatic and reliable way; previous methods relied on costly expert analysis of phishing websites. Evaluation of these clusters indicates that each cluster is strongly consistent, with high stability and reliability when analyzed using new information about the attacks, such as the dates on which the attacks occurred. The clusters found are indicative of different phishing campaigns, presenting a step towards an automated phishing authorship analysis methodology. © 2010 IEEE.
Descriptive data mining on fraudulent online dating profiles
- Authors: Pan, Jinjian A.; Winchester, Donald; Land, Lesley; Watters, Paul
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of ECIS 2010, The 18th European Conference on Information Systems 2010 p. 1-11
- Full Text: false
- Reviewed:
- Description: The increasing ease of access to the World Wide Web and email harvesting tools has enabled spammers to target a wider audience. Scams are widely encountered day to day by individuals from all walks of life, and result in millions of dollars in financial loss as well as emotional trauma (Newman 2005). This paper aims to analyse and examine the structure of Romance Fraud, in a bid to understand and detect Romance Fraud profiles. We focus on scams that utilise the medium of dating websites. The primary indicators of Romance Fraud identified in the literature include social factors, scam characteristics and content....
Identifying rootkit infections using data mining
- Authors: Lobo, Desmond; Watters, Paul; Wu, Xinwen
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 2010 International Conference on Information Science and Applications, ICISA 2010, Seoul, Korea : p. 1-7
- Full Text:
- Description: Rootkits refer to software that is used to hide the presence and activity of malware and permit an attacker to take control of a computer system. In our previous work, we focused strictly on identifying rootkits that use inline function hooking techniques to remain hidden. In this paper, we extend our previous work by including rootkits that use other types of hooking techniques, such as those that hook the IATs (Import Address Tables) and SSDTs (System Service Descriptor Tables). Unlike other malware identification techniques, our approach involved conducting dynamic analyses of various rootkits and then determining the family of each rootkit based on the hooks that had been created on the system. We demonstrated the effectiveness of this approach by first using the CLOPE (Clustering with sLOPE) algorithm to cluster a sample of rootkits into several families; next, the ID3 (Iterative Dichotomiser 3) algorithm was utilized to generate a decision tree for identifying the rootkit that had infected a machine. ©2010 IEEE.
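The ID3 algorithm named in this abstract grows a decision tree by repeatedly splitting on the feature with the highest information gain. A small sketch of that criterion with hypothetical hook-presence features (not the authors' rootkit data or system):

```python
# Information-gain calculation at the core of ID3 (illustrative only).
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on a boolean feature index
    (e.g. 'hook X was created on the system')."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [l for r, l in zip(rows, labels) if r[feature] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# Hypothetical hook-presence vectors for four rootkit samples (1 = hook created).
rows   = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = ["famA", "famA", "famB", "famB"]
print(information_gain(rows, labels, 0))  # feature 0 separates the families: 1.0
print(information_gain(rows, labels, 1))  # feature 1 is uninformative: 0.0
```

ID3 would choose feature 0 as the root split here; the paper applies the same criterion to hooks observed during dynamic analysis, after CLOPE has grouped the samples into families.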
RBACS : Rootkit behavioral analysis and classification system
- Authors: Lobo, Desmond; Watters, Paul; Wu, Xinwen
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 3rd International Conference on Knowledge Discovery and Data Mining, WKDD 2010, Phuket : 9th-10th January 2010 p. 75-80
- Full Text:
- Description: In this paper, we focus on rootkits, a special type of malicious software (malware) that operates in an obfuscated and stealthy mode to evade detection. Categorizing these rootkits will help in detecting future attacks against the business community. We first developed a theoretical framework for classifying rootkits. Based on our theoretical framework, we then proposed a new rootkit classification system and tested our system on a sample of rootkits that use inline function hooking. Our experimental results showed that our system could successfully categorize the sample using unsupervised clustering. © 2010 IEEE.
A preliminary profiling of internet money mules : An Australian perspective
- Authors: Aston, Manny; McCombie, Stephen; Reardon, Ben; Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 2009 Symposia and Workshops on Ubiquitous, Autonomic and Trusted Computing, UIC-ATC '09, Brisbane, Queensland : 7th-9th July 2009 p. 482-487
- Full Text:
- Description: Along with the massive growth in Internet commerce over the last ten years there has been a corresponding boom in Internet-related crime, or cybercrime. According to research recently released by the Australian Bureau of Statistics, in 2006 57,000 Australians aged 15 years and over fell victim to phishing and related Internet scams. Of all the victims of cybercrime, only one group is potentially subject to criminal prosecution: 'Internet money mules', those who, either knowingly or unknowingly, launder money. This paper examines the demographic profile (specifically age, gender and postcode) of 660 confirmed money mule incidents recorded during the calendar year 2007 for a major Australian financial institution. This data is compared to ABS statistics on Internet usage in 2006. There is clear evidence of a strong gender bias towards males, particularly in the older age group. This is directly relevant when considering education and training programs for both corporations and the community on the issues surrounding Internet money mule scams, and ultimately in understanding the problem of Internet banking fraud.
Automatically generating classifier for phishing email prediction
- Authors: Ma, Liping; Torney, Rosemary; Watters, Paul; Brown, Simon
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at I-SPAN 2009 - The 10th International Symposium on Pervasive Systems, Algorithms, and Networks, Kaohsiung, Taiwan : 14th-16th December 2009 p. 779-783
- Full Text:
- Description: Phishing is a form of online identity theft that employs both social engineering and technical subterfuge to steal consumers' personal identity data and financial account credentials. Phishing email prediction has drawn a lot of attention from many researchers. According to current anti-phishing research, a classifier generated by a decision tree produces the most accurate predictions. However, no open-source tool appears to be available to transfer such a decision tree into an implementable classifier. The work presented in this paper builds a decision tree parser which automatically translates a decision tree into an implementable programming language, so that the decision tree is useful in real-world applications. Experimental results show that the parser performs as well as the original decision tree. © 2009 IEEE.
Autonomic nervous system factors underlying anxiety in virtual environments : A regression model for cybersickness
- Authors: Bruck, Susan; Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at VSMM 2009 -15th International Conference on Virtual Systems and Multimedia, Vienna : 9th-12th September 2009 p. 67-72
- Full Text:
- Description: The ability to predict whether people will experience anxiety is important for recruitment and selection in highly-stressful professions. Using a Virtual Reality Environment (VRE) can provide a tool to predict whether a person will experience anxiety. This paper reports several regression models which suggest observed and self-reported measures of anxiety during and after immersion in a VRE can be used to predict an individual's anxiety response to a simulated stressful environment. We found that respiration was a poor predictor of anxiety, but that cardiac activity accounted for around 39% of variance in self-reported anxiety responses using a four point scale. In contrast, responses from the Simulator Sickness Questionnaire (SSQ) accounted for 98% of variance in anxiety responses. However, only four out of eighteen measures in the SSQ made a significant contribution to the model. The implication for predicting an individual's anxiety responses using self-report or physiological measures is discussed. © 2009 IEEE.
Cybercrime attribution : An Eastern European case study
- Authors: McCombie, Stephen; Pieprzyk, Josef; Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 7th Australian Digital Forensics Conference, Perth, Western Australia 1st-3rd December 2009 p. 41-51
- Full Text: false
- Description: Phishing and related cybercrime is responsible for billions of dollars in losses annually. Gartner reported more than 5 million U.S. consumers lost money to phishing attacks in the 12 months ending in September 2008 (Gartner 2009). This paper asks whether the majority of organised phishing and related cybercrime originates in Eastern Europe rather than elsewhere such as China or the USA. The Russian “Mafiya” in particular has been popularised by the media and entertainment industries to the point where it can be hard to separate fact from fiction but we have endeavoured to look critically at the information available on this area to produce a survey. We take a particular focus on cybercrime from an Australian perspective, as Australia was one of the first places where Phishing attacks against Internet banks were seen. It is suspected these attacks came from Ukrainian spammers. The survey is built from case studies both where individuals from Eastern Europe have been charged with related crimes or unsolved cases where there is some nexus to Eastern Europe. It also uses some earlier work done looking at those early Phishing attacks, archival analysis of Phishing attacks in July 2006 and new work looking at correlation between the Corruption Perception Index, Internet penetration and tertiary education in Russia and the Ukraine. The value of this work is to inform and educate those charged with responding to cybercrime where a large part of the problem originates and try to understand why.