Spam email categorization with NLP and using federated deep learning
- Authors: Ul Haq, Ikram; Black, Paul; Gondal, Iqbal; Kamruzzaman, Joarder; Watters, Paul; Kayes, A.
- Date: 2022
- Type: Text; Conference paper
- Relation: 18th International Conference on Advanced Data Mining and Applications, ADMA 2022, Brisbane, Australia, 28-30 November 2022, Advanced Data Mining and Applications, 18th International Conference, ADMA 2022, Vol. 13726 LNAI, p. 15-27
- Full Text: false
- Reviewed:
- Description: Email is among the most popular and efficient communication methods, which also makes it vulnerable to misuse. Federated learning (FL) provides a decentralized machine learning (ML) model in which a central server coordinates clients that collaboratively train a shared ML model. This paper proposes a Federated Phishing Filtering (FPF) technique based on federated learning, natural language processing (NLP), and deep learning. FL fuses ML models trained at multiple sites for collective learning. This approach improves ML performance by utilizing large collective training data sets across the corporate client base, resulting in higher phishing email detection accuracy. The FPF technique preserves email privacy by performing feature extraction locally on client email servers, so the contents of emails need not be transmitted across the network or stored on third-party servers. We applied FL and NLP to email phishing detection. The technique provides four training modes that perform FL without sharing email content. Our research categorizes emails as benign, spam, or phishing. Empirical evaluations with publicly available datasets show that accuracy is improved by the use of our Federated Deep Learning model. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
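The aggregation step at the heart of this kind of federated setup can be illustrated with a minimal federated-averaging sketch. The function and toy weights below are hypothetical and not taken from the paper; FPF's actual training modes, features, and model architecture are not shown.

```python
# Minimal sketch of federated averaging: the server combines client-trained
# model weights without ever seeing the clients' email content.
# (Illustrative only; names and numbers are hypothetical.)

def federated_average(client_weights, client_sizes):
    """Average client model weights, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    global_weights = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            global_weights[i] += w * (size / total)
    return global_weights

# Three clients train locally and send only weights, never raw emails.
clients = [[0.2, 0.4], [0.6, 0.8], [0.4, 0.0]]
sizes = [100, 300, 100]
global_model = federated_average(clients, sizes)
```

Only the weight vectors cross the network; this is the privacy property the abstract emphasises.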
Security and blockchain convergence with internet of multimedia things: current trends, research challenges and future directions
- Authors: Jan, Mian; Cai, Jinjin; Gao, Xiang-Chuan; Khan, Fazlullah; Mastorakis, Spyridon; Usman, Muhammad; Alazab, Mamoun; Watters, Paul
- Date: 2021
- Type: Text; Journal article
- Relation: Journal of Network and Computer Applications Vol. 175 (2021)
- Full Text:
- Reviewed:
- Description: The Internet of Multimedia Things (IoMT) orchestration enables the integration of systems, software, cloud, and smart sensors into a single platform. The IoMT deals with scalar as well as multimedia data. In these networks, sensor-embedded devices and their data face numerous security challenges. In this paper, a comprehensive review of the existing literature for IoMT is presented in the context of security and blockchain. The latest literature on all three aspects of security, i.e., authentication, privacy, and trust, is provided to explore the challenges experienced by multimedia data. The convergence of blockchain and IoMT, along with multimedia-enabled blockchain platforms, is discussed for emerging applications. To highlight the significance of this survey, large-scale commercial projects focused on security and blockchain for multimedia applications are reviewed. The shortcomings of these projects are explored and suggestions for further improvement are provided. Based on the aforementioned discussion, we present our own case study for the healthcare industry: a theoretical framework with security and blockchain as key enablers. The case study reflects the importance of security and blockchain in multimedia applications in the healthcare sector. Finally, we discuss the convergence of emerging technologies with security, blockchain and IoMT to visualize the future of tomorrow's applications. © 2020 Elsevier Ltd
API based discrimination of ransomware and benign cryptographic programs
- Authors: Black, Paul; Sohail, Ammar; Gondal, Iqbal; Kamruzzaman, Joarder; Vamplew, Peter; Watters, Paul
- Date: 2020
- Type: Text; Conference paper
- Relation: 27th International Conference on Neural Information Processing, ICONIP 2020, Bangkok, 18-22 November 2020, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 12533 LNCS, p. 177-188
- Full Text: false
- Reviewed:
- Description: Ransomware is a widespread class of malware that encrypts files in a victim's computer and extorts victims into paying a fee to regain access to their data. Previous research has proposed methods for ransomware detection using machine learning techniques; however, that research has not examined the precision of ransomware detection. While existing techniques show an overall high accuracy in detecting novel ransomware samples, previous research does not investigate the discrimination of novel ransomware from benign cryptographic programs. This is a critical, practical limitation of current research: machine learning based techniques would be limited in their practical benefit if they generated too many false positives (at best) or deleted/quarantined critical data (at worst). We examine the ability of machine learning techniques based on Application Programming Interface (API) profile features to discriminate novel ransomware from benign cryptographic programs. This research provides a ransomware detection technique with improved detection accuracy and precision compared to other API profile based ransomware detection techniques, while using significantly simpler features than previous dynamic ransomware detection research. © 2020, Springer Nature Switzerland AG.
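The flavour of API-profile feature the abstract describes can be sketched as follows: each program is summarised by its API call frequencies and compared against class centroids. The API names, toy counts, and nearest-centroid classifier below are illustrative assumptions, not the paper's actual feature set or model.

```python
# Toy sketch: discriminate ransomware from benign crypto programs by
# comparing API-call frequency profiles to class centroids.
# (Hypothetical API names and counts; illustrative only.)
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(profiles):
    c = Counter()
    for p in profiles:
        c.update(p)
    return c

# Training profiles: ransomware enumerates and rewrites many files;
# a benign crypto tool encrypts comparatively few.
ransomware = [Counter({"CryptEncrypt": 50, "FindFirstFile": 40, "DeleteFile": 30}),
              Counter({"CryptEncrypt": 60, "FindNextFile": 45, "MoveFile": 20})]
benign = [Counter({"CryptEncrypt": 10, "ReadFile": 5, "WriteFile": 5})]

def classify(sample, ransom_profiles, benign_profiles):
    r = cosine(sample, centroid(ransom_profiles))
    b = cosine(sample, centroid(benign_profiles))
    return "ransomware" if r > b else "benign"

sample = Counter({"CryptEncrypt": 55, "FindFirstFile": 35, "DeleteFile": 25})
```

The point of the abstract is precisely that both classes call cryptographic APIs, so discrimination must come from the surrounding call profile, not the presence of encryption calls alone.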
Rapid anomaly detection using integrated prudence analysis (IPA)
- Authors: Maruatona, Omaru; Vamplew, Peter; Dazeley, Richard; Watters, Paul
- Date: 2018
- Type: Text; Conference proceedings
- Relation: PAKDD 2018. Trends and Applications in Knowledge Discovery and Data Mining, p. 137-141
- Full Text: false
- Reviewed:
- Description: Integrated Prudence Analysis (IPA) has been proposed as a method to maximize the accuracy of rule based systems. This paper presents evaluation results for three Prudence methods on public datasets, demonstrating that combining attribute-based and structural Prudence produces a net improvement in Prudence accuracy.
Automating Open Source Intelligence: Algorithms for OSINT
- Authors: Layton, Robert; Watters, Paul
- Date: 2015
- Type: Text; Book
- Full Text: false
- Reviewed:
- Description: Algorithms for Automating Open Source Intelligence (OSINT) presents information on the gathering of information and extraction of actionable intelligence from openly available sources, including news broadcasts, public repositories, and more recently, social media. As OSINT has applications in crime fighting, state-based intelligence, and social research, this book provides recent advances in text mining, web crawling, and other algorithms that have led to advances in methods that can largely automate this process.
REPLOT: REtrieving Profile Links on Twitter for malicious campaign discovery
- Authors: Perez, Charles; Birregah, Babiga; Layton, Robert; Lemercier, Marc; Watters, Paul
- Date: 2015
- Type: Text; Journal article
- Relation: AI Communications Vol. 29, no. 1 (2015), p. 107-122
- Full Text:
- Reviewed:
- Description: Social networking sites are increasingly subject to malicious activities such as self-propagating worms, confidence scams and drive-by download malware. The high number of users, combined with the presence of sensitive data such as personal or professional information, is certainly an unprecedented opportunity for attackers. These attackers are moving away from previous platforms of attack, such as email, towards social networking websites. In this paper, we present a full-stack methodology for the identification of campaigns of malicious profiles on social networking sites, composed of maliciousness classification, campaign discovery and attack profiling. The methodology, named REPLOT for REtrieving Profile Links On Twitter, contains three major phases. First, profiles are analysed to determine whether they are more likely to be malicious or benign. Second, connections between suspected malicious profiles are retrieved using a late data fusion approach, consisting of temporal and authorship analysis based models, to discover campaigns. Third, the discovered campaigns are analysed to investigate the attacks. In this paper, we apply this methodology to a real world dataset, with a view to understanding the links between malicious profiles, their attack methods and their connections. Our analysis identifies a cluster of linked profiles focusing on propagating malicious links, as well as profiling two other major clusters of attacking campaigns. © 2016 - IOS Press and the authors. All rights reserved.
Authorship analysis of aliases: Does topic influence accuracy?
- Authors: Layton, Robert; Watters, Paul; Dazeley, Richard
- Date: 2013
- Type: Text; Journal article
- Relation: Natural Language Engineering, Online first (2013)
- Full Text:
- Reviewed:
- Description: Aliases play an important role in online environments by facilitating anonymity, but also can be used to hide the identity of cybercriminals. Previous studies have investigated this alias matching problem in an attempt to identify whether two aliases are shared by an author, which can assist with identifying users. Those studies create their training data by randomly splitting the documents associated with an alias into two sub-aliases. Models have been built that can regularly achieve over 90% accuracy for recovering the linkage between these ‘random sub-aliases’. In this paper, random sub-alias generation is shown to enable these high accuracies, and thus does not adequately model the real-world problem. In contrast, creating sub-aliases using topic-based splitting drastically reduces the accuracy of all authorship methods tested. We then present a methodology that can be performed on non-topic controlled datasets, to produce topic-based sub-aliases that are more difficult to match. Finally, we present an experimental comparison between many authorship methods to see which methods better match aliases under these conditions, finding that local n-gram methods perform better than others.
Automated unsupervised authorship analysis using evidence accumulation clustering
- Authors: Layton, Robert; Watters, Paul; Dazeley, Richard
- Date: 2013
- Type: Text; Journal article
- Relation: Natural Language Engineering Vol. 19, no. 1 (2013), p. 95-120
- Full Text:
- Reviewed:
- Description: Authorship Analysis aims to extract information about the authorship of documents from features within those documents. Typically, this is performed as a classification task with the aim of identifying the author of a document, given a set of documents of known authorship. Alternatively, unsupervised methods have been developed primarily as visualisation tools to assist the manual discovery of clusters of authorship within a corpus by analysts. However, there is a need in many fields for more sophisticated unsupervised methods to automate the discovery, profiling and organisation of related information through clustering of documents by authorship. An automated and unsupervised methodology for clustering documents by authorship is proposed in this paper. The methodology is named NUANCE, for n-gram Unsupervised Automated Natural Cluster Ensemble. Testing indicates that the derived clusters have a strong correlation to the true authorship of unseen documents. © 2011 Cambridge University Press.
Evaluating authorship distance methods using the positive Silhouette coefficient
- Authors: Layton, Robert; Watters, Paul; Dazeley, Richard
- Date: 2013
- Type: Text; Journal article
- Relation: Natural Language Engineering Vol. 19, no. 4 (2013), p. 517-535
- Full Text:
- Reviewed:
- Description: Unsupervised Authorship Analysis (UAA) aims to cluster documents by authorship without knowing the authorship of any document. An important factor in UAA is the method for calculating the distance between documents; this choice is considered more critical to the end result than the choice of cluster analysis algorithm. One method for measuring the correlation between a distance metric and a labelling (such as class values or clusters) is the Silhouette Coefficient (SC). The SC can be used to evaluate the quality of an authorship distance method by measuring its correlation with the true authorship. However, we show that the SC can be severely affected by outliers. To address this issue, we introduce the Positive Silhouette Coefficient (PSC), defined as the proportion of instances with a positive SC value; this metric is not easily altered by outliers and is therefore more robust. A large number of authorship distance methods are then compared using the PSC, and the findings are presented. This research provides insight into the efficacy of methods for UAA and presents a framework for testing authorship distance methods.
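The metric described in this abstract is easy to sketch: compute the standard silhouette value for each instance, then take the fraction of instances whose value is positive. The toy one-dimensional data and helper names below are illustrative assumptions; the silhouette computation itself is the standard definition.

```python
# Sketch of the Positive Silhouette Coefficient (PSC): the fraction of
# instances that sit closer to their own cluster than to any other.
# A single outlier drags down the mean silhouette, but only shifts the
# PSC by 1/n. (Toy 1-D data; illustrative only.)

def silhouette(i, points, labels):
    # a: mean distance to the other members of i's own cluster
    same = [abs(points[i] - points[j]) for j in range(len(points))
            if j != i and labels[j] == labels[i]]
    a = sum(same) / len(same)
    # b: smallest mean distance to any other cluster
    b = min(
        sum(abs(points[i] - points[j]) for j in range(len(points))
            if labels[j] == lab) / labels.count(lab)
        for lab in set(labels) if lab != labels[i])
    return (b - a) / max(a, b)

def positive_silhouette_coefficient(points, labels):
    scores = [silhouette(i, points, labels) for i in range(len(points))]
    return sum(s > 0 for s in scores) / len(scores)

points = [0.0, 0.2, 0.4, 9.0, 9.1, 4.0]   # 4.0 is an outlier of cluster 1
labels = [0, 0, 0, 1, 1, 1]
psc = positive_silhouette_coefficient(points, labels)
```

Here five of six points have positive silhouette values, so the PSC is 5/6 regardless of how extreme the outlier's negative silhouette is.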
ICANN or ICANT: Is WHOIS an Enabler of Cybercrime?
- Authors: Watters, Paul; Herps, Aaron; Layton, Robert; McCombie, Stephen
- Date: 2013
- Type: Text; Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013, p. 44-49
- Full Text: false
- Reviewed:
- Description: WHOIS acts as a registry for organisations or individuals who 'own' or take responsibility for domains. For any registry to be functional, its integrity needs to be assured. Unfortunately, WHOIS data does not appear to meet basic integrity requirements in many cases, reducing the effectiveness of law enforcement and rightsholders in requesting takedowns for phishing kits, zombie hosts that are part of a botnet, or infringing content. In this paper, we illustrate the problem using a case study from trademark protection, where investigators attempt to trace fake goods being advertised on Facebook. The results indicate that ICANN needs to at least introduce minimum verification standards for WHOIS records vis-à-vis integrity, and optimally, develop a system for rapid takedowns in the event that a domain is being misused.
Identifying Faked Hotel Reviews Using Authorship Analysis
- Authors: Layton, Robert; Watters, Paul; Ureche, Oana
- Date: 2013
- Type: Text; Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013, p. 1-6
- Full Text: false
- Reviewed:
- Description: The use of online review sites has grown significantly, allowing communities to share information on products or services. These online review sites are marketed as being independent and trustworthy, but have been criticised for not ensuring the integrity of the reviews. One major concern is review fraud, where a person (such as a marketer) is paid to write favourable reviews for one product or poor reviews for a competitor. In this research we show a method for determining whether two reviews share an author, which can be used to identify if a review is legitimate. Our results indicate the method performs well, achieving an F1-score of over 0.66 on test data with 40 authors, most of whom have only one or two documents. This type of analysis can be used to investigate cases of potential hotel review fraud.
Illicit image detection: An MRF model based stochastic approach
- Authors: Islam, Mofakharul; Watters, Paul; Yearwood, John; Hussain, Mazher; Swarna, Lubaba
- Date: 2013
- Type: Text; Book chapter
- Relation: Innovations and Advances in Computer, Information, Systems Sciences, and Engineering, p. 467-479
- Full Text:
- Reviewed:
- Description: The steady growth of the Internet, sophisticated digital image processing technology, the cheap availability of storage devices, and surfers' ever-increasing interest in images have combined to make the Internet an unprecedentedly large image library. As a result, the Internet quickly became the principal medium for the distribution of pornographic content, helping pornography become a drug of the millennium. With the arrival of GPRS mobile telephone technology and the large-scale rollout of 3G networks, along with the cheap availability of the latest mobile handsets and a variety of wireless connections, the Internet has already gone mobile, driving us toward a new degree of complexity. In this paper, we propose a novel stochastic-model-based approach to investigate and implement a pornography detection technique, working towards a framework for the automated detection of pornography based on contextual constraints that are representative of actual pornographic activity. Compared to the results published in recent works, our proposed approach yields the highest accuracy in detection. © 2013 Springer Science+Business Media.
Illicit image detection using erotic pose estimation based on kinematic constraints
- Authors: Islam, Mofakharul; Watters, Paul; Yearwood, John; Hussain, Mazher; Swarna, Lubaba
- Date: 2013
- Type: Text; Book chapter
- Relation: Innovations and Advances in Computer, Information, Systems Sciences, and Engineering, p. 481-495
- Full Text:
- Reviewed:
- Description: With the advent of the Internet and sophisticated digital image processing technology, the Internet quickly became the principal medium for the distribution of pornographic content, helping pornography become a drug of the millennium. With the advent of GPRS mobile telephone networks and the large-scale rollout of 3G networks, along with the cheap availability of the latest mobile handsets and a variety of wireless connections, the Internet has already gone mobile, driving us toward a new degree of complexity. The detection of pornography remains an important and significant research problem, since there is great potential to minimize harm to the community. In this paper, we propose a novel approach to investigate and implement a pornography detection technique, working towards a framework for the automated detection of pornography based on the most commonly found erotic poses. Compared to the results published in recent works, our proposed approach yields the highest accuracy in recognition. © 2013 Springer Science+Business Media.
Indirect information linkage for OSINT through authorship analysis of aliases
- Authors: Layton, Robert; Perez, Charles; Birregah, Babiga; Watters, Paul; Lemercier, Marc
- Date: 2013
- Type: Text; Conference paper
- Relation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Vol. 7867 LNAI, p. 36-46
- Full Text: false
- Reviewed:
- Description: In this paper we examine the problem of automatically linking online accounts for open source intelligence gathering. We specifically aim to determine if two social media accounts are shared by the same author, without the use of direct linking evidence. We profile the accounts using authorship analysis and find the best matching guess. We apply this to a series of Twitter accounts identified as malicious by a methodology named SPOT and find several pairs of accounts that belong to the same author, despite no direct evidence linking the two. Overall, our results show that linking aliases is possible with an accuracy of 84%, and using our automated threshold method improves our accuracy to over 90% by removing incorrectly discovered matches. © Springer-Verlag 2013.
Information security governance: The art of detecting hidden malware
- Authors: Alazab, Mamoun; Venkatraman, Sitalakshmi; Watters, Paul
- Date: 2013
- Type: Text; Book chapter
- Relation: IT Security governance innovations: Theory and research, p. 293-315
- Full Text: false
- Reviewed:
- Description: Detecting malicious software, or malware, is one of the major concerns in information security governance, as malware authors pose a major challenge to digital forensics by using a variety of highly sophisticated stealth techniques to hide malicious code in computing systems, including smartphones. The current detection techniques are futile, as forensic analysis of infected devices is unable to identify all the hidden malware, resulting in zero-day attacks. This chapter takes a key step forward to address this issue and lays the foundation for deeper investigations in digital forensics. The goal of this chapter is, firstly, to unearth the recent obfuscation strategies employed to hide malware. Secondly, this chapter proposes innovative techniques that are implemented as a fully automated tool, and experimentally tested, to exhaustively detect hidden malware that leverages system vulnerabilities. Based on these research investigations, the chapter also arrives at an information security governance plan that would aid in addressing current and future cybercrime situations.
Local n-grams for author identification: Notebook for PAN at CLEF 2013
- Authors: Layton, Robert; Watters, Paul; Dazeley, Richard
- Date: 2013
- Type: Text; Conference proceedings
- Relation: CEUR Workshop Proceedings
- Full Text:
- Description: Our approach to the author identification task applies existing local n-gram (LNG) authorship attribution methods in a weighted ensemble, using a relatively simple scheme of weighting by training-set accuracy; it came third in the 2013 competition. LNG models create profiles consisting of a list of character n-grams that best represent a particular author's writing. The weighted ensemble improved the accuracy of the method without reducing the speed of the algorithm; the submitted solution was not only near the top of the leaderboard in terms of accuracy, but also one of the faster algorithms submitted.
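The profile idea behind LNG methods can be sketched as follows: each author is represented by their most frequent character n-grams, and an unknown text is attributed to the author whose profile it overlaps most. This is a simplified SCAP-style scheme with toy data; the submitted system combined several LNG methods in a weighted ensemble, which is not reproduced here.

```python
# Toy sketch of a local n-gram (LNG) style attribution step: profile each
# author by their top character n-grams, attribute by profile overlap.
# (Simplified illustration; parameters and texts are made up.)
from collections import Counter

def profile(text, n=3, top=5):
    """Top `top` most frequent character n-grams of a text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {g for g, _ in grams.most_common(top)}

def attribute(unknown, author_texts, n=3, top=5):
    """Return the author whose n-gram profile best overlaps the unknown text."""
    target = profile(unknown, n, top)
    return max(author_texts,
               key=lambda a: len(target & profile(author_texts[a], n, top)))

authors = {
    "alice": "the cat sat on the mat and the cat ran",
    "bob": "we propose a novel method for a novel problem",
}
guess = attribute("a novel method for the problem", authors)
```

Real LNG methods use much larger profiles and more careful dissimilarity measures than raw set overlap, but the character-level, author-specific profile is the common core.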
Measuring Surveillance in Online Advertising: A Big Data Approach
- Authors: Herps, Aaron; Watters, Paul; Pineda-Villavicencio, Guillermo
- Date: 2013
- Type: Text; Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013, p. 30-35
- Full Text: false
- Reviewed:
- Description: There is an increasing public and policy awareness that tracking cookies are being used to support behavioral advertising, but the extent to which tracking is occurring is not clear. The extent of tracking could have implications for the enforceability of legislative responses to the sharing of personal data, including the Privacy Act 1988 (Cth). In this paper, we develop a methodology for determining the prevalence of tracking cookies, and report the results for a sample of the 50 most visited sites by Australians. We find that the use of tracking cookies is endemic, but that distinct clusters of tracking can be identified across categories including search, pornography and social networking. The implications of the work in relation to privacy are discussed.
Patterns of ownership of child model sites: Profiling the profiteers and consumers of child exploitation material
- Authors: Watters, Paul; Lueg, Christopher; Spiranovic, Caroline; Prichard, Jeremy
- Date: 2013
- Type: Text; Journal article
- Relation: First Monday Vol. 18, no. 2 (2013)
- Full Text: false
- Reviewed:
- Description: Recent research has indicated that cybercrime thrives when a corrupt social, economic, and political environment emerges, such that law enforcement impact is minimised and key elements of crime prevention are absent. In this paper, using a snowball methodology, we analyse patterns of ownership of "child model" sites, which generate profits from advertising and/or subscriptions. While the material may not be traditional "pornography" in content, it is arguably exploitative. An open question is how the material compares to "beauty pageant" and other highly stylised mainstream photography that depicts children in adult situations, and whether access to all such material should be restricted.
REPLOT: REtrieving profile links on Twitter for suspicious networks detection
- Authors: Perez, Charles; Birregah, Babiga; Layton, Robert; Lemercier, Marc; Watters, Paul
- Date: 2013
- Type: Text; Conference paper
- Relation: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013, p. 1307-1314
- Full Text: false
- Reviewed:
- Description: In the last few decades social networking sites have encountered their first large-scale security issues. The high number of users associated with the presence of sensitive data (personal or professional) is certainly an unprecedented opportunity for malicious activities. As a result, one observes that malicious users are progressively turning their attention from traditional e-mail to online social networks to carry out their attacks. Moreover, it is now observed that attacks are not only performed by individual profiles, but that on a larger scale, a set of profiles can act in coordination in making such attacks. The latter are referred to as malicious social campaigns. In this paper, we present a novel approach that combines authorship attribution techniques with a behavioural analysis for detecting and characterizing social campaigns. The proposed approach is performed in three steps: first, suspicious profiles are identified from a behavioural analysis; second, connections between suspicious profiles are retrieved using a combination of authorship attribution and temporal similarity; third, a clustering algorithm is performed to identify and characterise the suspicious campaigns obtained. We provide a real-life application of the methodology on a sample of 1,000 suspicious Twitter profiles tracked over a period of forty days. Our results show that a large set of suspicious profiles behaves in coordination (70%) and propagates mainly, but not only, trustworthy URLs on the online social network. Among the three largest detected campaigns, we have highlighted that one represents an important security issue for the platform by promoting a significant set of malicious URLs. Copyright 2013 ACM.
Skype Traffic Classification Using Cost Sensitive Algorithms
- Authors: Azab, Azab; Layton, Robert; Alazab, Mamoun; Watters, Paul
- Date: 2013
- Type: Text; Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013, p. 14-21
- Full Text: false
- Reviewed:
- Description: Voice over IP (VoIP) technologies such as Skype are becoming increasingly popular and widely used in different organisations, so identifying the use of this service at the network level is very important. Reasons include applying Quality of Service (QoS), network planning, prohibiting its use on some networks, and lawful interception of communications. Researchers have addressed VoIP traffic classification from different viewpoints, such as classifier accuracy, building time, classification time and online classification. However, previous research tested models using the same version of a VoIP product as was used for training, giving generalizability only to that version; as new VoIP versions are released, such classifiers become obsolete. In this paper, we address whether this approach can detect new, untrained versions of Skype. We suggest that cost-sensitive classifiers can improve the accuracy of detecting untrained versions, and we test this against other algorithms. Our experiment demonstrates promising preliminary results: a cost-sensitive classifier built on Skype version 3 detects Skype version 4 with an F-measure of 0.57, a drastic improvement over the cost-insensitive baseline, which scores an F-measure of 0. This approach may be enhanced to improve detection results and extended to other applications whose protocols change from version to version.
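The effect of cost sensitivity described above can be sketched as a shifted decision threshold: penalising a missed Skype flow more heavily than a false alarm moves the probability threshold for predicting "Skype" below 0.5. The costs and function below are toy assumptions, not the paper's actual classifier or cost matrix.

```python
# Sketch of cost-sensitive classification: predict the class that
# minimises expected misclassification cost. With asymmetric costs,
# the effective decision threshold shifts away from 0.5.
# (Toy numbers; illustrative only.)

def cost_sensitive_predict(p_skype, cost_fn=5.0, cost_fp=1.0):
    """Predict by minimum expected cost.

    cost_fn: cost of missing Skype traffic (false negative)
    cost_fp: cost of flagging non-Skype traffic (false positive)
    """
    expected_cost_say_skype = (1 - p_skype) * cost_fp
    expected_cost_say_other = p_skype * cost_fn
    return "skype" if expected_cost_say_skype < expected_cost_say_other else "other"

# Equivalent threshold: predict "skype" when p > cost_fp / (cost_fp + cost_fn)
threshold = 1.0 / (1.0 + 5.0)   # roughly 0.167 rather than 0.5
```

Lowering the threshold this way is one plausible reading of why a cost-sensitive model catches low-confidence traffic from an unseen Skype version that a symmetric classifier rejects entirely.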