AFES: An advanced forensic evidence system
- Authors: Black, Paul , Gondal, Iqbal , Brooks, Richard , Yu, Lu
- Date: 2021
- Type: Text , Conference proceedings
- Relation: 2021 IEEE 25th International Enterprise Distributed Object Computing Workshop (EDOCW), Gold Coast, Australia, 25-29th October, 2021 p. 67-74
- Full Text: false
- Reviewed:
- Description: News media often contain reports that raise doubts about policing operations. We examine the question of how to improve policing integrity during the execution of search warrants and provide an outline for law enforcement search warrants and digital forensic analysis procedures. Existing techniques for improving the integrity of search warrants are reviewed, limitations are noted, and we propose an Advanced Forensic Evidence System (AFES) to address these limitations. AFES provides an immutable record and biometric authentication of the officers present during the execution of a search warrant, time and location, video recording, seizure record, contemporaneous notes, and photographs. AFES records digital evidence items, imaging details, and evidence hashes, and provides an access control system and an immutable record of access to all stored items. AFES uses a permissioned distributed ledger prototype, called Scrybe, developed under NSF aegis, to ensure evidence seizure integrity. Scrybe is run as multiple blockchain instances at law enforcement, prosecution, judicial, and defence organisations to ensure that an immutable record is maintained.
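The idea of an immutable record of access can be illustrated with a hash chain, in which each log entry's hash covers the previous entry's hash. This is only a minimal sketch of the general principle and is not Scrybe's or AFES's actual design; the function names, the record fields (`event`, `item`, `by`) and the all-zero starting hash are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting hash for an empty log


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append a record whose hash depends on every earlier record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "payload": payload,
        "prev": prev_hash,
        "hash": entry_hash(prev_hash, payload),
    })


def verify(log: list) -> bool:
    """Recompute the chain; an edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical seizure and access events.
log = []
append_entry(log, {"event": "seizure", "item": "laptop-01"})
append_entry(log, {"event": "access", "item": "laptop-01", "by": "analyst-7"})
```

Changing any stored payload after the fact makes `verify(log)` return `False`. A distributed ledger additionally replicates such a chain across independent organisations, which this sketch does not attempt.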
Dynamically recommending repositories for health data : a machine learning model
- Authors: Uddin, Md Ashraf , Stranieri, Andrew , Gondal, Iqbal , Balasubramanian, Venki
- Date: 2020
- Type: Text , Conference proceedings
- Relation: 2020 Australasian Computer Science Week Multiconference, ACSW 2020
- Full Text: false
- Reviewed:
- Description: Recently, a wide range of digital health record repositories has emerged. These include the Electronic Health Record (EHR) managed by the government, the Electronic Medical Record (EMR) managed by healthcare providers, the Personal Health Record (PHR) managed directly by the patient and new Blockchain-based systems mainly managed by technologies. Health record repositories differ from one another in the level of security, privacy, and quality of service (QoS) they provide. Health data stored in these repositories also varies from patient to patient in sensitivity and significance, depending on medical factors, personal preference, and other factors. Decisions regarding which digital record repository is most appropriate for the storage of each data item at every point in time are complex and nuanced. The challenges are exacerbated when health data is continuously streamed from wearable sensors. In this paper, we propose a recommendation model for health data storage that can accommodate patient preferences and make storage decisions rapidly, in real time, even with streamed data. The model maps health data to the repositories in which it is to be stored. The mapping between health data features and characteristics of each repository is learned using a machine learning-based classifier mediated through clinical rules. Evaluation results demonstrate the model's feasibility. © 2020 ACM.
- Description: E1
Partial undersampling of imbalanced data for cyber threats detection
- Authors: Moniruzzaman, Md , Bagirov, Adil , Gondal, Iqbal
- Date: 2020
- Type: Text , Conference proceedings , Conference paper
- Relation: 2020 Australasian Computer Science Week Multiconference, ACSW 2020
- Full Text:
- Reviewed:
- Description: Real-time detection of cyber threats is a challenging task in cyber security. With the advancement of technology and ease of access to the internet, more and more individuals and organizations are becoming targets of various cyber attacks such as malware, ransomware and spyware. The aim of these attacks is to steal money or valuable information from the victims. Signature-based detection methods fail to keep up with constantly evolving new threats. Machine learning based detection has drawn more attention from researchers due to its capability of detecting new and modified attacks based on the behaviour of previous attacks. The number of malicious activities in a given domain is significantly lower than the number of normal activities. Therefore, cyber threat detection data sets are imbalanced. In this paper, we propose a partial undersampling method to deal with imbalanced data for detecting cyber threats. © 2020 ACM.
- Description: E1
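The class imbalance described in this abstract is commonly reduced by discarding part of the majority class before training. The sketch below shows plain random undersampling only. It is not the partial undersampling method proposed in the paper, whose details are not given here. The function name, the `keep_fraction` parameter and the toy labels are assumptions.

```python
import random


def undersample(samples, labels, majority_label, keep_fraction, seed=0):
    """Keep every minority sample and a random fraction of the majority."""
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    n_keep = max(1, int(len(majority) * keep_fraction))
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 90 normal events and 10 attacks (hypothetical).
samples = list(range(100))
labels = ["normal"] * 90 + ["attack"] * 10
x_bal, y_bal = undersample(samples, labels, "normal", keep_fraction=0.5)
```

With `keep_fraction=0.5` the majority class shrinks from 90 to 45 samples while all 10 attack samples are retained.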
A Decentralized Patient Agent Controlled Blockchain for Remote Patient Monitoring
- Authors: Uddin, Ashraf , Stranieri, Andrew , Gondal, Iqbal , Balasubramanian, Venki
- Date: 2019
- Type: Text , Conference proceedings
- Relation: 15th International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2019 Vol. 2019-October, p. 207-214
- Full Text: false
- Reviewed:
- Description: Blockchain technology emerging for healthcare provides a secure, decentralized and patient-driven record management system. However, the storage of data generated from IoT devices in remote patient management applications requires a fast consensus mechanism. In this paper, we propose a lightweight consensus mechanism and a decentralized patient software agent to control a remote patient monitoring (RPM) system. The decentralized RPM architecture includes devices at three levels: 1) Body Area Sensor Network (medical sensors typically on or in the patient's body transmitting data to a Smartphone), 2) Fog/Edge, and 3) Cloud. We propose that Patient Agent (PA) software replicated on the Smartphone, Fog and Cloud servers processes medical data to ensure reliable, secure and private communication. Performance analysis has been conducted to demonstrate the feasibility of the proposed Blockchain-leveraged, distributed, Patient Agent controlled remote patient monitoring system. © 2019 IEEE.
- Description: E1
An efficient selective miner consensus protocol in blockchain oriented iot smart monitoring
- Authors: Uddin, Ashraf , Stranieri, Andrew , Gondal, Iqbal , Balasubramanian, Venki
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne; Australia; 13th-15th February 2019 Vol. 2019-February, p. 1135-1142
- Full Text:
- Reviewed:
- Description: Blockchains have been widely used in Internet of Things (IoT) applications including smart cities, smart homes and smart governance to provide high levels of security and privacy. In this article, we advance a Blockchain-based decentralized architecture for the storage of IoT data produced by smart homes and cities. The architecture includes a secure communication protocol using a sign-encryption technique between power-constrained IoT devices and a Gateway. The sign-encryption also preserves privacy. We propose that a Software Agent executing on the Gateway selects a Miner node using performance parameters of Miners. Simulations demonstrate that the recommended Miner selection outperforms the Proof of Work selection used in Bitcoin and Random Miner Selection.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
Blockchain leveraged task migration in body area sensor networks
- Authors: Uddin, Ashraf , Stranieri, Andrew , Gondal, Iqbal , Balasubramanian, Venki
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 25th Asia-Pacific Conference on Communications, APCC 2019 p. 177-184
- Full Text:
- Reviewed:
- Description: Blockchain technologies emerging for healthcare support secure health data sharing with greater interoperability among heterogeneous systems. However, the collection and storage of data generated from Body Area Sensor Networks (BASN) for migration to high processing power computing services requires an efficient BASN architecture. We present a decentralized BASN architecture that involves devices at three levels: 1) Body Area Sensor Network (medical sensors typically on or in the patient's body transmitting data to a Smartphone), 2) Fog/Edge, and 3) Cloud. We propose that a Patient Agent (PA) replicated on the Smartphone, Fog and Cloud servers processes medical data and executes a task offloading algorithm by leveraging a Blockchain. Performance analysis is conducted to demonstrate the feasibility of the proposed Blockchain-leveraged, distributed, Patient Agent controlled BASN. © 2019 IEEE.
- Description: E1
Categorical features transformation with compact one-hot encoder for fraud detection in distributed environment
- Authors: Ul Haq, Ikram , Gondal, Iqbal , Vamplew, Peter , Brown, Simon
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 16th Australasian Conference on Data Mining, AusDM 2018; Bathurst, NSW; 28 November 2018 through 30 November 2018 Vol. 996, p. 69-80
- Full Text: false
- Reviewed:
- Description: Fraud detection for online banking is an important research area, but one of the challenges is the heterogeneous nature of transaction data, i.e. a combination of numeric as well as mixed attributes. Usually, numeric format data gives better performance for classification, regression and clustering algorithms. However, many machine learning problems have categorical, or nominal, features rather than numeric features only. In addition, some machine learning platforms such as Apache Spark accept numeric data only. One-hot Encoding (OHE) is a widely used approach for transforming categorical features to numerical features in traditional data mining tasks. The one-hot approach has some challenges as well: the sparseness of the transformed data and the fact that the distinct values of an attribute are not always known in advance. Besides model accuracy, the compactness of machine learning models is equally important due to growing memory and storage needs. This paper presents an innovative technique to transform categorical features to numeric features by compacting sparse data even if all the distinct values are not known. The transformed data can be used for the development of fraud detection systems. The accuracy of the results has been validated on synthetic and real bank fraud data and a publicly available anomaly detection (KDD-99) dataset on a multi-node data cluster. © Springer Nature Singapore Pte Ltd. 2019.
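The two challenges named in this abstract, sparseness and values not known in advance, are visible even in a plain one-hot encoder. The sketch below is standard one-hot encoding with an extra catch-all column for unseen values. It is not the compact encoder presented in the paper. The function names and the card-type values are assumptions.

```python
def fit_categories(values):
    """Map each distinct value seen during fitting to a column index."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(value, index):
    """Return a dense one-hot vector; the last slot catches unseen values."""
    vec = [0] * (len(index) + 1)
    vec[index.get(value, len(index))] = 1
    return vec


# Hypothetical categorical attribute from transaction data.
index = fit_categories(["visa", "amex", "visa", "mastercard"])
known = one_hot("visa", index)      # [0, 0, 1, 0]
unseen = one_hot("diners", index)   # [0, 0, 0, 1]
```

Each vector has one column per distinct value plus the catch-all column, and all but one entry is zero, so an attribute with thousands of distinct values produces very wide, very sparse rows.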
Cybersecurity indexes for eHealth
- Authors: Burke, Wendy , Oseni, Taiwo , Jolfaei, Alireza , Gondal, Iqbal
- Date: 2019
- Type: Text , Conference proceedings
- Relation: 2019 Australasian Computer Science Week Multiconference, ACSW 2019; Sydney, Australia; 29th-31st January 2019 p. 1-8
- Full Text: false
- Reviewed:
- Description: This study aimed to explore the cybersecurity landscape to identify cybersecurity indexes that may be relevant to the health industry. While the healthcare sector poses security concerns regarding patients' records, cybersecurity in the healthcare sector has not been given much consideration. A cybersecurity index is a survey that measures the security preparedness and capabilities of a country or organisation. An index is made up of a series of questions, often broken into categories. These categories target areas such as law, technical responses, organisational threats, capacity building and social context. Some indexes provide ranking capabilities against other countries, while others directly evaluate what it means to be cyber-ready. In this paper, cybersecurity indexes were reviewed regarding the level of assessment (country level/organisation level), their consideration of the wider community and the health sector, and their appearance in academic literature. Results from this study found that there was no consistency between the indexes investigated, with each index having a different number of categories and indicators. Some indexes resulted in a score; others did not rank their results in league tables. Evidence to calculate the level of adherence was often obtained from secondary sources, with four of the country indexes using both primary and secondary sources. Eight (out of fourteen) indexes measured wider community indicators and only one index specifically measured eHealth services. Findings from the initial systematic review suggest that hardly any peer-reviewed journal articles exist on the topic of cybersecurity indexes. The paper concludes that most of the indexes studied are broad and do not consider the eHealth sector specifically. Each index relies on a different process to gauge cybersecurity, with little to no academic rigour. It is expected that this research will contribute to the current (limited) literature addressing cybersecurity indexes.
- Description: ACM International Conference Proceeding Series
Evolved similarity techniques in malware analysis
- Authors: Black, Paul , Gondal, Iqbal , Vamplew, Peter , Lakhotia, Arun
- Date: 2019
- Type: Text , Conference proceedings
- Relation: 2019 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), 5-8th Aug, 2019 p. 404-410
- Full Text: false
- Reviewed:
- Description: Malware authors are known to reuse existing code. This development process results in software evolution and a sequence of versions of a malware family containing functions that diverge from the initial version. This paper proposes the term evolved similarity to account for this gradual divergence of similarity across the version history of a malware family. While existing techniques are able to match functions in different versions of malware, these techniques work best when the version changes are relatively small. This paper introduces the concept of evolved similarity and presents automated Evolved Similarity Techniques (EST). EST differs from existing malware function similarity techniques by focusing on the identification of significantly modified functions in adjacent malware versions and may also be used to identify function similarity in malware samples that differ by several versions. The challenge in identifying evolved malware function pairs lies in identifying features that are relatively invariant across evolved code. The research in this paper makes use of the function call graph to establish these features and then demonstrates the use of these techniques on Zeus malware.
Generative malware outbreak detection
- Authors: Park, Sean , Gondal, Iqbal , Kamruzzaman, Joarder , Oliver, Jon
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019 Vol. 2019-February, p. 1149-1154
- Full Text: false
- Reviewed:
- Description: Recently, several deep learning approaches have attempted to detect malware binaries using convolutional neural networks and stacked deep autoencoders. Although they have shown respectable performance on large datasets, practical defense systems require precise detection during malware outbreaks, when only a handful of samples are available. This paper demonstrates the effectiveness of the latent representations obtained through the adversarial autoencoder for malware outbreak detection. Using the instruction sequence distribution mapped to a semantic latent vector, the model provides a highly effective neural signature that helps detect variants of a previously identified malware within a campaign that have been mutated with minor functional upgrades, function shuffling, or slightly modified obfuscations. The method demonstrates how an adversarial autoencoder can turn a multiclass classification task into a clustering problem when the sample set size is limited and the distribution is biased. The model performance is evaluated on an OS X malware dataset against traditional machine learning models. © 2019 IEEE.
- Description: E1
Multi-source cyber-attacks detection using machine learning
- Authors: Taheri, Sona , Gondal, Iqbal , Bagirov, Adil , Harkness, Greg , Brown, Simon , Chi, Chihung
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne, Australia; 13th-15th February 2019 Vol. 2019-February, p. 1167-1172
- Full Text:
- Reviewed:
- Description: The Internet of Things (IoT) has significantly increased the number of devices connected to the Internet, ranging from sensors to multi-source data information. As the IoT continues to evolve with new technologies, the number of threats and attacks against IoT devices is on the increase. Analyzing and detecting these attacks originating from different sources requires machine learning models. These models provide proactive solutions for detecting attacks and their sources. In this paper, we propose to apply a supervised machine learning classification technique to identify cyber-attacks from each source. More precisely, we apply the incremental piecewise linear classifier, which constructs the boundary between sources/classes incrementally, starting with one hyperplane and adding more hyperplanes at each iteration. The algorithm terminates when no further significant improvement of the separation of sources/classes is possible. The construction and usage of piecewise linear boundaries allows us to avoid possible overfitting. We apply the incremental piecewise linear classifier to a multi-source real-world cyber security data set to identify cyber-attacks and their sources.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
One-shot malware outbreak detection using spatio-temporal isomorphic dynamic features
- Authors: Park, Sean , Gondal, Iqbal , Kamruzzaman, Joarder , Zhang, Leo
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science and Engineering, TrustCom/BigDataSE 2019 p. 751-756
- Full Text: false
- Reviewed:
- Description: Fingerprinting malware by its behavioural signature has been an attractive approach for malware detection due to the homogeneity of dynamic execution patterns across different variants of similar families. Although previous research shows reasonably good performance in dynamic detection using machine learning techniques on a large training corpus, in many practical defence scenarios decisions must be made based on a small number of observable samples. This paper demonstrates the effectiveness of a generative adversarial autoencoder for dynamic malware detection in outbreak situations, where in most cases a single sample is available for training the machine learning algorithm to detect similar samples that are in the wild. © 2019 IEEE.
- Description: E1
Selective adversarial learning for mobile malware
- Authors: Khoda, Mahbub , Imam, Tasadduq , Kamruzzaman, Joarder , Gondal, Iqbal , Rahman, Ashfaqur
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science and Engineering, TrustCom/BigDataSE 2019 p. 272-279
- Full Text: false
- Reviewed:
- Description: Machine learning models, including deep neural networks, have been shown to be vulnerable to adversarial attacks. Adversarial samples are crafted from legitimate inputs by carefully introducing small perturbations to the input so that the classifier is fooled. Adversarial retraining, which involves retraining the classifier using adversarial samples, has been shown to improve the robustness of the classifier against adversarial attacks. However, it has also been shown that retraining with too many samples can lead to performance degradation. Hence, a careful selection of the adversarial samples that are used to retrain the classifier is necessary, yet existing works select these samples in a randomized fashion. In our work, we propose two novel approaches for selecting adversarial samples: one based on the distance from the cluster center of malware and one based on the probability derived from kernel based learning (KBL). Our experimental results show that both of our selective mechanisms for adversarial retraining outperform the random selection technique and significantly improve the classifier performance against adversarial attacks. In particular, selection with KBL delivers above 6% improvement in detection accuracy compared to random selection. The method proposed here has greater impact in designing robust machine learning systems for security applications. © 2019 IEEE.
- Description: E1
Vulnerability modelling for hybrid IT systems
- Authors: Ur-Rehman, Attiq , Gondal, Iqbal , Kamruzzaman, Joarder , Jolfaei, Alireza
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne, Australia; 13th-15th February 2019 Vol. 2019-February, p. 1186-1191
- Full Text:
- Reviewed:
- Description: The Common Vulnerability Scoring System (CVSS) is an industry standard that can assess the vulnerability of nodes in traditional computer systems. The metrics computed by CVSS determine critical nodes and attack paths. However, traditional IT security models do not fit IoT embedded networks due to the distinct nature and unique characteristics of IoT systems. This paper analyses the application of CVSS to IoT embedded systems and proposes an improved vulnerability scoring system based on the CVSS v3 framework. The proposed framework, named CVSSIoT, is applied to a realistic IT supply chain system and the results are compared with the actual vulnerabilities from the national vulnerability database. The comparison validates the proposed model. CVSSIoT is not only effective, simple and capable of vulnerability evaluation for traditional IT systems, but also exploits unique characteristics of IoT devices.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
A patient agent to manage blockchains for remote patient monitoring
- Authors: Uddin, Ashraf , Stranieri, Andrew , Gondal, Iqbal , Balasubramanian, Venki
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 7th International Conference on Global Telehealth, GT 2018; Colombo, Sri Lanka; 10th-11th October 2018; published in Studies in Health Technology and Informatics Vol. 254, p. 105-115
- Full Text: false
- Reviewed:
- Description: Continuous monitoring of patients' physiological signs has the potential to augment traditional medical practice, particularly in developing countries that have a shortage of healthcare professionals. However, continuously streamed data presents additional security, storage and retrieval challenges and further inhibits initiatives to integrate data to form electronic health record systems. Blockchain technologies enable data to be stored securely and inexpensively without recourse to a trusted authority. Blockchain technologies also promise to provide architectures for electronic health records that do not require the huge government expenditure that challenges developing nations. However, Blockchain deployment, particularly with streamed data, challenges existing Blockchain algorithms, which take too long to place data in a block and have no mechanism to determine whether every data point in every stream should be stored in such a secure way. This article presents an architecture that involves a Patient Agent coordinating the insertion of continuous data streams into Blockchains to form an electronic health record.
- Description: Studies in Health Technology and Informatics
A server side solution for detecting webInject : A machine learning approach
- Authors: Moniruzzaman, Md , Bagirov, Adil , Gondal, Iqbal , Brown, Simon
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2018; Melbourne, Australia; 3rd June 2018; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11154 LNAI, p. 162-167
- Full Text: false
- Reviewed:
- Description: With the advancement of client-side, on-the-fly web content generation techniques, it has become easier for attackers to modify the content of a website dynamically and gain access to valuable information. The majority of online attacks are now carried out using WebInject. End users are not always skilled enough to differentiate between injected content and the actual content of a webpage. Some of the existing solutions are designed for the client side, and all users have to install them on their systems, which is a challenging task. In addition, individuals use various platforms and tools, so different solutions need to be designed. Existing server-side solutions often focus on sanitizing and filtering the inputs. They fail to detect obfuscated and hidden scripts. In this paper, we propose a server-side solution using a machine learning approach to detect WebInject in banking websites. Unlike other techniques, our method collects features of a Document Object Model (DOM) and classifies it with the help of a pre-trained model.
An anomaly intrusion detection system using C5 decision tree classifier
- Authors: Khraisat, Ansam , Gondal, Iqbal , Vamplew, Peter
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2018; Melbourne, Australia; 3rd June 2018; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11154 LNAI, p. 149-155
- Full Text: false
- Reviewed:
- Description: Due to the increase in intrusion activities over the internet, many intrusion detection systems have been proposed to detect abnormal activities, but most of these detection systems suffer from a common problem: they produce a high number of alerts and a huge number of false positives. As a result, normal activities could be classified as intrusion activities. This paper examines different data mining techniques that could minimize both the number of false negatives and false positives. The effectiveness of the C5 classifier is examined and compared with other classifiers. Results show that false negatives are reduced and intrusion detection has been improved significantly. Minimizing the false positives has also reduced the number of false alerts. In this study, multiple classifiers have been compared with the C5 decision tree classifier using the NSL_KDD dataset, and results have shown that C5 achieves high accuracy and low false alarms as an intrusion detection system.
- Description: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Mobile malware detection - An analysis of the impact of feature categories
- Authors: Khoda, Mahbub , Kamruzzaman, Joarder , Gondal, Iqbal , Imam, Tasadduq
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 25th International Conference on Neural Information Processing, ICONIP 2018; Siem Reap, Cambodia; 13th-16th December 2018; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11304 LNCS, p. 486-498
- Full Text: false
- Reviewed:
- Description: The use of smartphones and hand-held devices continues to increase with rapid development in underlying technology and widespread deployment of numerous applications including social networks, email and financial transactions. Inevitably, malware attacks are shifting towards these devices. To detect mobile malware, features representing the characteristics of applications play a crucial role. In this work, we systematically studied the impact of all categories of features (i.e., permission, application programming interface (API) calls, inter-component communication (ICC) and dynamic features) of Android applications in distinguishing malware from benign applications. We identified the best combination of feature categories, which yields better performance in terms of widely used metrics than blindly using all feature categories. We proposed a new technique to include contextual information in API calls in feature values, and the study reveals that embedding such information enhances malware detection capability by a good margin. Information gain analysis shows that a significant number of features in the ICC category are not relevant to malware prediction and are hence the least effective. This study will be useful in designing better mobile malware detection systems.
Dynamic content distribution for decentralized sharing in tourist spots using demand and supply
- Authors: Kamruzzaman, Joarder , Karmakar, Gour , Gondal, Iqbal , Kaisar, Shahriar
- Date: 2017
- Type: Text , Conference proceedings
- Relation: 13th IEEE International Wireless Communications and Mobile Computing Conference, IWCMC 2017; Valencia, Spain; 26th-30th June 2017 p. 2121-2126
- Full Text: false
- Reviewed:
- Description: Decentralized content sharing (DCS) is emerging as an important platform for sharing content among smart mobile device users, where devices form an ad-hoc network and communicate opportunistically. Existing DCS approaches for scenarios such as tourist spots achieve a low delivery success rate and high latency, as they do not focus on the dynamic demand for content, which usually varies considerably with the number of visitors present or the occurrence of influencing events. The amount of available supply also changes as nodes leave the area. The only way to improve the content delivery service is to distribute content in strategic positions based on dynamic demand and supply. In this paper, we propose a dynamic content distribution (DCD) method that considers dynamic demand and supply for content in tourist spots. Simulation results validate the improvement of the proposed approach. © 2017 IEEE.
- Description: 2017 13th International Wireless Communications and Mobile Computing Conference, IWCMC 2017
Improving authorship attribution in twitter through topic-based sampling
- Authors: Pan, Luoxi , Gondal, Iqbal , Layton, Robert
- Date: 2017
- Type: Text , Conference proceedings
- Relation: 30th Australasian Joint Conference on Artificial Intelligence, AI 2017 : Advances in Artificial Intelligence; Melbourne, Australia; 19th-20th August 2017; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 10400 LNAI, p. 250-261
- Full Text: false
- Reviewed:
- Description: Aliases are used as a means of anonymity on the Internet in environments such as IRC (internet relay chat), forums and micro-blogging websites such as Twitter. While there are genuine reasons for the use of aliases, such as journalists operating in politically oppressive countries, they are increasingly being used by cybercriminals and extremist organisations. In recent years, we have seen increased research on authorship attribution of Twitter messages, including authorship analysis of aliases. Previous studies have shown that anti-aliasing of randomly generated sub-aliases yields high accuracy when linking the sub-aliases, but becomes much less accurate when topic-based sub-aliases are used. N-gram methods have previously been demonstrated to perform better than other methods in this situation. This paper investigates the effect of topic-based sampling on authorship attribution accuracy for the popular micro-blogging website Twitter. Features are extracted using character n-grams, which accurately capture differences in authorship style. These features are analysed using support vector machines with a one-versus-all classifier. The predictive performance of the algorithm is then evaluated using two different sampling methodologies: authors that were sampled through a context-sensitive topic-based search and authors that were sampled randomly. Topic-based sampling of authors is found to produce more accurate authorship predictions. This paper presents several theories as to why this might be the case. © Springer International Publishing AG 2017.
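The character n-gram features mentioned in this abstract can be illustrated with a short counting routine. This is only a sketch of the feature extraction step, assuming n = 3 and raw counts. The paper's actual n, weighting and support vector machine setup are not stated here, and the function name is hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count every overlapping run of n characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


profile = char_ngrams("banana")
# "banana" contains the trigrams ban, ana, nan, ana.
```

A classifier would then be trained on one such count vector per message or per author. That step is omitted here.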