A tree-based stacking ensemble technique with feature selection for network intrusion detection
- Authors: Rashid, Mamanur , Kamruzzaman, Joarder , Imam, Tasadduq , Wibowo, Santoso , Gordon, Steven
- Date: 2022
- Type: Text , Journal article
- Relation: Applied Intelligence Vol. 52, no. 9 (2022), p. 9768-9781
- Full Text: false
- Reviewed:
- Description: Several studies have used machine learning algorithms to develop intrusion systems (IDS), which differentiate anomalous behaviours from the normal activities of network systems. Due to the ease of automated data collection and subsequently an increased size of collected data on network traffic and activities, the complexity of intrusion analysis is increasing exponentially. A particular issue, due to statistical and computation limitations, a single classifier may not perform well for large scale data as existent in modern IDS contexts. Ensemble methods have been explored in literature in such big data contexts. Although more complicated and requiring additional computation, literature has a note that ensemble methods can result in better accuracy than single classifiers in different large scale data classification contexts, and it is interesting to explore how ensemble approaches can perform in IDS. In this research, we introduce a tree-based stacking ensemble technique (SET) and test the effectiveness of the proposed model on two intrusion datasets (NSL-KDD and UNSW-NB15). We further enhance incorporate feature selection techniques to select the best relevant features with the proposed SET. A comprehensive performance analysis shows that our proposed model can better identify the normal and anomaly traffic in network than other existing IDS models. This implies the potentials of our proposed system for cybersecurity in Internet of Things (IoT) and large scale networks. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Adversarial training for deep learning-based cyberattack detection in IoT-based smart city applications
- Authors: Rashid, Md Mamunur , Kamruzzaman, Joarder , Mehedi Hassan, Mohammad , Imam, Tasadduq , Wibowo, Santoso , Gordon, Steven , Fortino, Giancarlo
- Date: 2022
- Type: Text , Journal article
- Relation: Computers and Security Vol. 120, no. (2022), p.
- Full Text: false
- Reviewed:
- Description: Intrusion Detection Systems (IDS) based on deep learning models can identify and mitigate cyberattacks in IoT applications in a resilient and systematic manner. These models, which support the IDS's decision, could be vulnerable to a cyberattack known as adversarial attack. In this type of attack, attackers create adversarial samples by introducing small perturbations to attack samples to trick a trained model into misclassifying them as benign applications. These attacks can cause substantial damage to IoT-based smart city models in terms of device malfunction, data leakage, operational outage and financial loss. To our knowledge, the impact of and defence against adversarial attacks on IDS models in relation to smart city applications have not been investigated yet. To address this research gap, in this work, we explore the effect of adversarial attacks on the deep learning and shallow machine learning models by using a recent IoT dataset and propose a method using adversarial retraining that can significantly improve IDS performance when confronting adversarial attacks. Simulation results demonstrate that the presence of adversarial samples deteriorates the detection accuracy significantly by above 70% while our proposed model can deliver detection accuracy above 99% against all types of attacks including adversarial attacks. This makes an IDS robust in protecting IoT-based smart city services. © 2022 Elsevier Ltd
Malware detection in edge devices with fuzzy oversampling and dynamic class weighting
- Authors: Khoda, Mahbub , Kamruzzaman, Joarder , Gondal, Iqbal , Imam, Tasadduq , Rahman, Ashfaqur
- Date: 2021
- Type: Text , Journal article
- Relation: Applied Soft Computing Vol. 112, no. (2021), p.
- Full Text: false
- Reviewed:
- Description: In Internet-of-things (IoT) domain, edge devices are used increasingly for data accumulation, preprocessing, and analytics. Intelligent integration of edge devices with Artificial Intelligence (AI) facilitates real-time analysis and decision making. However, these devices simultaneously provide additional attack opportunities for malware developers, potentially leading to information and financial loss. Machine learning approaches can detect such attacks but their performance degrades when benign samples substantially outnumber malware samples in training data. Existing approaches for such imbalanced data assume samples represented as continuous features and thus can generate invalid samples when malware applications are represented by binary features. We propose a novel malware oversampling technique that addresses this issue. Further, we propose two approaches for malware detection. Our first approach uses fuzzy set theory, while the second approach dynamically assigns higher priority to malware samples using a novel loss function. Combining our oversampling technique with these approaches, the proposed approach attains over 9% improvement over competing methods in terms of F1_score. Our approaches can, therefore, result in enhanced privacy and security in edge computing services. © 2021 Elsevier B.V.
Mobile malware detection with imbalanced data using a novel synthetic oversampling strategy and deep learning
- Authors: Khoda, Mahbub , Kamruzzaman, Joarder , Gondal, Iqbal , Imam, Tasadduq , Rahman, Ashfaqur
- Date: 2020
- Type: Text , Conference paper
- Relation: 16th International Conference on Wireless and Mobile Computing, Networking and Communications (IEEE WiMob), Virtual, Thessaloniki, 12 to 14 October 2020, International Conference on Wireless and Mobile Computing, Networking and Communications
- Full Text: false
- Reviewed:
- Description: Mobile malware detection is inherently an imbalanced data problem since the number of benign applications in the market is far greater than the number of malicious applications. Existing methods to handle imbalanced data, such as synthetic minority over-sampling, do not translate well into this domain since mobile malware detection generally deals with binary features and these methods are designed for continuous features. Also, methods adapted for categorical features cannot be applied here since random modifications of features can result in invalid sample generation. In this work, we propose a novel technique for generating synthetic samples for mobile malware detection with imbalanced data. Our proposed method adds new data points in the sample space by generating synthetic malware samples which also preserves the original functionality of the malicious apps. Experiments show that the proposed approach outperforms existing techniques in terms of precision, recall, F1score, and AUC. This study will be useful in building deep neural network-based systems to handle imbalanced data for mobile malware detection. © 2020 IEEE.
Mobile malware detection : an analysis of deep learning model
- Authors: Khoda, Mahbub , Kamruzzaman, Joarder , Gondal, Iqbal , Imam, Tasadduq , Rahman, Ashfaqur , IEEE
- Date: 2019
- Type: Text , Book chapter
- Relation: 2019 IEEE International Conference on Industrial Technology p. 1161-1166
- Full Text: false
- Reviewed:
- Description: Due to its widespread use, with numerous applications deployed everyday, smartphones have become an inevitable target of the malware developers. This huge number of applications renders manual inspection of codes infeasible; as such, researchers have proposed several malware detection techniques based on automatic machine learning tools. Deep learning has gained a lot of attention from the malware researchers due to its ability of capture complex relationships among inputs and outputs. However, deep learning models depend largely on several hyper-parameters (i.e., learning rate, batch size, dropout rate). Hence, it is of utmost importance to analyze the effect of these parameters on classifier performance. In this paper, we systematically studied the effect of these parameters along with the effect of network architecture. We showed that building arbitrary deep networks does not always improve classifier performance. We also determined the combination of hyper-parameters that yields best result. This study will be useful in building better deep neural network based model for malware classification.
Robust malware defense in industrial IoT applications using machine learning with selective adversarial samples
- Authors: Khoda, Mahbub , Imam, Tasadduq , Kamruzzaman, Joarder , Gondal, Iqbal , Rahman, Ashfaqur
- Date: 2019
- Type: Text , Journal article
- Relation: IEEE Transactions on Industry Applications Vol.56, no 4. (2020), p. 4415-4424
- Full Text:
- Reviewed:
- Description: Industrial Internet of Things (IIoT) deploys edge devices to act as intermediaries between sensors and actuators and application servers or cloud services. Machine learning models have been widely used to thwart malware attacks in such edge devices. However, these models are vulnerable to adversarial attacks where attackers craft adversarial samples by introducing small perturbations to malware samples to fool a classifier to misclassify them as benign applications. Literature on deep learning networks proposes adversarial retraining as a defense mechanism where adversarial samples are combined with legitimate samples to retrain the classifier. However, existing works select such adversarial samples in a random fashion which degrades the classifier's performance. This work proposes two novel approaches for selecting adversarial samples to retrain a classifier. One, based on the distance from malware cluster center, and the other, based on a probability measure derived from a kernel based learning (KBL). Our experiments show that both of our sample selection methods outperform the random selection method and the KBL selection method improves detection accuracy by 6%. Also, while existing works focus on deep neural networks with respect to adversarial retraining, we additionally assess the impact of such adversarial samples on other classifiers and our proposed selective adversarial retraining approaches show similar performance improvement for these classifiers as well. The outcomes from the study can assist in designing robust security systems for IIoT applications.
Selective adversarial learning for mobile malware
- Authors: Khoda, Mahbub , Imam, Tasadduq , Kamruzzaman, Joarder , Gondal, Iqbal , Rahman, Ashfaqur
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science and Engineering, TrustCom/BigDataSE 2019 p. 272-279
- Full Text: false
- Reviewed:
- Description: Machine learning models, including deep neural networks, have been shown to be vulnerable to adversarial attacks. Adversarial samples are crafted from legitimate inputs by carefully introducing small perturbation to the input so that the classifier is fooled. Adversarial retraining, which involves retraining the classifier using adversarial samples, has been shown to improve the robustness of the classifier against adversarial attacks. However, it has been also shown that retraining with too many samples can lead to performance degradation. Hence, a careful selection of the adversarial samples that are used to retrain the classifier is necessary, yet existing works select these samples in a randomized fashion. In our work, we propose two novel approaches for selecting adversarial samples: based on the distance from cluster center of malware and based on the probability derived from a kernel based learning (KBL). Our experiment results show that both of our selective mechanisms for adversarial retraining outperform the random selection technique and significantly improve the classifier performance against adversarial attacks. In particular, selection with KBL delivers above 6% improvement in detection accuracy compared to random selection. The method proposed here has greater impact in designing robust machine learning system for security applications. © 2019 IEEE.
- Description: E1
Mobile malware detection - An analysis of the impact of feature categories
- Authors: Khoda, Mahbub , Kamruzzaman, Joarder , Gondal, Iqbal , Imam, Tasadduq
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 25th International Conference on Neural Information Processing, ICONIP 2018; Siem Reap, Cambodia; 13th-16th December 2018; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11304 LNCS, p. 486-498
- Full Text: false
- Reviewed:
- Description: The use of smartphones and hand-held devices continues to increase with rapid development in underlying technology and widespread deployment of numerous applications including social network, email and financial transactions. Inevitably, malware attacks are shifting towards these devices. To detect mobile malware, features representing the characteristics of applications play a crucial role. In this work, we systematically studied the impact of all categories of features (i.e., permission, application programmers interface calls, inter component communication and dynamic features) of android applications in classifying a malware from benign applications. We identified the best combination of feature categories that yield better performance in terms of widely used metrics than blindly using all feature categories. We proposed a new technique to include contextual information in API calls into feature values and the study reveals that embedding such information enhances malware detection capability by a good margin. Information gain analysis shows that a significant number of features in ICC category is not relevant to malware prediction and hence, least effective. This study will be useful in designing better mobile malware detection system.