Malware detection in edge devices with fuzzy oversampling and dynamic class weighting
- Authors: Khoda, Mahbub , Kamruzzaman, Joarder , Gondal, Iqbal , Imam, Tasadduq , Rahman, Ashfaqur
- Date: 2021
- Type: Text , Journal article
- Relation: Applied Soft Computing Vol. 112, no. (2021), p.
- Full Text: false
- Reviewed:
- Description: In the Internet of Things (IoT) domain, edge devices are increasingly used for data accumulation, preprocessing, and analytics. Intelligent integration of edge devices with Artificial Intelligence (AI) facilitates real-time analysis and decision making. However, these devices simultaneously provide additional attack opportunities for malware developers, potentially leading to information and financial loss. Machine learning approaches can detect such attacks, but their performance degrades when benign samples substantially outnumber malware samples in the training data. Existing approaches for such imbalanced data assume that samples are represented by continuous features and thus can generate invalid samples when malware applications are represented by binary features. We propose a novel malware oversampling technique that addresses this issue. Further, we propose two approaches for malware detection. Our first approach uses fuzzy set theory, while the second approach dynamically assigns higher priority to malware samples using a novel loss function. Combined with our oversampling technique, these approaches attain over 9% improvement over competing methods in terms of F1-score. Our approaches can, therefore, result in enhanced privacy and security in edge computing services. © 2021 Elsevier B.V.
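The abstract does not give the form of its loss function. As a generic illustration of how a loss can give malware samples higher priority, the sketch below shows a standard class-weighted binary cross-entropy with a fixed weight. It is not the authors' dynamic loss; the function name `weighted_bce` and the parameter `malware_weight` are hypothetical.

```python
import math

def weighted_bce(y_true, p_pred, malware_weight=5.0, eps=1e-12):
    """Mean binary cross-entropy with a larger weight on the malware class (label 1).

    Assumption: a fixed, hand-chosen weight. The paper's loss assigns priority
    dynamically; that mechanism is not described in the abstract.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -malware_weight * math.log(p)  # missed malware costs more
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)
```

With this weighting, a malware sample scored at 0.1 contributes more to the loss than a benign sample scored at 0.9, although both predictions are equally far from their labels.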
Mobile malware detection with imbalanced data using a novel synthetic oversampling strategy and deep learning
- Authors: Khoda, Mahbub , Kamruzzaman, Joarder , Gondal, Iqbal , Imam, Tasadduq , Rahman, Ashfaqur
- Date: 2020
- Type: Text , Conference paper
- Relation: 16th International Conference on Wireless and Mobile Computing, Networking and Communications (IEEE WiMob), Virtual, Thessaloniki, 12 to 14 October 2020, International Conference on Wireless and Mobile Computing, Networking and Communications
- Full Text: false
- Reviewed:
- Description: Mobile malware detection is inherently an imbalanced data problem since the number of benign applications in the market is far greater than the number of malicious applications. Existing methods to handle imbalanced data, such as synthetic minority over-sampling, do not translate well into this domain since mobile malware detection generally deals with binary features and these methods are designed for continuous features. Also, methods adapted for categorical features cannot be applied here since random modifications of features can result in invalid sample generation. In this work, we propose a novel technique for generating synthetic samples for mobile malware detection with imbalanced data. Our proposed method adds new data points to the sample space by generating synthetic malware samples that also preserve the original functionality of the malicious apps. Experiments show that the proposed approach outperforms existing techniques in terms of precision, recall, F1-score, and AUC. This study will be useful in building deep neural network-based systems to handle imbalanced data for mobile malware detection. © 2020 IEEE.
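To see why interpolation-based oversampling fits binary features poorly, the sketch below applies the usual SMOTE-style interpolation step to two binary feature vectors. The vectors `app_a` and `app_b` are made-up examples, and the helper name `smote_interpolate` is hypothetical; this illustrates the problem only, not the authors' proposed technique.

```python
import random

def smote_interpolate(x, neighbor, rng):
    """SMOTE-style synthetic point: x + gap * (neighbor - x), gap drawn from [0, 1)."""
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]

rng = random.Random(0)          # fixed seed so the run is repeatable
app_a = [1, 0, 1, 0]            # hypothetical binary features (e.g. permission used or not)
app_b = [1, 1, 0, 0]
synthetic = smote_interpolate(app_a, app_b, rng)

# Features on which the two apps differ end up strictly between 0 and 1,
# so the synthetic vector no longer describes a real application.
is_binary = all(v in (0, 1) for v in synthetic)
print(synthetic, is_binary)
```

Rounding the fractional values back to 0 or 1 would restore a binary vector, but, as the abstract notes for random feature modifications, there is no guarantee that the result corresponds to a valid application.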
Mobile malware detection : an analysis of deep learning model
- Authors: Khoda, Mahbub , Kamruzzaman, Joarder , Gondal, Iqbal , Imam, Tasadduq , Rahman, Ashfaqur , IEEE
- Date: 2019
- Type: Text , Book chapter
- Relation: 2019 IEEE International Conference on Industrial Technology p. 1161-1166
- Full Text: false
- Reviewed:
- Description: Due to their widespread use, with numerous applications deployed every day, smartphones have become an inevitable target of malware developers. This huge number of applications renders manual inspection of code infeasible; as such, researchers have proposed several malware detection techniques based on automatic machine learning tools. Deep learning has gained a lot of attention from malware researchers due to its ability to capture complex relationships among inputs and outputs. However, deep learning models depend largely on several hyper-parameters (e.g., learning rate, batch size, dropout rate). Hence, it is of utmost importance to analyze the effect of these parameters on classifier performance. In this paper, we systematically studied the effect of these parameters along with the effect of network architecture. We showed that building arbitrarily deep networks does not always improve classifier performance. We also determined the combination of hyper-parameters that yields the best result. This study will be useful in building better deep neural network-based models for malware classification.
Robust malware defense in industrial IoT applications using machine learning with selective adversarial samples
- Authors: Khoda, Mahbub , Imam, Tasadduq , Kamruzzaman, Joarder , Gondal, Iqbal , Rahman, Ashfaqur
- Date: 2019
- Type: Text , Journal article
- Relation: IEEE Transactions on Industry Applications Vol. 56, no. 4 (2020), p. 4415-4424
- Full Text:
- Reviewed:
- Description: The Industrial Internet of Things (IIoT) deploys edge devices to act as intermediaries between sensors and actuators on one side and application servers or cloud services on the other. Machine learning models have been widely used to thwart malware attacks on such edge devices. However, these models are vulnerable to adversarial attacks in which attackers craft adversarial samples by introducing small perturbations to malware samples to fool a classifier into misclassifying them as benign applications. The literature on deep learning networks proposes adversarial retraining as a defense mechanism, where adversarial samples are combined with legitimate samples to retrain the classifier. However, existing works select such adversarial samples in a random fashion, which degrades the classifier's performance. This work proposes two novel approaches for selecting adversarial samples to retrain a classifier: one based on the distance from the malware cluster center, and the other based on a probability measure derived from kernel-based learning (KBL). Our experiments show that both of our sample selection methods outperform the random selection method and that the KBL selection method improves detection accuracy by 6%. Also, while existing works focus on deep neural networks with respect to adversarial retraining, we additionally assess the impact of such adversarial samples on other classifiers, and our proposed selective adversarial retraining approaches show similar performance improvement for these classifiers as well. The outcomes of the study can assist in designing robust security systems for IIoT applications.
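The distance-based selection can be sketched as follows, under stated assumptions: the cluster center is taken to be the feature-wise mean of the malware samples, distance is Euclidean, and the `nearest` flag is a hypothetical switch, because the abstract does not say whether samples near to or far from the center are preferred. The function names are illustrative and not taken from the paper.

```python
import math

def centroid(samples):
    """Feature-wise mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def select_by_distance(adversarial, malware, k, nearest=True):
    """Return k adversarial samples ranked by Euclidean distance to the malware centroid.

    Assumption: nearest=True keeps the closest samples, nearest=False the farthest.
    Which direction the paper uses is not stated in the abstract.
    """
    center = centroid(malware)
    ranked = sorted(adversarial,
                    key=lambda s: math.dist(s, center),
                    reverse=not nearest)
    return ranked[:k]

# Toy data: three binary features per application.
malware = [[1, 1, 0], [1, 0, 0]]
adversarial = [[1, 1, 0], [0, 0, 1], [1, 0, 1]]
closest = select_by_distance(adversarial, malware, k=1)
farthest = select_by_distance(adversarial, malware, k=1, nearest=False)
```

The selected samples would then be added to the legitimate training set before retraining, in place of a random draw from the adversarial pool.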
Selective adversarial learning for mobile malware
- Authors: Khoda, Mahbub , Imam, Tasadduq , Kamruzzaman, Joarder , Gondal, Iqbal , Rahman, Ashfaqur
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science and Engineering, TrustCom/BigDataSE 2019 p. 272-279
- Full Text: false
- Reviewed:
- Description: Machine learning models, including deep neural networks, have been shown to be vulnerable to adversarial attacks. Adversarial samples are crafted from legitimate inputs by carefully introducing small perturbations to the input so that the classifier is fooled. Adversarial retraining, which involves retraining the classifier using adversarial samples, has been shown to improve the robustness of the classifier against adversarial attacks. However, it has also been shown that retraining with too many samples can lead to performance degradation. Hence, a careful selection of the adversarial samples used to retrain the classifier is necessary, yet existing works select these samples in a randomized fashion. In our work, we propose two novel approaches for selecting adversarial samples: one based on the distance from the malware cluster center and one based on the probability derived from kernel-based learning (KBL). Our experimental results show that both of our selective mechanisms for adversarial retraining outperform the random selection technique and significantly improve classifier performance against adversarial attacks. In particular, selection with KBL delivers above 6% improvement in detection accuracy compared to random selection. The method proposed here can have a greater impact on designing robust machine learning systems for security applications. © 2019 IEEE.
Mobile malware detection - An analysis of the impact of feature categories
- Authors: Khoda, Mahbub , Kamruzzaman, Joarder , Gondal, Iqbal , Imam, Tasadduq
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 25th International Conference on Neural Information Processing, ICONIP 2018; Siem Reap, Cambodia; 13th-16th December 2018; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11304 LNCS, p. 486-498
- Full Text: false
- Reviewed:
- Description: The use of smartphones and hand-held devices continues to increase with rapid development in the underlying technology and widespread deployment of numerous applications, including social networking, email and financial transactions. Inevitably, malware attacks are shifting towards these devices. To detect mobile malware, features representing the characteristics of applications play a crucial role. In this work, we systematically studied the impact of all categories of features (i.e., permissions, application programming interface (API) calls, inter-component communication (ICC) and dynamic features) of Android applications in distinguishing malware from benign applications. We identified the combination of feature categories that yields better performance, in terms of widely used metrics, than blindly using all feature categories. We proposed a new technique to include contextual information about API calls in feature values, and the study reveals that embedding such information enhances malware detection capability by a good margin. Information gain analysis shows that a significant number of features in the ICC category are not relevant to malware prediction and hence are the least effective. This study will be useful in designing better mobile malware detection systems.
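Information gain of a feature X with respect to the class label Y is H(Y) - H(Y | X), the reduction in label entropy once the feature value is known. A minimal sketch for discrete features follows; the toy labels and feature columns are invented for illustration and do not come from the paper's dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for a discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

# Toy example: 1 = malware, 0 = benign.
labels = [1, 1, 0, 0]
informative = [1, 1, 0, 0]   # feature value matches the label exactly
irrelevant = [1, 0, 1, 0]    # feature value is independent of the label
gain_informative = information_gain(informative, labels)
gain_irrelevant = information_gain(irrelevant, labels)
```

A feature whose gain is close to zero, like `irrelevant` here, tells the classifier nothing about the label, which is the sense in which the abstract calls many ICC features not relevant.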