Robust malware defense in industrial IoT applications using machine learning with selective adversarial samples
- Khoda, Mahbub, Imam, Tasadduq, Kamruzzaman, Joarder, Gondal, Iqbal, Rahman, Ashfaqur
- Authors: Khoda, Mahbub , Imam, Tasadduq , Kamruzzaman, Joarder , Gondal, Iqbal , Rahman, Ashfaqur
- Date: 2019
- Type: Text , Journal article
- Relation: IEEE Transactions on Industry Applications Vol. 56, no. 4 (2020), p. 4415-4424
- Full Text:
- Reviewed:
- Description: Industrial Internet of Things (IIoT) deploys edge devices to act as intermediaries between sensors and actuators and application servers or cloud services. Machine learning models have been widely used to thwart malware attacks on such edge devices. However, these models are vulnerable to adversarial attacks, where attackers craft adversarial samples by introducing small perturbations to malware samples to fool a classifier into misclassifying them as benign applications. The deep learning literature proposes adversarial retraining as a defense mechanism, where adversarial samples are combined with legitimate samples to retrain the classifier. However, existing works select such adversarial samples in a random fashion, which degrades the classifier's performance. This work proposes two novel approaches for selecting adversarial samples to retrain a classifier: one based on the distance from the malware cluster center, and the other based on a probability measure derived from kernel-based learning (KBL). Our experiments show that both of our sample selection methods outperform random selection, and the KBL selection method improves detection accuracy by 6%. Also, while existing works focus on deep neural networks with respect to adversarial retraining, we additionally assess the impact of such adversarial samples on other classifiers, and our proposed selective adversarial retraining approaches show similar performance improvements for these classifiers as well. The outcomes from the study can assist in designing robust security systems for IIoT applications.
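- Example: a minimal NumPy sketch (not the paper's code) of the distance-based selection idea described above. The abstract does not say whether the samples nearest to or farthest from the malware cluster centre are kept; this sketch keeps the farthest, and all names and sizes are illustrative.
```python
import numpy as np

def select_by_cluster_distance(adv_samples, malware_train, k):
    """Rank adversarial candidates by Euclidean distance to the malware
    cluster centre and keep the k farthest (an assumption: these are the
    ones that have drifted furthest towards the benign region)."""
    centre = malware_train.mean(axis=0)
    dists = np.linalg.norm(adv_samples - centre, axis=1)
    return adv_samples[np.argsort(dists)[-k:]]

# Retraining set = benign + malware + selected adversarial samples,
# with the adversarial samples labelled as malware.
rng = np.random.default_rng(0)
malware = rng.normal(1.0, 0.3, size=(200, 16))
benign = rng.normal(0.0, 0.3, size=(200, 16))
adversarial = malware + rng.normal(0.4, 0.1, size=malware.shape)

selected = select_by_cluster_distance(adversarial, malware, k=50)
X_retrain = np.vstack([benign, malware, selected])
y_retrain = np.hstack([np.zeros(200), np.ones(200), np.ones(50)])
```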
Machine learning in mental health: a scoping review of methods and applications
- Shatte, Adrian, Hutchinson, Delyse, Teague, Samantha
- Authors: Shatte, Adrian , Hutchinson, Delyse , Teague, Samantha
- Date: 2019
- Type: Text , Journal article
- Relation: Psychological Medicine Vol. 49, no. 9 (2019), p. 1426-1448
- Full Text: false
- Reviewed:
- Description: This paper aims to synthesise the literature on machine learning (ML) and big data applications for mental health, highlighting current research and applications in practice. We employed a scoping review methodology to rapidly map the field of ML in mental health. Eight health and information technology research databases were searched for papers covering this domain. Articles were assessed by two reviewers, and data were extracted on each article's mental health application, ML technique, data type, and study results. Articles were then synthesised via narrative review. Three hundred papers focusing on the application of ML to mental health were identified. Four main application domains emerged in the literature: (i) detection and diagnosis; (ii) prognosis, treatment and support; (iii) public health; and (iv) research and clinical administration. The most common mental health conditions addressed included depression, schizophrenia, and Alzheimer's disease. ML techniques used included support vector machines, decision trees, neural networks, latent Dirichlet allocation, and clustering. Overall, the application of ML to mental health has demonstrated a range of benefits across the areas of diagnosis, treatment and support, research, and clinical administration. With the majority of studies identified focusing on the detection and diagnosis of mental health conditions, it is evident that there is significant room for the application of ML to other areas of psychology and mental health. The challenges of using ML techniques are discussed, as well as opportunities to improve and advance the field.
Obfuscated memory malware detection in resource-constrained IoT devices for smart city applications
- Shafin, Sakib, Karmakar, Gour, Mareels, Iven
- Authors: Shafin, Sakib , Karmakar, Gour , Mareels, Iven
- Date: 2023
- Type: Text , Journal article
- Relation: Sensors Vol. 23, no. 11 (2023), p. 5348
- Full Text:
- Reviewed:
- Description: Obfuscated Memory Malware (OMM) presents significant threats to interconnected systems, including smart city applications, through its ability to evade detection with concealment tactics. Existing OMM detection methods primarily focus on binary detection. Their multiclass versions consider only a few families and thereby fail to detect much of the existing and emerging malware. Moreover, their large memory size makes them unsuitable for execution in resource-constrained embedded/IoT devices. To address this problem, in this paper we propose a multiclass yet lightweight malware detection method capable of identifying recent malware and suitable for execution on embedded devices. The method employs a hybrid model that combines the feature-learning capability of convolutional neural networks with the temporal modeling advantage of bidirectional long short-term memory. The proposed architecture exhibits compact size and fast processing speed, making it suitable for deployment in the IoT devices that constitute the major components of smart city systems. Extensive experiments with the recent CIC-MalMem-2022 OMM dataset demonstrate that our method outperforms other machine learning-based models proposed in the literature, both in detecting OMM and in identifying specific attack types. Our proposed method thus offers a robust yet compact model, executable on IoT devices, for defending against obfuscated malware.
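- Example: a minimal PyTorch sketch of the CNN + bidirectional-LSTM hybrid the abstract describes. The 55-feature input (roughly matching CIC-MalMem-2022), the 16 output classes and all layer widths are assumptions, not the authors' architecture.
```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, n_features=55, n_classes=16):
        super().__init__()
        # 1-D convolutions learn local patterns over the memory features...
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # ...and a bidirectional LSTM models ordering across them.
        self.lstm = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 64, n_classes)

    def forward(self, x):                    # x: (batch, n_features)
        x = x.unsqueeze(1)                   # -> (batch, 1, n_features)
        x = self.conv(x)                     # -> (batch, 32, n_features//2)
        x = x.permute(0, 2, 1)               # -> (batch, seq, 32)
        _, (h, _) = self.lstm(x)
        h = torch.cat([h[0], h[1]], dim=1)   # concat both directions
        return self.fc(h)

model = CNNBiLSTM()
logits = model(torch.randn(8, 55))           # 8 memory-feature vectors
print(logits.shape)                          # torch.Size([8, 16])
```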
Analysis of Classifiers for Prediction of Type II Diabetes Mellitus
- Barhate, Rahul, Kulkarni, Pradnya
- Authors: Barhate, Rahul , Kulkarni, Pradnya
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 4th International Conference on Computing, Communication Control and Automation, ICCUBEA 2018
- Full Text:
- Reviewed:
- Description: Diabetes mellitus is a chronic disease and a worldwide health challenge. According to the International Diabetes Federation, 451 million people across the globe have diabetes, with this number anticipated to rise to 693 million by 2045. It has been shown that 80% of the complications arising from type II diabetes can be prevented or delayed by early identification of the people who are at risk. Diabetes is difficult to diagnose in the early stages as its symptoms develop subtly and gradually. In a majority of cases, patients remain undiagnosed until they are admitted for a heart attack or begin to lose their sight. This paper analyzes different classification algorithms, based on a patient's health history, to help doctors identify the presence of diabetes and to promote early diagnosis and treatment. The experiments were conducted on the Pima Indian Diabetes data set. The classifiers used include K Nearest Neighbors, Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, Support Vector Machine and Neural Network. Results demonstrate that Random Forest performed best on the data set, giving an accuracy of 79.7%. © 2018 IEEE.
- Description: E1
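- Example: a scikit-learn sketch of the classifier comparison reported above; the local file "pima.csv" and its "Outcome" label column are assumptions about how the Pima Indian Diabetes data set is stored, and default hyperparameters stand in for the paper's settings.
```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

df = pd.read_csv("pima.csv")               # 8 features + "Outcome" label
X, y = df.drop(columns="Outcome"), df["Outcome"]

models = {
    "KNN": KNeighborsClassifier(),
    "LogReg": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "GradBoost": GradientBoostingClassifier(random_state=0),
    "SVM": SVC(),
    "NeuralNet": MLPClassifier(max_iter=2000, random_state=0),
}
for name, clf in models.items():
    # Scaling matters for KNN, SVM and the neural network.
    pipe = make_pipeline(StandardScaler(), clf)
    acc = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name:>12}: {acc:.3f}")
```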
A feature agnostic approach for glaucoma detection in OCT volumes
- Maetschke, Stefan, Antony, Bhavna, Ishikawa, Hiroshi, Wollstein, Gadi, Schuman, Joel, Garnavi, Rahil
- Authors: Maetschke, Stefan , Antony, Bhavna , Ishikawa, Hiroshi , Wollstein, Gadi , Schuman, Joel , Garnavi, Rahil
- Date: 2019
- Type: Text , Journal article
- Relation: PLoS One Vol. 14, no. 7 (2019), p. e0219126
- Full Text:
- Reviewed:
- Description: Optical coherence tomography (OCT) based measurements of retinal layer thickness, such as the retinal nerve fibre layer (RNFL) and the ganglion cell with inner plexiform layer (GCIPL) are commonly employed for the diagnosis and monitoring of glaucoma. Previously, machine learning techniques have relied on segmentation-based imaging features such as the peripapillary RNFL thickness and the cup-to-disc ratio. Here, we propose a deep learning technique that classifies eyes as healthy or glaucomatous directly from raw, unsegmented OCT volumes of the optic nerve head (ONH) using a 3D Convolutional Neural Network (CNN). We compared the accuracy of this technique with various feature-based machine learning algorithms and demonstrated the superiority of the proposed deep learning based method. Logistic regression was found to be the best performing classical machine learning technique with an AUC of 0.89. In direct comparison, the deep learning approach achieved a substantially higher AUC of 0.94 with the additional advantage of providing insight into which regions of an OCT volume are important for glaucoma detection. Computing Class Activation Maps (CAM), we found that the CNN identified neuroretinal rim and optic disc cupping as well as the lamina cribrosa (LC) and its surrounding areas as the regions significantly associated with the glaucoma classification. These regions anatomically correspond to the well established and commonly used clinical markers for glaucoma diagnosis such as increased cup volume, cup diameter, and neuroretinal rim thinning at the superior and inferior segments.
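- Example: a minimal PyTorch sketch of classifying raw OCT volumes with a 3D CNN. The global-average-pooling head followed by a single linear layer keeps the model compatible with Class Activation Maps, which the abstract mentions; the volume size and channel widths are illustrative assumptions, not the authors' architecture.
```python
import torch
import torch.nn as nn

class Glaucoma3DCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Global average pooling + one linear layer: the CAM recipe.
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(32, 2)           # healthy vs glaucomatous

    def forward(self, x):                    # x: (batch, 1, D, H, W)
        x = self.features(x)
        x = self.pool(x).flatten(1)
        return self.fc(x)

model = Glaucoma3DCNN()
volume = torch.randn(2, 1, 32, 64, 64)       # downsampled ONH volumes
print(model(volume).shape)                   # torch.Size([2, 2])
```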
An efficient hybrid system for anomaly detection in social networks
- Rahman, Md Shafiur, Halder, Sajal, Uddin, Ashraf, Acharjee, Uzzal
- Authors: Rahman, Md Shafiur , Halder, Sajal , Uddin, Ashraf , Acharjee, Uzzal
- Date: 2021
- Type: Text , Journal article
- Relation: Cybersecurity Vol. 4, no. 1 (2021)
- Full Text:
- Reviewed:
- Description: Anomaly detection has been an essential and dynamic research area in data mining. A wide range of applications, including social media platforms, have adopted state-of-the-art methods to identify anomalies and ensure users' security and privacy. A social network is a forum used by different groups of people to express their thoughts, communicate with each other, and share content. Social networks also facilitate abnormal activities: the spread of fake news, rumours, misinformation, unsolicited messages and propaganda, and the posting of malicious links. Therefore, detecting abnormalities is an important data analysis activity for identifying normal and abnormal users on social networks. In this paper, we develop a hybrid anomaly detection method named DT-SVMNB that cascades several machine learning algorithms, namely a decision tree (C5.0), a Support Vector Machine (SVM) and a Naïve Bayesian classifier (NBC), for classifying normal and abnormal users in social networks. We extract a list of unique features derived from users' profiles and contents. Using two kinds of datasets with the selected features, the proposed machine learning model, DT-SVMNB, is trained. Our model classifies users in the social network as depressed or suicidal. We evaluated our model using synthetic and real datasets from social networks. The performance analysis demonstrates around 98% accuracy, which shows the effectiveness and efficiency of our proposed system. © 2021, The Author(s).
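- Example: the abstract does not spell out the cascading rule, so this scikit-learn sketch assumes a confidence-threshold hand-off between the three stages, with DecisionTreeClassifier standing in for C5.0 and integer labels 0..k-1; it illustrates the cascade idea, not the thesis implementation.
```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

class CascadeDTSVMNB:
    def __init__(self, threshold=0.9):
        self.stages = [DecisionTreeClassifier(),
                       SVC(probability=True),
                       GaussianNB()]
        self.threshold = threshold

    def fit(self, X, y):
        for clf in self.stages:
            clf.fit(X, y)
        return self

    def predict(self, X):
        pred = np.empty(len(X), dtype=int)
        todo = np.arange(len(X))             # samples still undecided
        for i, clf in enumerate(self.stages):
            proba = clf.predict_proba(X[todo])
            conf = proba.max(axis=1)
            last = i == len(self.stages) - 1
            # The final stage must decide everything that remains.
            decided = np.ones_like(conf, bool) if last else conf >= self.threshold
            pred[todo[decided]] = proba[decided].argmax(axis=1)
            todo = todo[~decided]
            if len(todo) == 0:
                break
        return pred

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10)); y = (X[:, 0] > 0).astype(int)
print(CascadeDTSVMNB().fit(X, y).predict(X[:5]))
```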
Towards machine learning approach for digital-health intervention program
- Santhanagopalan, Meena, Chetty, Madhu, Foale, Cameron, Klein, Britt
- Authors: Santhanagopalan, Meena , Chetty, Madhu , Foale, Cameron , Klein, Britt
- Date: 2019
- Type: Text , Journal article
- Relation: Australian Journal of Intelligent Information Processing Systems Vol. 15, no. 4 (2019), p. 16-24
- Full Text:
- Reviewed:
- Description: Digital-health interventions (DHIs) are used by health care providers to promote engagement within the community. Effective assignment of participants into DHI programs increases the benefit gained from the most suitable intervention. A major challenge in the roll-out and implementation of DHIs is assigning participants to different interventions. The use of the biopsychosocial model [18] for this purpose is not widespread, owing to the limited number of personalized interventions built on evidence-based, data-driven models. Machine learning has changed the way data extraction and interpretation work, providing automatic, generic methods that have replaced traditional statistical techniques. In this paper, we investigate the relevance of machine learning for this purpose by studying different non-linear classifiers and comparing their prediction accuracy to evaluate their suitability. Further, as a novel contribution, real-life biopsychosocial features are used as input in this study. The results help in developing an appropriate predictive classification model to assign participants to the most suitable DHI. We analyze biopsychosocial data generated from a DHI program and study the feature characteristics using scatter plots. While scatter plots are unable to reveal the linear relationships in the data set, classifiers can successfully identify which features are suitable predictors of mental ill health.
Skype Traffic Classification Using Cost Sensitive Algorithms
- Azab, Azab, Layton, Robert, Alazab, Mamoun, Watters, Paul
- Authors: Azab, Azab , Layton, Robert , Alazab, Mamoun , Watters, Paul
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings - 4th Cybercrime and Trustworthy Computing Workshop, CTC 2013 p. 14-21
- Full Text: false
- Reviewed:
- Description: Voice over IP (VoIP) technologies such as Skype are becoming increasingly popular and widely used in different organisations, so identifying the use of this service at the network level is very important. Reasons include applying Quality of Service (QoS), network planning, prohibiting its use on some networks, and lawful interception of communications. Researchers have addressed VoIP traffic classification from different viewpoints, such as classifier accuracy, building time, classification time and online classification. Previous research tested models using the same version of a VoIP product as was used for training, giving generalizability only to that version of the product; as new VoIP versions are released, such classifiers become obsolete. In this paper, we examine whether this approach is applicable to detecting new, untrained versions of Skype. We suggest that cost-sensitive classifiers can improve the accuracy of detecting untrained versions, testing them against other algorithms. Our experiment demonstrates promising preliminary results: by building a cost-sensitive classifier on Skype version 3, we detect Skype version 4 with an F-measure of 0.57, a drastic improvement over the non-cost-sensitive baseline, which scores an F-measure of 0. This approach may be enhanced to improve detection results and extended to other applications whose protocols change from version to version.
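- Example: a scikit-learn sketch of cost-sensitive training via class weights, the general technique the paper applies; the 10:1 cost ratio, the synthetic flow features and the random-forest learner are illustrative assumptions, not the paper's setup.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 12))             # flow statistics per sample
y = (X[:, 0] > 1.28).astype(int)            # 1 = Skype, ~10% minority class

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight makes errors on the Skype class 10x as costly, nudging
# the boundary towards recalling unseen protocol variants.
clf = RandomForestClassifier(class_weight={0: 1, 1: 10}, random_state=0)
clf.fit(X_tr, y_tr)
print("F-measure:", f1_score(y_te, clf.predict(X_te)))
```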
The Ballarat incremental knowledge engine
- Dazeley, Richard, Warner, Philip, Johnson, Scott, Vamplew, Peter
- Authors: Dazeley, Richard , Warner, Philip , Johnson, Scott , Vamplew, Peter
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 11th International Workshop on Knowledge Management and Acquisition for Smart Systems and Services, PKAW 2010 Vol. 6232 LNAI, p. 195-207
- Full Text:
- Reviewed:
- Description: Ripple Down Rules (RDR) is a maturing collection of methodologies for the incremental development and maintenance of medium to large rule-based knowledge systems. While earlier knowledge based systems relied on extensive modeling and knowledge engineering, RDR instead takes a simple no-model approach that merges the development and maintenance stages. Over the last twenty years RDR has been significantly expanded and applied in numerous domains. Until now researchers have generally implemented their own version of the methodologies, while commercial implementations are not made available. This has resulted in much duplicated code and the advantages of RDR not being available to a wider audience. The aim of this project is to develop a comprehensive and extensible platform that supports current and future RDR technologies, thereby allowing researchers and developers access to the power and versatility of RDR. This paper is a report on the current status of the project and marks the first release of the software. © 2010 Springer-Verlag Berlin Heidelberg.
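- Example: a minimal single-classification RDR sketch, assuming the usual condition/conclusion nodes with "except" and "else" branches; it illustrates the no-model, incremental idea the abstract describes and is not the engine's actual API.
```python
class RDRNode:
    """One rule: a condition, a conclusion, and two branches. New rules
    are attached at the exact point where the current tree fails, which
    is what makes maintenance incremental."""
    def __init__(self, cond, conclusion):
        self.cond, self.conclusion = cond, conclusion
        self.except_ = None   # refines this rule when it fires wrongly
        self.else_ = None     # tried when this rule does not fire

    def infer(self, case, default="no conclusion"):
        if self.cond(case):
            if self.except_:
                refined = self.except_.infer(case, default=None)
                if refined is not None:
                    return refined
            return self.conclusion
        if self.else_:
            return self.else_.infer(case, default)
        return default

# Build incrementally: a base rule, then an exception for a failed case.
root = RDRNode(lambda c: c["temp"] > 38, "fever")
root.except_ = RDRNode(lambda c: c["post_exercise"], "normal")

print(root.infer({"temp": 39, "post_exercise": False}))  # fever
print(root.infer({"temp": 39, "post_exercise": True}))   # normal
```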
Fraud detection for online banking for scalable and distributed data
- Authors: Haq, Ikram
- Date: 2020
- Type: Text , Thesis , PhD
- Full Text:
- Description: Online fraud causes billions of dollars in losses for banks, so online banking fraud detection is an important field of study. However, there are many challenges in conducting research in fraud detection. One constraint is the unavailability of bank datasets for research, or that the required characteristics of the data attributes are not available. Numeric data usually provides better performance for machine learning algorithms, yet most transaction data also have categorical or nominal features. Moreover, some platforms such as Apache Spark recognize only numeric data, so techniques such as one-hot encoding (OHE) are needed to transform categorical features into numerical features; however, OHE has challenges, including the sparseness of the transformed data and the fact that the distinct values of an attribute are not always known in advance. Efficient feature engineering can improve an algorithm's performance but usually requires detailed domain knowledge to identify the correct features. Techniques like Ripple Down Rules (RDR) are suitable for fraud detection because of their low maintenance and incremental learning features. However, achieving high classification accuracy on mixed datasets, especially for scalable data, is challenging, and evaluating RDR on distributed platforms is also difficult as it is not available on them. The thesis proposes the following solutions to these challenges:
• We developed a technique, Highly Correlated Rule Based Uniformly Distribution (HCRUD), to generate highly correlated, rule-based, uniformly distributed synthetic data.
• We developed a technique, One-hot Encoded Extended Compact (OHE-EC), to transform categorical features into numeric features by compacting sparse data even if not all distinct values are known.
• We developed a technique, Feature Engineering and Compact Unified Expressions (FECUE), to improve model efficiency through feature engineering where the domain of the data is not known in advance.
• A Unified Expression RDR fraud detection technique (UE-RDR) for Big Data was proposed and evaluated on the Spark platform.
Empirical tests were executed on a multi-node Hadoop cluster using well-known classifiers on bank data, synthetic bank datasets and publicly available datasets from the UCI repository. These evaluations demonstrated substantial improvements in classification accuracy, ruleset compactness and execution speed.
- Description: Doctor of Philosophy
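- Example: a scikit-learn sketch of the baseline one-hot-encoding problem that OHE-EC addresses (sparse output, and category values unseen at scoring time); OHE-EC itself is the thesis's contribution and is not shown here, and the toy columns are illustrative.
```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"channel": ["web", "app", "web"],
                      "country": ["AU", "NZ", "AU"]})
test = pd.DataFrame({"channel": ["atm"],       # value never seen in training
                     "country": ["AU"]})

enc = OneHotEncoder(handle_unknown="ignore")   # unseen values -> all zeros
X_train = enc.fit_transform(train)             # SciPy sparse matrix
X_test = enc.transform(test)
print(X_train.shape, X_test.toarray())         # (3, 4) [[0. 0. 1. 0.]]
```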
Measuring trustworthiness of image data in the internet of things environment
- Authors: Islam, Mohammad
- Date: 2021
- Type: Text , Thesis , PhD
- Full Text:
- Description: Internet of Things (IoT) image sensors generate huge volumes of digital images every day. However, the ready availability and usability of photo editing tools, vulnerabilities in communication channels, and malicious software have made forgery attacks on image sensor data effortless, exposing IoT systems to cyberattacks. In IoT applications such as smart cities and surveillance systems, smooth operation depends on sensors sharing data with other sensors of identical or different types. Therefore, a sensor must be able to rely on the data it receives from other sensors; in other words, data must be trustworthy. Sensors deployed in IoT applications are usually limited in processing and battery power, which prohibits the use of complex cryptography and security mechanisms and hinders the adoption of universal security standards by IoT device manufacturers. Hence, estimating the trust of image sensor data is a defensive solution, as these data are used for critical decision-making processes. To our knowledge, only one published work has estimated the trustworthiness of digital images, applied to forensic applications. However, that study's method depends on machine learning prediction scores returned by existing forensic models, which limits its usage where the underlying forensic models require different approaches (e.g., machine learning predictions, statistical methods, digital signatures, perceptual image hashes). Multi-type sensor data correlation and context awareness can improve the trust measurement, both of which are absent in that study's model. To address these issues, novel techniques are introduced to accurately estimate the trustworthiness of IoT image sensor data with the aid of complementary non-imagery (numeric) data-generating sensors monitoring the same environment. The trust estimation models run on edge devices, relieving sensors of computationally intensive tasks. First, to detect local image forgery (splicing and copy-move attacks), an innovative image forgery detection method is proposed based on the Discrete Cosine Transform (DCT), Local Binary Patterns (LBP) and a new feature extraction method using the mean operator. Using a Support Vector Machine (SVM), the proposed method is extensively tested on four well-known, publicly available greyscale and colour image forgery datasets and on an IoT-based image forgery dataset that we built. Experimental results reveal the superiority of our proposed method over recent state-of-the-art methods in terms of widely used performance metrics and computational time, and demonstrate robustness against the low availability of forged training samples. Second, a robust trust estimation framework for IoT image data is proposed, leveraging numeric data-generating sensors deployed in the same area of interest (AoI) in an indoor environment. As low-cost sensors allow many IoT applications to use multiple types of sensors to observe the same AoI, the complementary numeric data of one sensor can be exploited to measure the trust value of another image sensor's data. A theoretical model is developed using Shannon's entropy to derive the uncertainty associated with an observed event, and Dempster-Shafer theory (DST) for decision fusion. The proposed model's efficacy in estimating the trust score of image sensor data is analysed by observing a fire event using IoT image and temperature sensor data in an indoor residential setup under different scenarios.
The proposed model produces highly accurate trust scores in all scenarios with authentic and forged image data. Finally, as the outdoor environment varies dynamically due to natural factors (e.g., lighting variations between day and night, and the presence of different objects, smoke, fog, rain, or shadow in the scene), a novel trust framework is proposed that suits outdoor environments with these contextual variations. A transfer learning approach is adopted to derive a decision about an observation from image sensor data, while a statistical approach derives a decision about the same observation from numeric data generated by other sensors deployed in the same AoI. These decisions are then fused using CertainLogic and compared with DST-based fusion. A testbed was set up using a Raspberry Pi microprocessor, an image sensor, a temperature sensor, an edge device, LoRa nodes, a LoRaWAN gateway and servers to evaluate the proposed techniques. The results show that CertainLogic is more suitable for measuring the trustworthiness of image sensor data in an outdoor environment.
- Description: Doctor of Philosophy
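- Example: a small sketch of Dempster's rule of combination, the DST fusion step the thesis uses, applied to an image sensor's and a temperature sensor's evidence about a fire event; the mass values are illustrative, not from the thesis.
```python
from itertools import product

FIRE, NOFIRE, THETA = "fire", "no_fire", "theta"   # theta = uncertainty

def intersect(a, b):
    if a == THETA: return b
    if b == THETA: return a
    return a if a == b else None          # None = empty set (conflict)

def combine(m1, m2):
    """Dempster's rule: multiply masses, discard conflicting mass,
    then renormalise by 1 - conflict."""
    fused, conflict = {FIRE: 0.0, NOFIRE: 0.0, THETA: 0.0}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        s = intersect(a, b)
        if s is None:
            conflict += ma * mb
        else:
            fused[s] += ma * mb
    return {k: v / (1 - conflict) for k, v in fused.items()}

m_image = {FIRE: 0.7, NOFIRE: 0.1, THETA: 0.2}   # CNN sees flames
m_temp = {FIRE: 0.6, NOFIRE: 0.2, THETA: 0.2}    # temperature is high

print(combine(m_image, m_temp))   # fire mass rises to 0.85
```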
WBAN-IoMT integration for a new frontier in healthcare technology
- Sabil, Nishat, Uddin, Md Minhaz, Miah, Md Sazal, Rhaman, Md Meganur, Mohammad, Kazi
- Authors: Sabil, Nishat , Uddin, Md Minhaz , Miah, Md Sazal , Rhaman, Md Meganur , Mohammad, Kazi
- Date: 2024
- Type: Text , Conference paper
- Relation: 2nd IEEE International Conference on Integrated Circuits and Communication Systems, ICICACS 2024, Raichur, India, 23-24 February 2024
- Full Text: false
- Reviewed:
- Description: This paper presents a cutting-edge approach to remotely collecting and transmitting crucial medical data using Wireless Body Area Network (WBAN) technology integrated with the Internet of Medical Things (IoMT). First, a prototype system was devised that employs a Mosquitto server for seamless data transfer, Node-RED for debugging, and Gadgetbridge for real-time data acquisition from a Mi Band 5, with the data stored in a database website called MediData. MediData encompasses medical history and vital parameters such as glucose level and ECG reports. The system routes the data to hospital servers for prompt analysis and action. The proposed WBAN technology eliminates the need for physical records and offers convenience and comfort for patients. Additionally, the integration of smart ambulances further enhances the precision and speed of data transmission to medical personnel. After evaluating several cross-validation models, the best accuracy obtained was 86% using the K-Nearest Neighbors method, while Random Forest showed comparatively better generalization with a lower cross-validation error of 24.75%. Findings from this work demonstrate the tremendous potential of WBAN technology in conjunction with IoMT, revolutionizing the way medical data are collected and analyzed for efficient and accurate medical data management. © 2024 IEEE.
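- Example: a sketch of the wearable-to-Mosquitto-to-hospital-server data path the paper describes, using paho-mqtt's one-shot publish helper; the broker host, topic layout and payload fields are illustrative assumptions, not the paper's implementation.
```python
import json
import time
from paho.mqtt import publish

def publish_vitals(patient_id, heart_rate, spo2,
                   broker="localhost"):        # the Mosquitto server
    payload = json.dumps({
        "patient_id": patient_id,
        "heart_rate_bpm": heart_rate,
        "spo2_percent": spo2,
        "timestamp": time.time(),
    })
    # Node-RED (or the MediData backend) would subscribe to this topic.
    publish.single(f"wban/{patient_id}/vitals", payload,
                   hostname=broker, qos=1)

publish_vitals("patient-017", heart_rate=82, spo2=97)
```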
Discriminating malware families using partitional clustering
- Authors: Mishra, Pooja
- Date: 2024
- Type: Text , Thesis , Masters
- Full Text: false
- Description: Malware, malicious software designed to compromise device security, is crafted by expert software engineers and distributed through specialized black markets. Identifying malware families within daily feeds remains a significant challenge for internet security firms. Industry-standard Yara rules, based on regular expressions, are prone to failure as malware evolves. This thesis presents an alternative approach that leverages malware clustering. By clustering malware samples based on dynamic analysis features, Yara scans can efficiently pinpoint known families, while unrecognized samples signify potential new variants, earmarked for further scrutiny by analysis teams. This process diminishes the need for individual sample scans, thereby streamlining operations and lightening the analysis team's workload. This research evaluates a partitional clustering algorithm for improved handling of sparse malware features, setting it against the following traditional algorithms: K-Means, Agglomerative Clustering, DBSCAN, and Spectral K-means Clustering. Each algorithm is evaluated with a focus on clustering performance: K-Means optimizes for homogeneous variance across n groups; Agglomerative Clustering scales to large datasets via connectivity matrices; DBSCAN discriminates clusters based on density metrics; and Spectral K-means Clustering employs affinity-matrix-based low-dimensional embedding prior to clustering. The contributions of this thesis include: a comprehensive performance comparison of the partitional clustering algorithm against Hierarchical, Density-based, Spectral K-means, and K-Means algorithms; enhancement of the partitional clustering algorithm for sparse data; an in-depth evaluation of features extracted from Application Programming Interface call parameters and Domain Name System queries executed by malware; and the development of countermeasures against malware's anti-analysis tactics. The research utilizes a real-world malware dataset sourced from abuse.ch [1]. Empirical results demonstrate the superior performance of the partitional clustering algorithm over traditional clustering techniques in the majority of tests conducted.
- Description: Master of Research
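- Example: a scikit-learn sketch of the general approach, clustering sparse dynamic-analysis features (vectorised API-call traces) with K-Means; the traces and cluster count are illustrative, and the thesis's enhanced partitional algorithm is not shown.
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

traces = [
    "CreateFile WriteFile RegSetValue InternetOpen",   # dropper-like
    "CreateFile WriteFile RegSetValue",
    "DnsQuery InternetOpen HttpSendRequest",           # beaconing-like
    "DnsQuery HttpSendRequest InternetOpen",
]
# TF-IDF over call names yields exactly the kind of sparse matrix
# that partitional clustering must handle efficiently.
X = TfidfVectorizer().fit_transform(traces)            # sparse CSR matrix

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # samples sharing behaviour land in the same cluster
```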