Biopsychosocial Data Analytics and Modeling
- Authors: Santhanagopalan, Meena
- Date: 2021
- Type: Text, Thesis, PhD
- Description: Sustained customisation of digital health intervention (DHI) programs, in the context of community health engagement, requires strong integration of multi-sourced interdisciplinary biopsychosocial health data. The biopsychosocial model is built upon the idea that biological, psychological and social processes are integrally and interactively involved in physical health and illness. One of the longstanding challenges of dealing with healthcare data is the wide variety of data generated from different sources and the increasing need to learn actionable insights that drive performance improvement. The growth of information and communication technology has led to the increased use of DHI programs. These programs use an observational methodology that helps researchers to study the everyday behaviour of participants during the course of the program by analysing data generated from digital tools such as wearables, online surveys and ecological momentary assessment (EMA). Combined with data reported from biological and psychological tests, this provides rich and unique biopsychosocial data. There is a strong need to review and apply novel approaches to combining biopsychosocial data from a methodological perspective. Although some studies have used data analytics in research on clinical trial data generated from digital interventions, data analytics on biopsychosocial data generated from DHI programs is limited. The study in this thesis develops and implements innovative approaches for analysing the existing unique and rich biopsychosocial data generated from the wellness study, a DHI program conducted by the School of Science, Psychology and Sport at Federation University. The characteristics of variety, value and veracity that usually describe big data are also relevant to the biopsychosocial data handled in this thesis. 
These historical, retrospective real-life biopsychosocial data provide fertile ground for research through the use of data analytics to discover patterns hidden in the data and to obtain new knowledge. This thesis presents the studies carried out on three aspects of biopsychosocial research. First, we present the salient traits of the three components - biological, psychological and social - of biopsychosocial research. Next, we investigate the challenges of pre-processing biopsychosocial data, placing special emphasis on the time-series data generated from wearable sensor devices. Finally, we present the application of statistical and machine learning (ML) tools to integrate variables from the biopsychosocial disciplines to build a predictive model. The first chapter presents the salient features of the biopsychosocial data for each discipline. The second chapter presents the challenges of pre-processing biopsychosocial data, focusing on the time-series data generated from wearable sensor devices. The third chapter uses statistical and ML tools to integrate variables from the biopsychosocial disciplines to build a predictive model. Among its other important analyses and results, the key contributions of the research described in this thesis include the following: (1) using the gamma distribution to model neurocognitive reaction time data that present interesting properties (skewness and kurtosis of the data distribution); (2) using a novel ‘peak heart-rate’ count metric to quantify ‘biological’ stress; (3) using an ML approach to evaluate DHIs; and (4) using a recurrent neural network (RNN) and long short-term memory (LSTM) data prediction model to predict Difficulties in Emotion Regulation Scale (DERS) and primary emotion (PE) using wearable sensor data.
- Description: Doctor of Philosophy
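The first contribution listed in the abstract above, modelling reaction time data with a gamma distribution, can be illustrated with a minimal sketch. It is not the thesis's implementation: the reaction times are simulated, the names `fit_gamma_moments` and `gamma_skew_kurtosis` are hypothetical, and the method-of-moments estimator is an assumption chosen because it needs only the Python standard library.

```python
import random
import statistics


def fit_gamma_moments(samples):
    """Method-of-moments estimates (shape, scale) for a gamma distribution.

    For a gamma distribution, mean = shape * scale and variance = shape * scale**2,
    so shape = mean**2 / variance and scale = variance / mean.
    """
    mean = statistics.fmean(samples)
    var = statistics.variance(samples)  # sample variance
    return mean * mean / var, var / mean


def gamma_skew_kurtosis(shape):
    """Skewness and excess kurtosis implied by a gamma shape parameter."""
    return 2.0 / shape ** 0.5, 6.0 / shape


# Hypothetical data: simulated reaction times in seconds (not from the thesis).
rng = random.Random(0)
reaction_times = [rng.gammavariate(4.0, 0.1) for _ in range(5000)]

shape, scale = fit_gamma_moments(reaction_times)
skewness, excess_kurtosis = gamma_skew_kurtosis(shape)
print(f"shape={shape:.2f} scale={scale:.3f}")
print(f"implied skewness={skewness:.2f} excess kurtosis={excess_kurtosis:.2f}")
```

A single fitted shape parameter determines both the skewness and the kurtosis of the model, which is one reason a gamma fit is a compact summary for right-skewed timing data.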
Imbalanced data classification and its application in cyber security
- Authors: Moniruzzaman, Md
- Date: 2020
- Type: Text, Thesis, PhD
- Description: Cyber security, also known as information technology security or simply as information security, aims to protect government organizations, companies and individuals by defending their computers, servers, electronic systems, networks, and data from malicious attacks. With the advancement of client-side, on-the-fly web content generation techniques, it has become easier for attackers to modify the content of a website dynamically and gain access to valuable information. The impact of cybercrime on the global economy is greater than ever, and it is growing day by day. Among the various types of cybercrime, financial attacks are widespread and the financial sector is among the most targeted. Both corporations and individuals are losing a huge amount of money each year. The majority of financial attacks are carried out by banking malware and web-based attacks. End users are not always skilled enough to differentiate between injected content and the actual content of a webpage. Designing a real-time security system for ensuring a safe browsing experience is a challenging task. Some existing solutions are designed for the client side, so every user has to install them on their system, which is very difficult to implement. In addition, organizations and individuals use various platforms and tools, so different solutions need to be designed. Existing server-side solutions often focus on sanitizing and filtering inputs, and so fail to detect obfuscated and hidden scripts. Because this is a real-time security system, any significant delay will hamper the user experience. Therefore, finding the most optimized and efficient solution is very important. Ensuring that a solution is easy to install and integrate with the existing system is also a critical factor to consider. If the solution is efficient but difficult to integrate, then it may not be feasible for practical use. 
Unsupervised and supervised data classification techniques have been widely applied to design algorithms for solving cyber security problems. The performance of these algorithms varies depending on the type of cyber security problem and the size of the datasets. To date, existing algorithms do not achieve high accuracy in detecting malware activities. Datasets in cyber security, especially those from the financial sector, are predominantly imbalanced, as the number of malware activities is significantly smaller than the number of normal activities. This means that classifiers for imbalanced datasets can be used to develop supervised data classification algorithms to detect malware activities. The development of classifiers for imbalanced datasets has been a subject of research over the last decade. Most of these classifiers are based on oversampling and undersampling techniques and are not efficient in many situations, as such techniques are applied globally. In this thesis, we develop two new algorithms for solving supervised data classification problems in imbalanced datasets and then apply them to solve malware detection problems. The first algorithm is designed using piecewise linear classifiers by formulating the problem as an optimization problem and applying the penalty function method. More specifically, we add more penalty to the objective function for misclassified points from minority classes. The second method is based on a combination of supervised and unsupervised (clustering) algorithms. Such an approach allows one to identify areas in the input space where minority classes are located and to apply local oversampling or undersampling. This approach leads to the design of more efficient and accurate classifiers. The proposed algorithms are tested using real-world datasets. The results clearly demonstrate the superiority of the newly introduced algorithms. We then apply these algorithms to design classifiers to detect malware.
- Description: Doctor of Philosophy
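The penalty idea described in the abstract above, a larger penalty in the objective for misclassified minority-class points, can be sketched as follows. This is an illustration only: it swaps the thesis's piecewise linear classifier for a single linear separator trained by subgradient descent on a class-weighted hinge loss, and the synthetic data, the function names and the `minority_weight` value are all hypothetical.

```python
import random


def train_weighted_linear(points, labels, minority_weight=5.0, epochs=200, lr=0.01):
    """Subgradient descent on a class-weighted hinge loss.

    Labels are +1 (minority class) or -1 (majority class). A point whose
    margin is below 1 contributes to the loss; minority points contribute
    `minority_weight` times as much as majority points.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                weight = minority_weight if y == 1 else 1.0
                w = [wi + lr * weight * y * xi for wi, xi in zip(w, x)]
                b += lr * weight * y
    return w, b


def predict(w, b, x):
    """Return +1 (minority) or -1 (majority) for a single point."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical imbalanced data: 200 majority points and 20 minority points.
rng = random.Random(42)
majority = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(200)]
minority = [(rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)) for _ in range(20)]
points = majority + minority
labels = [-1] * len(majority) + [1] * len(minority)

w, b = train_weighted_linear(points, labels)
minority_recall = sum(predict(w, b, x) == 1 for x in minority) / len(minority)
majority_recall = sum(predict(w, b, x) == -1 for x in majority) / len(majority)
print(f"minority recall={minority_recall:.2f} majority recall={majority_recall:.2f}")
```

Raising `minority_weight` shifts the decision boundary toward the majority class, trading some majority-class accuracy for better recall on the rare class.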
Techniques for the reverse engineering of banking malware
- Authors: Black, Paul
- Date: 2020
- Type: Text, Thesis, PhD
- Description: Malware attacks are a significant and frequently reported problem, adversely affecting the productivity of organisations and governments worldwide. The well-documented consequences of malware attacks include financial loss, data loss, reputation damage, infrastructure damage, theft of intellectual property, compromise of commercial negotiations, and national security risks. Mitigation activities involve a significant amount of manual analysis. Therefore, there is a need for automated techniques for malware analysis to identify malicious behaviours. Research into automated techniques for malware analysis covers a wide range of activities. This thesis consists of a series of studies: an analysis of banking malware families and their common behaviours, an emulated command and control environment for dynamic malware analysis, a technique to identify similar malware functions, and a technique for the detection of ransomware. An analysis of the nature of banking malware, its major malware families, behaviours, variants, and inter-relationships is provided in this thesis. In doing this, this research takes a broad view of malware analysis, starting with the implementation of the malicious behaviours through to detailed analysis using machine learning. The broad approach taken in this thesis differs from some other studies that approach malware research in a more abstract sense. A disadvantage of approaching malware research without domain knowledge is that important methodology questions may not be considered. Large datasets of historical malware samples are available for countermeasures research. However, due to the age of these samples, the original malware infrastructure is no longer available, often restricting malware operations to initialisation functions only. To address this absence, an emulated command and control environment is provided. 
This emulated environment provides full control of the malware, enabling the capabilities of the original in-the-wild operation, while enabling feature extraction for research purposes. A major focus of this thesis has been the development of a machine learning function similarity method with a novel feature encoding that increases feature strength. This research develops techniques to demonstrate that the machine learning model trained on similarity features from one program can find similar functions in another, unrelated program. This finding can lead to the development of generic similar function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra. Further, this research examines the use of API call features for the identification of ransomware and shows that a failure to consider malware analysis domain knowledge can lead to weaknesses in experimental design. In this case, we show that existing research has difficulty in discriminating between ransomware and benign cryptographic software. This thesis by publication has developed techniques to advance the discipline of malware reverse engineering, in order to minimize harm due to cyber-attacks on critical infrastructure, government institutions, and industry.
- Description: Doctor of Philosophy
Applying reinforcement learning in playing Robosoccer using the AIBO
- Authors: Mukherjee, Subhasis
- Date: 2010
- Type: Text, Thesis, Masters
- Description: "Robosoccer is a popular test bed for AI programs around the world in which AIBO entertainments robots take part in the middle sized soccer event. These robots need a variety of skills to perform in a semi-real environment like this. The three key challenges are manoeuvrability, image recognition and decision making skills. This research is focussed on the decision making skills ... The work focuses on whether reinforcement learning as a form of semi supervised learning can effectively contribute to the goal keeper's decision making when a shot is taken."
- Description: Master of Computing (by research)