Hybrids of support vector machine wrapper and filter based framework for malware detection
- Authors: Huda, Shamsul , Abawajy, Jemal , Alazab, Mamoun , Abdollahian, Mali , Islam, Rafiqul , Yearwood, John
- Date: 2016
- Type: Text , Journal article
- Relation: Future Generation Computer Systems Vol. 55, no. (2016), p. 376-390
- Full Text: false
- Reviewed:
- Description: Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current-generation Anti-Virus (AV) engines employ a signature-template type of detection approach, in which malware can easily evade the existing signatures in the database. This reduces the capability of current AV engines in detecting malware. In this paper we propose a hybrid framework for malware detection that combines a Support Vector Machine wrapper with Maximum-Relevance–Minimum-Redundancy filter heuristics, where Application Program Interface (API) call statistics are used as malware features. The novelty of our hybrid framework is that it injects the filter’s ranking score into the wrapper selection process and combines the properties of both wrappers and filters with API call statistics, which can detect malware based on the nature of infectious actions instead of signatures. To the best of our knowledge, this kind of hybrid approach has not yet been explored in the literature in the context of feature selection and malware detection. Knowledge about the intrinsic characteristics of malicious activities is determined by the API call statistics, which are injected as a filter score into the wrapper’s backward elimination process in order to find the most significant APIs. When the most significant APIs are used in the wrapper classification on datasets containing both obfuscated malware and benign samples, the results show that the proposed hybrid framework clearly surpasses the existing models, including the independent filters and wrappers, using only a very compact set of significant APIs. The performance of the proposed and existing models has further been compared using binary logistic regression. Various goodness-of-fit comparison criteria such as Chi Square, Akaike’s Information Criterion (AIC) and the Receiver Operating Characteristic (ROC) curve are deployed to identify the best performing models. 
Experimental outcomes based on the above criteria also show that the proposed hybrid framework outperforms other existing models of signature types including independent wrapper and filter approaches to identify malware.
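As a rough illustration of the idea of injecting a filter score into a wrapper's backward elimination, the sketch below removes, in each round, the feature whose combined score is lowest. The additive combination, the feature names, the score values and the `toy_wrapper_scores` stand-in (which replaces retraining an SVM) are all assumptions made for this example; the abstract does not specify how the paper combines the two scores.

```python
def backward_eliminate(features, filter_score, wrapper_scores, keep):
    """Drop the weakest feature each round until only `keep` features remain.

    `wrapper_scores(subset)` is re-evaluated every round, as a wrapper would
    retrain its classifier on the surviving features.
    """
    selected = list(features)
    while len(selected) > keep:
        w = wrapper_scores(selected)
        # Assumption: combined score = wrapper score + filter score.
        worst = min(selected, key=lambda f: w[f] + filter_score[f])
        selected.remove(worst)
    return selected


# Hypothetical filter ranking scores for four API-call features.
filter_score = {"api_a": 0.9, "api_b": 0.1, "api_c": 0.5, "api_d": 0.3}


def toy_wrapper_scores(subset):
    # Stand-in for a trained classifier: every surviving feature gets equal weight.
    return {f: 1.0 / len(subset) for f in subset}


selected = backward_eliminate(filter_score, filter_score, toy_wrapper_scores, keep=2)
```

Because the toy wrapper weights all features equally, the filter score alone decides the order of removal here; with a real classifier the two scores would interact.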
A hybrid wrapper-filter approach to detect the source(s) of out-of-control signals in multivariate manufacturing process
- Authors: Huda, Shamsul , Abdollahian, Mali , Mammadov, Musa , Yearwood, John , Ahmed, Shafiq , Sultan, Ibrahim
- Date: 2014
- Type: Text , Journal article
- Relation: European Journal of Operational Research Vol. 237, no. 3 (2014), p. 857-870
- Full Text: false
- Reviewed:
- Description: With modern data-acquisition equipment and on-line computers used during production, it is now common to monitor several correlated quality characteristics simultaneously in multivariate processes. Multivariate control charts (MCC) are important tools for monitoring multivariate processes. One difficulty encountered with multivariate control charts is the identification of the variable or group of variables that causes an out-of-control signal. Expert knowledge in combination with either a wrapper-based supervised classifier or a pre-filter with a wrapper is the standard approach to detecting the sources of an out-of-control signal. However, gathering expert knowledge for source identification is costly and may introduce human error. Individual univariate control charts (UCC) and decomposition of T² statistics are also used simultaneously in many cases to identify the sources, but these either ignore the correlations between the sources or may take more time as the number of dimensions increases. The aim of this paper is to develop a source identification approach that does not need any expert knowledge and can detect out-of-control signals with lower computational complexity. We propose a hybrid wrapper-filter based source identification approach that hybridizes a Mutual Information (MI) based Maximum Relevance (MR) filter ranking heuristic with an Artificial Neural Network (ANN) based wrapper. The Artificial Neural Network Input Gain Measurement Approximation (ANNIGMA) has been combined with MR (MR-ANNIGMA) to utilize the knowledge about the intrinsic pattern of the quality characteristics computed by the filter for directing the wrapper search process. To compute the optimal ANNIGMA score, we also propose a Global MR-ANNIGMA using a non-functional relationship between variables, which is independent of the derivative of the objective function and has the potential to overcome the local optimization problem of ANN training. 
The novelty of the proposed approaches is that they combine the advantages of both filter and wrapper approaches and do not require any expert knowledge about the sources of the out-of-control signals. The heuristic-score-based subset generation process also reduces the search space to polynomial growth, which in turn reduces computational time. The proposed approaches were tested by exhaustive experiments using both simulated and real manufacturing data and compared to existing methods including independent filter, wrapper and Multivariate EWMA (MEWMA) methods. The results indicate that the proposed approaches can identify the sources of out-of-control signals more accurately than existing approaches. © 2014 Elsevier B.V. All rights reserved.
Hybrid metaheuristic approaches to the expectation maximization for estimation of the hidden markov model for signal modeling
- Authors: Huda, Shamsul , Yearwood, John , Togneri, Roberto
- Date: 2014
- Type: Text , Journal article
- Relation: IEEE Transactions on Cybernetics Vol. 44, no. 10 (2014), p. 1962-1977
- Full Text: false
- Reviewed:
- Description: The expectation maximization (EM) is the standard training algorithm for hidden Markov model (HMM). However, EM faces a local convergence problem in HMM estimation. This paper attempts to overcome this problem of EM and proposes hybrid metaheuristic approaches to EM for HMM. In our earlier research, a hybrid of a constraint-based evolutionary learning approach to EM (CEL-EM) improved HMM estimation. In this paper, we propose a hybrid simulated annealing stochastic version of EM (SASEM) that combines simulated annealing (SA) with EM. The novelty of our approach is that we develop a mathematical reformulation of HMM estimation by introducing a stochastic step between the EM steps and combine SA with EM to provide better control over the acceptance of stochastic and EM steps for better HMM estimation. We also extend our earlier work [1] and propose a second hybrid which is a combination of an EA and the proposed SASEM, (EA-SASEM). The proposed EA-SASEM uses the best constraint-based EA strategies from CEL-EM and stochastic reformulation of HMM. The complementary properties of EA and SA and stochastic reformulation of HMM of SASEM provide EA-SASEM with sufficient potential to find better estimation for HMM. To the best of our knowledge, this type of hybridization and mathematical reformulation have not been explored in the context of EM and HMM training. The proposed approaches have been evaluated through comprehensive experiments to justify their effectiveness in signal modeling using the speech corpus: TIMIT. Experimental results show that proposed approaches obtain higher recognition accuracies than the EM algorithm and CEL-EM as well. © 2014 IEEE.
Performance evaluation of multivariate non-normal process using metaheuristic approaches
- Authors: Ahmad, S. , Abdollahian, Mali , Bhatti, M.I. , Huda, Shamsul , Yearwood, John
- Date: 2014
- Type: Text , Journal article
- Relation: Journal of Applied Statistical Science Vol. 20, no. 3 (2014), p. 299-315
- Full Text: false
- Reviewed:
- Description: Multivariate process performance indices generally rely on the assumption that the process follows a normal distribution, but in practice it is non-normal with correlated characteristic patterns. This paper proposes two metaheuristic-based approaches to fit a Burr distribution to such data: a single-candidate-model approach using a Simulated Annealing (SA) technique and a population-based approach using a constraint-based Evolutionary Algorithm (EA). The fitted Burr distribution is then used to estimate the proportion of non-conforming (PNC), which is then used to fit an appropriate Burr distribution to individual Geometric distance variables. The empirical performance of the proposed methods has been evaluated on a real industrial data set using the PNC criterion. Experimental results demonstrate that the new approaches perform better than the existing ones.
A reinforcement learning approach with spline-fit object tracking for AIBO Robot's high level decision making
- Authors: Mukherjee, Subhasis , Huda, Shamsul , Yearwood, John
- Date: 2011
- Type: Text , Book chapter
- Relation: Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing p. 169-183
- Full Text: false
- Reviewed:
- Description: Robocup is a popular test bed for AI programs around the world. Robosoccer is one of the two major parts of Robocup, in which AIBO entertainment robots take part in the middle-sized soccer event. The three key challenges that robots need to face in this event are manoeuvrability, image recognition and decision-making skills. This paper focuses on the decision-making problem in Robosoccer: the goal keeper problem. We investigate whether reinforcement learning (RL), as a form of semi-supervised learning, can effectively contribute to the goal keeper's decision-making process when the penalty shot and two-attacker problems are considered. Currently, the decision-making process in Robosoccer is carried out using a rule-based system. RL is also used for quadruped locomotion and navigation in Robosoccer using AIBO. Moreover, the ball distance is calculated using the IR sensors available at the nose of the robot. In this paper, we propose a reinforcement learning based approach that uses a dynamic state-action mapping using back propagation of reward and Q-learning along with spline fit (QLSF) for the final choice of high-level functions in order to save the goal. The novelty of our approach is that the agent learns while playing and can take independent decisions, which overcomes the limitations of the rule-based system due to its fixed and limited predefined decision rules. The spline fit method used with the nose camera was also able to find the location and distance of the ball more accurately than the IR sensors. The noise source and the near and far sensor dilemma problem with the IR sensor were neutralized using the proposed spline fit method. The performance of the proposed method has been verified against the benchmark data set made with the Upenn'03 code logic and a baseline experiment with IR sensors. It was found that the efficiency of our QLSF approach in goalkeeping was better than that of the rule-based approach in conjunction with the IR sensors. 
The QLSF develops a semi-supervised learning process over the rule-based system's input-output mapping process, given in the Upenn'03 code. © 2011 Springer-Verlag Berlin Heidelberg.
Reinforcement learning approach to AIBO robot's decision making process in Robosoccer's goal keeper problem
- Authors: Mukherjee, Subhasis , Yearwood, John , Vamplew, Peter , Huda, Shamsul
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: Robocup is a popular test bed for AI programs around the world. Robosoccer is one of the two major parts of Robocup, in which AIBO entertainment robots take part in the middle-sized soccer event. The three key challenges that robots need to face in this event are manoeuvrability, image recognition and decision-making skills. This paper focuses on the decision-making problem in Robosoccer: the goal keeper problem. We investigate whether reinforcement learning (RL), as a form of semi-supervised learning, can effectively contribute to the goal keeper's decision-making process when the penalty shot and two-attacker problems are considered. Currently, the decision-making process in Robosoccer is carried out using a rule-based system. RL is also used for quadruped locomotion and navigation in Robosoccer using AIBO. In this paper, we propose a reinforcement learning based approach that uses a dynamic state-action mapping using back propagation of reward and space quantized Q-learning (SQQL) for the choice of high-level functions in order to save the goal. The novelty of our approach is that the agent learns while playing and can take independent decisions, which overcomes the limitations of the rule-based system due to its fixed and limited predefined decision rules. The performance of the proposed method has been verified against the benchmark data set made with the Upenn'03 code logic. It was found that the efficiency of our SQQL approach in goalkeeping was better than that of the rule-based approach. The SQQL develops a semi-supervised learning process over the rule-based system's input-output mapping process, given in the Upenn'03 code. © 2011 IEEE.
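For readers unfamiliar with Q-learning, the following minimal sketch shows the standard tabular update rule together with a simple way of quantizing a continuous position into a discrete state. The state names, action names, cell size, learning rate and discount factor are hypothetical and chosen only for illustration; the abstract does not describe the paper's actual state space or reward design.

```python
from collections import defaultdict


def quantize(position, cell_size):
    """Map a continuous coordinate to the index of the grid cell that contains it."""
    return int(position // cell_size)


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


ACTIONS = ["stay", "dive_left", "dive_right"]  # hypothetical goalkeeper actions
q_table = defaultdict(float)                   # unseen state-action pairs start at 0

# Hypothetical transition: the ball is seen 1.3 units away (cell 2 with 0.5-unit cells),
# the keeper dives left, saves the goal and receives a reward of 1.
state = quantize(1.3, 0.5)
value = q_update(q_table, state, "dive_left", 1.0, "saved", ACTIONS)
```

Since every entry of the table starts at zero, the first update moves the visited entry halfway (alpha = 0.5) towards the reward.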
Cluster based rule discovery model for enhancement of government's tobacco control strategy
- Authors: Huda, Shamsul , Yearwood, John , Borland, Ron
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Discovery of interesting rules describing the behavioural patterns of smokers' quitting intentions is an important task in the determination of an effective tobacco control strategy. In this paper, we investigate a compact and simplified rule discovery process for predicting smokers' quitting behaviour that can provide feedback to build a scientific evidence-based adaptive tobacco control policy. Standard decision tree (SDT) based rule discovery depends on decision boundaries in the feature space which are orthogonal to the axis of the feature of a particular decision node. This may limit the ability of SDT to learn intermediate concepts for high-dimensional large datasets such as tobacco control. In this paper, we propose a cluster-based rule discovery model (CRDM) for generation of more compact and simplified rules for the enhancement of tobacco control policy. The cluster-based approach builds conceptual groups from which a set of decision trees (a decision forest) is constructed. Experimental results on the tobacco control data set show that decision rules from the decision forest constructed by CRDM are simpler and can predict smokers' quitting intention more accurately than a single decision tree. © 2010 IEEE.
Exploring novel features and decision rules to identify cardiovascular autonomic neuropathy using a hybrid of wrapper-filter based feature selection
- Authors: Huda, Shamsul , Jelinek, Herbert , Ray, Biplob , Stranieri, Andrew , Yearwood, John
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at the 2010 6th International Conference on Intelligent Sensors, Sensor Networks and Information Processing, ISSNIP 2010 p. 297-302
- Full Text:
- Reviewed:
- Description: Cardiovascular autonomic neuropathy (CAN) is one of the important causes of mortality among diabetes patients. Statistics show that more than 22% of people with type 2 diabetes mellitus suffer from CAN, which in turn leads to cardiovascular disease (heart attack, stroke). Therefore, early detection of CAN could reduce mortality. The traditional method for detecting CAN uses Ewing's algorithm, in which five noninvasive cardiovascular tests are used. It is often difficult for clinicians to collect the Ewing battery data from patients because of onerous test conditions. In this paper, we propose a hybrid wrapper-filter approach to find novel features from patients' ECG records and then generate decision rules for the new features for easier detection of CAN. The proposed feature selection uses a hybrid (MR-ANNIGMA) of filter (Maximum Relevance, MR) and wrapper (Artificial Neural Net Input Gain Measurement Approximation, ANNIGMA) approaches. The combined heuristics in the hybrid MR-ANNIGMA take advantage of the complementary properties of both the filter and wrapper heuristics and can find significant features. The selected feature set is used to generate a new set of rules for detection of CAN. Experiments on real patient records show that the proposed method finds a smaller set of features for detection of CAN than the traditional method; these features are clinically significant and could lead to an easier way to diagnose CAN. © 2010 IEEE.
Hybrid wrapper-filter approaches for input feature selection using maximum relevance and Artificial Neural Network Input Gain Measurement Approximation (ANNIGMA)
- Authors: Huda, Shamsul , Yearwood, John , Stranieri, Andrew
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Feature selection is an important research problem in machine learning and data mining applications. This paper proposes a hybrid wrapper and filter feature selection algorithm that introduces the filter's feature ranking score in the wrapper stage to speed up the wrapper's search process and thereby find a more compact feature subset. The approach hybridizes a Mutual Information (MI) based Maximum Relevance (MR) filter ranking heuristic with an Artificial Neural Network (ANN) based wrapper approach, where Artificial Neural Network Input Gain Measurement Approximation (ANNIGMA) has been combined with MR (MR-ANNIGMA) to guide the search process in the wrapper. The novelty of our approach is that we use a hybrid of wrapper and filter methods that combines the filter's ranking score with the wrapper heuristic's score to take advantage of both filter and wrapper heuristics. The performance of the proposed MR-ANNIGMA has been verified using benchmark data sets and compared to both independent filter and wrapper based approaches. Experimental results show that MR-ANNIGMA achieves more compact feature sets and higher accuracies than both filter and wrapper approaches alone. © 2010 IEEE.
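The filter side of such a hybrid can be illustrated with a small sketch that ranks discrete features by their mutual information with the class label, which is the usual reading of a Maximum Relevance heuristic. The feature names and toy data are invented for the example, and the ANNIGMA wrapper score is not shown because the abstract does not give its formula.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # p(x, y) * log2(p(x, y) / (p(x) * p(y))), with probabilities estimated from counts.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def rank_by_relevance(features, labels):
    """Return (feature name, MI score) pairs sorted from most to least relevant."""
    scores = {name: mutual_information(column, labels) for name, column in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: one feature copies the label, the other is independent of it.
labels = [0, 0, 1, 1]
features = {"noise": [0, 1, 0, 1], "copy_of_label": [0, 0, 1, 1]}
ranking = rank_by_relevance(features, labels)
```

In a hybrid scheme, a ranking like this would be combined with the wrapper's own per-feature score rather than used on its own.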
Smokers' characteristics and cluster based quitting rule discovery model for enhancement of government's tobacco control systems
- Authors: Huda, Shamsul , Yearwood, John , Borland, Ron
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of the 14th Pacific Asia Conference on Information Systems (PACIS 2010)
- Full Text:
- Reviewed:
- Description: Discovery of cluster characteristics and interesting rules describing smokers' clusters and the behavioural patterns of smokers' quitting intentions is an important task in the development of effective tobacco control systems. In this paper, we attempt to determine the characteristics of smokers' clusters and simplified rules for predicting smokers' quitting behaviour that can provide feedback to build scientific evidence-based adaptive tobacco control systems. Standard clustering algorithms group the data based on their inherent pattern. "From abstract"
A constraint-based evolutionary learning approach to the expectation maximization for optimal estimation of the hidden Markov model for speech signal modeling
- Authors: Huda, Shamsul , Yearwood, John , Togneri, Roberto
- Date: 2009
- Type: Text , Journal article
- Relation: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics Vol. 39, no. 1 (2009), p. 182-197
- Full Text:
- Reviewed:
- Description: This paper attempts to overcome the tendency of the expectation-maximization (EM) algorithm to locate a local rather than global maximum when applied to estimate the hidden Markov model (HMM) parameters in speech signal modeling. We propose a hybrid algorithm for estimation of the HMM in automatic speech recognition (ASR) using a constraint-based evolutionary algorithm (EA) and EM, the CEL-EM. The novelty of our hybrid algorithm (CEL-EM) is that it is applicable for estimation of the constraint-based models with many constraints and large numbers of parameters (which use EM) like HMM. Two constraint-based versions of the CEL-EM with different fusion strategies have been proposed using a constraint-based EA and the EM for better estimation of HMM in ASR. The first one uses a traditional constraint-handling mechanism of EA. The other version transforms a constrained optimization problem into an unconstrained problem using Lagrange multipliers. Fusion strategies for the CEL-EM use a staged-fusion approach where EM has been plugged with the EA periodically after the execution of EA for a specific period of time to maintain the global sampling capabilities of EA in the hybrid algorithm. A variable initialization approach (VIA) has been proposed using a variable segmentation to provide a better initialization for EA in the CEL-EM. Experimental results on the TIMIT speech corpus show that CEL-EM obtains higher recognition accuracies than the traditional EM algorithm as well as a top-standard EM (VIA-EM, constructed by applying the VIA to EM). © 2008 IEEE.
A stochastic version of Expectation Maximization algorithm for better estimation of Hidden Markov Model
- Authors: Huda, Shamsul , Yearwood, John , Togneri, Roberto
- Date: 2009
- Type: Text , Journal article
- Relation: Pattern Recognition Letters Vol. 30, no. 14 (2009), p. 1301-1309
- Full Text: false
- Reviewed:
- Description: This paper attempts to overcome the local convergence problem of the Expectation Maximization (EM) based training of the Hidden Markov Model (HMM) in speech recognition. We propose a hybrid algorithm, Simulated Annealing Stochastic version of EM (SASEM), combining Simulated Annealing with EM that reformulates the HMM estimation process using a stochastic step between the EM steps and the SA. The stochastic processes of SASEM inside EM can prevent EM from converging to a local maximum and find improved estimation for HMM using the global convergence properties of SA. Experiments on the TIMIT speech corpus show that SASEM obtains higher recognition accuracies than the EM. © 2009 Elsevier B.V. All rights reserved.
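The role simulated annealing plays in escaping local maxima can be sketched with the standard Metropolis acceptance rule, applied here to a log-likelihood that is being maximized. This is a generic illustration rather than the SASEM algorithm itself: the function name, the example values and the use of log-likelihood as the objective are assumptions, and the abstract does not state the paper's acceptance rule or cooling schedule.

```python
import math
import random


def accept(current_ll, candidate_ll, temperature, rng=random):
    """Metropolis rule for maximization.

    A better candidate is always accepted; a worse one is accepted with
    probability exp((candidate_ll - current_ll) / temperature), so worse
    moves become rarer as the temperature is lowered.
    """
    if candidate_ll >= current_ll:
        return True
    return rng.random() < math.exp((candidate_ll - current_ll) / temperature)


# An improving step is always taken.
took_better = accept(-10.0, -9.0, temperature=1.0)

# At a very low temperature a worse step is effectively never taken.
took_worse_when_cold = accept(-9.0, -10.0, temperature=1e-9)
```

At high temperature the search wanders freely between candidate models; as the temperature falls it behaves more and more like a purely greedy procedure.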
A variable initialization approach to the EM algorithm for better estimation of the parameters of hidden Markov Model based acoustic modeling of speech signals
- Authors: Huda, Shamsul , Ghosh, Ranadhir , Yearwood, John
- Date: 2006
- Type: Text , Conference paper
- Relation: Paper presented at Artificial Intelligence, Advances in Data Mining, Applications in Medicine, Web Mining, Marketing, Image and Signal Mining Conference 2006, Leipzig, Germany : 14th July, 2006 p. 416-430
- Full Text: false
- Reviewed:
- Description: The traditional method for estimation of the parameters of Hidden Markov Model (HMM) based acoustic modeling of speech uses the Expectation-Maximization (EM) algorithm. The EM algorithm is sensitive to initial values of HMM parameters and is likely to terminate at a local maximum of likelihood function resulting in non-optimized estimation for HMM and lower recognition accuracy. In this paper, to obtain better estimation for HMM and higher recognition accuracy, several candidate HMMs are created by applying EM on multiple initial models. The best HMM is chosen from the candidate HMMs which has highest value for likelihood function. Initial models are created by varying maximum frame number in the segmentation step of HMM initialization process. A binary search is applied while creating the initial models. The proposed method has been tested on TIMIT database. Experimental results show that our approach obtains improved values for likelihood function and improved recognition accuracy.
- Description: E1
- Description: 2003001542
A Hybrid algorithm for estimation of the parameters of Hidden Markov Model based acoustic modeling of speech signals using constraint-based genetic algorithm and expectation maximization
- Authors: Ghosh, Ranadhir , Huda, Shamsul , Yearwood, John
- Date: 2005
- Type: Text , Conference paper
- Relation: Paper presented at the Workshop in Learning Algorithms for Pattern Recognition, in conjunction with the 18th Australian Joint Conference on Artificial Intelligence, Sydney : 5th December, 2005
- Full Text: false
- Reviewed:
- Description: E1
- Description: 2003001368