Establishing phishing provenance using orthographic features
- Authors: Liping, Ma , Yearwood, John , Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 2009 eCrime Researchers Summit, eCRIME '09, Tacoma, Washington : 20th-21st October 2009
- Full Text:
- Description: After phishing message detection, determining the provenance of phishing messages and Websites is the second step to tracing cybercriminals. In this paper, we present a novel method to cluster phishing emails automatically using orthographic features. In particular, we develop an algorithm to cluster documents and remove redundant features at the same time. After collecting all the possible features based on observation, we adapt the modified global k-means method repeatedly, and generate the objective function values over a range of tolerance values across different subsets of features. Finally, we identify the appropriate clusters by studying the distribution of the objective function values. Experimental evaluation of a large number of computations demonstrates that our clustering and feature selection techniques are highly effective and achieve reliable results.
- Description: 2003007842
Feature selection using misclassification counts
- Authors: Bagirov, Adil , Yatsko, Andrew , Stranieri, Andrew
- Date: 2011
- Type: Conference proceedings , Unpublished work
- Relation: Proceedings of the 9th Australasian Data Mining Conference (AusDM 2011), 51-62. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 121.
- Full Text:
- Description: Dimensionality reduction of the problem space, through detection and removal of variables that contribute little or nothing to classification, can relieve the computational load and the instance acquisition effort, considering that all the data attributes are accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to coordinates of informative features. Ranking is done on the degree to which different variables exhibit random characteristics. The results are verified using the Nearest Neighbor classifier. This also helps to address feature irrelevance and redundancy, which ranking does not immediately decide. Additionally, feature ranking methods from different independent sources are called in for direct comparison.
Predicting Australian stock market index using neural networks exploiting dynamical swings and intermarket influences
- Authors: Pan, Heping , Tilakaratne, Chandima , Yearwood, John
- Date: 2005
- Type: Text , Journal article
- Relation: Journal of Research and Practice in Information Technology Vol. 37, no. 1 (2005), p. 43-55
- Full Text:
- Reviewed:
- Description: This paper presents a computational approach for predicting the Australian stock market index AORD using multi-layer feed-forward neural networks from the time series data of AORD and various interrelated markets. This effort aims to discover an effective neural network, or a set of adaptive neural networks for this prediction purpose, which can exploit or model various dynamical swings and inter-market influences discovered from professional technical analysis and quantitative analysis. Within a limited range defined by our empirical knowledge, three aspects of effectiveness on data selection are considered: effective inputs from the target market (AORD) itself, a sufficient set of interrelated markets, and effective inputs from the interrelated markets. Two traditional dimensions of the neural network architecture are also considered: the optimal number of hidden layers, and the optimal number of hidden neurons for each hidden layer. Three important results were obtained: a 6-day cycle was discovered in the Australian stock market during the studied period; the time signature used as additional inputs provides useful information; and a basic neural network using six daily returns of AORD and one daily return of SP500 plus the day of the week as inputs exhibits up to 80% directional prediction correctness.
- Description: C1
- Description: 2003001440
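The "directional prediction correctness" quoted in the record above can be read as the share of days on which the predicted return has the same sign as the realised return. The sketch below illustrates that reading only; the function name, the toy return series, and the convention that a zero return counts as "not up" are assumptions, not details taken from the paper.

```python
def directional_accuracy(actual_returns, predicted_returns):
    """Fraction of days on which predicted and actual returns share a direction."""
    if len(actual_returns) != len(predicted_returns):
        raise ValueError("series must have equal length")
    # A day is a hit when both returns are "up" (> 0) or both are "not up".
    hits = sum(
        (a > 0) == (p > 0) for a, p in zip(actual_returns, predicted_returns)
    )
    return hits / len(actual_returns)


# Hypothetical daily returns in percent; the third day is the only miss.
actual = [0.4, -0.2, 0.1, -0.5, 0.3]
predicted = [0.1, -0.1, -0.2, -0.3, 0.2]
accuracy = directional_accuracy(actual, predicted)
print(accuracy)
```

On this toy series four of the five directions agree, so the measure is 0.8.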
Hybrid wrapper-filter approaches for input feature selection using maximum relevance and Artificial Neural Network Input Gain Measurement Approximation (ANNIGMA)
- Authors: Huda, Shamsul , Yearwood, John , Stranieri, Andrew
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Feature selection is an important research problem in machine learning and data mining applications. This paper proposes a hybrid wrapper and filter feature selection algorithm that introduces the filter's feature ranking score in the wrapper stage to speed up the wrapper's search process and thereby find a more compact feature subset. The approach hybridizes a Mutual Information (MI) based Maximum Relevance (MR) filter ranking heuristic with an Artificial Neural Network (ANN) based wrapper approach, where Artificial Neural Network Input Gain Measurement Approximation (ANNIGMA) has been combined with MR (MR-ANNIGMA) to guide the search process in the wrapper. The novelty of our approach is that we use a hybrid of wrapper and filter methods that combines the filter's ranking score with the wrapper heuristic's score to take advantage of both filter and wrapper heuristics. Performance of the proposed MR-ANNIGMA has been verified using benchmark data sets and compared to both independent filter and wrapper based approaches. Experimental results show that MR-ANNIGMA achieves more compact feature sets and higher accuracies than both filter and wrapper approaches alone. © 2010 IEEE.
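The filter stage mentioned in the record above ranks features by their mutual information with the class label. The following is a minimal sketch of such a Maximum Relevance ranking for discrete features. It does not implement ANNIGMA or the wrapper stage, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def max_relevance_ranking(features, labels):
    """Return feature names sorted by decreasing MI with the labels, plus scores."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy data (hypothetical): f_copy duplicates the label, f_noise is independent of it.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "f_noise": [0, 1, 0, 1, 0, 1, 0, 1],
    "f_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "f_partial": [0, 0, 1, 1, 0, 1, 1, 0],
}
ranking, scores = max_relevance_ranking(features, labels)
print(ranking)
```

In a hybrid scheme of the kind the abstract describes, scores like these would be combined with a wrapper-derived score rather than used on their own.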
Nonsmooth optimisation approach to data classification
- Authors: Bagirov, Adil , Soukhoroukova, Nadejda
- Date: 2001
- Type: Text , Conference paper
- Relation: Paper presented at Post-graduate ADFA Conference for Computer Science, PACCS01, Canberra, Australian Capital Territory : 14th July 2001
- Full Text:
- Description: We reduce supervised classification to solving a nonsmooth optimization problem. The proposed method allows one to solve classification problems for databases with an arbitrary number of classes. Numerical experiments have been carried out with databases of small and medium size. We present their results and compare them with those obtained by other classification algorithms based on optimization techniques. Results of numerical experiments show the effectiveness of the proposed algorithms.
- Description: 2003003668
Application of optimisation-based data mining techniques to tobacco control dataset
- Authors: Dzalilov, Zari , Zhang, J , Bagirov, Adil , Mammadov, Musa
- Date: 2010
- Type: Text , Journal article
- Relation: International Journal of Lean Thinking Vol. 1, no. 1 (2010), p. 27-41
- Full Text: false
- Reviewed:
- Description: Tobacco smoking is one of the leading causes of death around the world. Consequently, control of tobacco use is an important global public health issue. Tobacco control may be aided by development of theoretical and methodological frameworks for describing and understanding complex tobacco control systems. Linear regression and logistic regression are currently very popular statistical techniques for modeling and analyzing complex data in tobacco control systems. However, in tobacco markets, numerous interrelated factors nontrivially interact with tobacco control policies, such that policies and control outcomes are nonlinearly related.
Using meta-regression data mining to improve predictions of performance based on heart rate dynamics for Australian football
- Authors: Jelinek, Herbert , Kelarev, Andrei , Robinson, Dean , Stranieri, Andrew , Cornforth, David
- Date: 2014
- Type: Text , Journal article
- Relation: Applied Soft Computing Vol. 14, no. PART A (2014), p. 81-87
- Full Text: false
- Reviewed:
- Description: This work investigates the effectiveness of using computer-based machine learning regression algorithms and meta-regression methods to predict performance data for Australian football players based on parameters collected during daily physiological tests. Three experiments are described. The first uses all available data with a variety of regression techniques. The second uses a subset of features selected from the available data using the Random Forest method. The third used meta-regression with the selected feature subset. Our experiments demonstrate that feature selection and meta-regression methods improve the accuracy of predictions for match performance of Australian football players based on daily data of medical tests, compared to regression methods alone. Meta-regression methods and feature selection were able to obtain performance prediction outcomes with significant correlation coefficients. The best results were obtained by the additive regression based on isotonic regression for a set of most influential features selected by Random Forest. This model was able to predict athlete performance data with a correlation coefficient of 0.86 (p < 0.05). © 2013 Published by Elsevier B.V. All rights reserved.
- Description: C1
Neural networks for detection and classification of walking pattern changes due to ageing
- Authors: Begg, Rezaul , Kamruzzaman, Joarder
- Date: 2006
- Type: Text , Journal article
- Relation: Australasian Physical & Engineering Sciences in Medicine Vol. 29, no. 2 (2006), p. 188-195
- Full Text: false
- Reviewed:
- Description: With age, gait functions reflected in the walking patterns degenerate and threaten the balance control mechanisms of the locomotor system. The aim of this paper is to explore applications of artificial neural networks for automated recognition of gait changes due to ageing from their respective gait-pattern characteristics. The ability of such discrimination has many advantages including the identification of at-risk or faulty gait. Various gait features (e.g., temporal-spatial, foot-ground reaction forces and lower limb joint angular data) were extracted from 12 young and 12 elderly participants during normal walking and these were utilized for training and testing on three neural network algorithms (Standard Backpropagation; Scaled Conjugate Gradient; and Backpropagation with Bayesian Regularization, BR). Receiver operating characteristics plots, sensitivity and specificity results as well as accuracy rates were used to evaluate performance of the three classifiers. Cross-validation test results indicate a maximum generalization performance of 83.3% in the recognition of the young and elderly gait patterns. Out of the three neural network algorithms, BR performed best in the test results with best sensitivity, selectivity and detection rates. With the help of a feature selection technique, the maximum classification accuracy of the BR attained 100%, when trained with a small subset of selected gait features. The results of this study demonstrate the capability of neural networks in the detection of gait changes with ageing and their potential for future applications as gait diagnostics.
A class centric feature and classifier ensemble selection approach for music genre classification
- Authors: Ariyaratne, Hasitha Bimsara , Zhang, Dengsheng , Lu, Guojun
- Date: 2012
- Type: Text , Conference paper
- Relation: Joint IAPR International Workshop SSPR & SPR 2012 p. 666-674
- Full Text: false
- Reviewed:
- Description: Music genre classification has attracted a lot of research interest due to the rapid growth of digital music. Despite the availability of a vast number of audio features and classification techniques, genre classification still remains a challenging task. In this work we propose a class centric feature and classifier ensemble selection method which deviates from the conventional practice of employing a single, or an ensemble of classifiers trained with a selected set of audio features. We adopt a binary decomposition technique to divide the multiclass problem into a set of binary problems which are then treated in a class specific manner. This differs from the traditional techniques which operate on the naive assumption that a specific set of features and/or classifiers can perform equally well in identifying all the classes. Experimental results obtained on a popular genre dataset and a newly created dataset suggest significant improvements over traditional techniques.
The photometric stereo approach and the visualization of 3D face reconstruction
- Authors: Khan, Muhammad , Ullah, Zabeeh , Butt, Maria , Arshad, Zohaib , Yousaf, Sobia
- Date: 2019
- Type: Text , Journal article
- Relation: International Journal of Advanced Computer Science and Applications Vol. 10, no. 2 (2019), p. 217-221
- Full Text:
- Reviewed:
- Description: The 3D Morphable models of the human face have enabled a myriad of applications in computer vision, human computer interaction and security surveillance. However, due to the variation in size, the complexity of the training data set, the landmark mapping and the representation in real time, the rendering or synthesis of images in three dimensions is limited. In this paper, we extend the approach of photometric stereo and provide human face reconstruction in three dimensions. The proposed method consists of two steps. First, it automatically detects the face and segments the iris, along with statistical features of the pupil location within it. Second, it selects a minimum of six features, together with the iris, to generate the 3D face. Compared with existing methods, our approach provides automation, which produces better and more efficient results than manual methods.
A machine learning approach for prediction of pregnancy outcome following IVF treatment
- Authors: Hassan, Md Rafiul , Al-Insaif, Sadiq , Hossain, Muhammad , Kamruzzaman, Joarder
- Date: 2020
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 32, no. 7 (2020), p. 2283-2297
- Full Text: false
- Reviewed:
- Description: Infertility affects one out of seven couples around the world. Therefore, the best possible management of the in vitro fertilization (IVF) treatment and patient advice is crucial for both patients and medical practitioners. The ultimate concern of the patients is the success of an IVF procedure, which depends on a number of influencing attributes. Without any automated tool, it is hard for the practitioners to assess any influencing trend of the attributes and factors that might lead to a successful IVF pregnancy. This paper proposes a hill climbing feature (attribute) selection algorithm coupled with automated classification using machine learning techniques with the aim to analyze and predict IVF pregnancy with greater accuracy. Using 25 attributes, we assessed the prediction ability of IVF pregnancy success for five different machine learning models, namely multilayer perceptron (MLP), support vector machines (SVM), C4.5, classification and regression trees (CART) and random forest (RF). The prediction ability was measured in terms of widely used performance metrics, namely accuracy rate, F-measure and AUC. The feature selection algorithm reduced the number of most influential attributes to nineteen for MLP, sixteen for RF, seventeen for SVM, twelve for C4.5 and eight for CART. Overall, the most influential attributes identified are: ‘age’, ‘indication’ of fertility factor, ‘Antral Follicle Counts (AFC)’, ‘NbreM2’, ‘method of sperm collection’, ‘Chamotte’, ‘Fertilization rate in vitro’, ‘Follicles on day 14’ and ‘Embryo transfer day.’ The machine learning models trained with the selected set of features significantly improved the prediction accuracy of IVF pregnancy success to a level considerably higher than those reported in the current literature. © 2018, The Natural Computing Applications Forum.
The gene of scientific success
- Authors: Kong, Xiangjie , Zhang, Jun , Zhang, Da , Bu, Yi , Ding, Ying , Xia, Feng
- Date: 2020
- Type: Text , Journal article
- Relation: ACM Transactions on Knowledge Discovery from Data Vol. 14, no. 4 (2020), p.
- Full Text:
- Reviewed:
- Description: This article elaborates how to identify and evaluate causal factors to improve scientific impact. Currently, analyzing scientific impact can be beneficial to various academic activities including funding application, mentor recommendation, discovering potential cooperators, and the like. It is universally acknowledged that high-impact scholars often have more opportunities to receive awards as an encouragement for their hard work. Therefore, scholars spend great efforts in making scientific achievements and improving scientific impact during their academic life. However, what are the determinate factors that control scholars' academic success? The answer to this question can help scholars conduct their research more efficiently. Under this consideration, our article presents and analyzes the causal factors that are crucial for scholars' academic success. We first propose five major factors including article-centered factors, author-centered factors, venue-centered factors, institution-centered factors, and temporal factors. Then, we apply recent advanced machine learning algorithms and jackknife method to assess the importance of each causal factor. Our empirical results show that author-centered and article-centered factors have the highest relevancy to scholars' future success in the computer science area. Additionally, we discover an interesting phenomenon that the h-index of scholars within the same institution or university are actually very close to each other. © 2020 ACM.
Interactions between fecal gut microbiome, enteric pathogens, and energy regulating hormones among acutely malnourished rural Gambian children
- Authors: Nabwera, Helen , Espinoza, Josh , Worwui, Archibald , Betts, Modupeh , Bradbury, Richard
- Date: 2021
- Type: Text , Journal article
- Relation: EBioMedicine Vol. 73, no. (2021), p.
- Full Text:
- Reviewed:
- Description: Background: The specific roles that gut microbiota, known pathogens, and host energy-regulating hormones play in the pathogenesis of non-edematous severe acute malnutrition (marasmus SAM) and moderate acute malnutrition (MAM) during outpatient nutritional rehabilitation are yet to be explored. Methods: We applied an ensemble of sample-specific (intra- and inter-modality) association networks to gain deeper insights into the pathogenesis of acute malnutrition and its severity among children under 5 years of age in rural Gambia, where marasmus SAM is most prevalent. Findings: Children with marasmus SAM have distinct microbiome characteristics and biologically-relevant multimodal biomarkers not observed among children with moderate acute malnutrition. Marasmus SAM was characterized by lower microbial richness and biomass, significant enrichments in Enterobacteriaceae, altered interactions between specific Enterobacteriaceae and key energy regulating hormones and their receptors. Interpretation: Our findings suggest that marasmus SAM is characterized by the collapse of a complex system with nested interactions and key associations between the gut microbiome, enteric pathogens, and energy regulating hormones. Further exploration of these systems will help inform innovative preventive and therapeutic interventions. Funding: The work was supported by the UK Medical Research Council (MRC; MC-A760-5QX00) and the UK Department for International Development (DFID) under the MRC/DFID Concordat agreement; Bill and Melinda Gates Foundation (OPP 1066932) and the National Institute of Medical Research (NIMR), UK. This network analysis was supported by NIH U54GH009824 [CLD] and NSF OCE-1558453 [CLD]. © 2021 The Author(s). **Please note that there are multiple authors for this article therefore only the name of the first 5 including Federation University Australia affiliate “Richard Bradbury” is provided in this record**
Evaluating rehabilitation progress using motion features identified by machine learning
- Authors: Lu, Lei , Tan, Ying , Klaic, Marlena , Galea, Mary , Khan, Fary , Oliver, Annie , Mareels, Iven , Oetomo, Denny , Zhao, Erying
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Trans Biomed Eng Vol. 68, no. 4 (2021), p. 1417-1428
- Full Text: false
- Reviewed:
- Description: Evaluating progress throughout a patient's rehabilitation episode is critical for determining the effectiveness of the selected treatments and is an essential ingredient in personalised and evidence-based rehabilitation practice. The evaluation process is complex due to the inherently large human variations in motor recovery and the limitations of commonly used clinical measurement tools. Information recorded during a robot-assisted rehabilitation process can provide an effective means to continuously quantitatively assess movement performance and rehabilitation progress. However, selecting appropriate motion features for rehabilitation evaluation has always been challenging. This paper exploits unsupervised feature learning techniques to reduce the complexity of building the evaluation model of patients' progress. A new feature learning technique is developed to select the most significant features from a large amount of kinematic features measured from robotics, providing clinically useful information to health practitioners with reduction of modeling complexity. A novel indicator that uses monotonicity and trendability is proposed to evaluate kinematic features. The data used to develop the feature selection technique consist of kinematic data from robot-aided rehabilitation for a population of stroke patients. The selected kinematic features allow for human variations across a population of patients as well as over the sequence of rehabilitation sessions. The study is based on data records pertaining to 41 stroke patients using three different robot assisted exercises for upper limb rehabilitation. Consistent with the literature, the results indicate that features based on movement smoothness are the best measures among 17 kinematic features suitable to evaluate rehabilitation progress.
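The record above mentions an indicator built from monotonicity and trendability, but the abstract does not give its formula. The sketch below uses one common formulation of each quantity from the prognostics literature as a stand-in: monotonicity as the normalised imbalance between increasing and decreasing steps, and trendability as the absolute Pearson correlation with the session index. These definitions, the function names, and the toy series are assumptions and may differ from the paper's indicator.

```python
from math import sqrt


def monotonicity(values):
    """|#increasing steps - #decreasing steps| / (n - 1); 1.0 if strictly monotonic."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    pos = sum(d > 0 for d in diffs)
    neg = sum(d < 0 for d in diffs)
    return abs(pos - neg) / len(diffs)


def trendability(values):
    """Absolute Pearson correlation between a feature and its session index."""
    n = len(values)
    t = range(n)
    mean_t = sum(t) / n
    mean_v = sum(values) / n
    cov = sum((ti - mean_t) * (vi - mean_v) for ti, vi in zip(t, values))
    var_t = sum((ti - mean_t) ** 2 for ti in t)
    var_v = sum((vi - mean_v) ** 2 for vi in values)
    if var_v == 0:
        return 0.0  # a constant feature carries no trend
    return abs(cov / sqrt(var_t * var_v))


# Hypothetical per-session values of two kinematic features for one patient.
smooth = [1.0, 2.0, 3.0, 4.0, 5.0]
noisy = [3.0, 1.0, 4.0, 1.0, 5.0]
for name, series in (("smooth", smooth), ("noisy", noisy)):
    print(name, monotonicity(series), round(trendability(series), 3))
```

A feature that improves steadily across sessions scores near 1 on both measures, while an erratic feature scores low, which is the intuition behind using such measures to rank candidate features.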
Cancer classification utilizing voting classifier with ensemble feature selection method and transcriptomic data
- Authors: Khatun, Rabea , Akter, Maksuda , Islam, Md Manowarul , Uddin, Md Ashraf , Talukder, Md Alamin , Kamruzzaman, Joarder , Azad, Akm , Paul, Bikash , Almoyad, Muhammad , Aryal, Sunil , Moni, Mohammad
- Date: 2023
- Type: Text , Journal article
- Relation: Genes Vol. 14, no. 9 (2023), p.
- Full Text:
- Reviewed:
- Description: Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning fields. However, the high dimensionality of microarray gene expression data poses a challenge for identifying important genes in cancer diagnosis. Many feature selection algorithms optimize cancer diagnosis by selecting optimal features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy, with 100% for leukaemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. This study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach surpasses existing strategies in accuracy and stability, significantly impacting the development of ML-based gene analysis. It detects vital genes with higher precision and stability than other existing methods. © 2023 by the authors.
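The record above combines two ideas: aggregating the rankings produced by several feature selection methods, and a weighted average vote over several classifiers. The sketch below shows a simple version of each, using mean-rank (Borda-style) aggregation and a weighted mean of class probabilities. The aggregation rule, the weights, the gene names, and the probabilities are all hypothetical; the abstract does not specify them.

```python
def aggregate_ranks(rankings):
    """Mean-rank aggregation of several best-first rankings over the same features."""
    features = rankings[0]
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(features, key=lambda f: (mean_rank[f], f))


def weighted_vote(probabilities, weights):
    """Weighted average of per-model class probabilities; returns (label, averages)."""
    total = sum(weights)
    classes = probabilities[0].keys()
    averaged = {
        c: sum(w * p[c] for p, w in zip(probabilities, weights)) / total
        for c in classes
    }
    return max(averaged, key=averaged.get), averaged


# Three hypothetical selectors ranking four genes, best first.
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
consensus = aggregate_ranks(rankings)

# Hypothetical class probabilities from three models for a single sample.
probabilities = [
    {"tumour": 0.6, "normal": 0.4},
    {"tumour": 0.3, "normal": 0.7},
    {"tumour": 0.8, "normal": 0.2},
]
label, averaged = weighted_vote(probabilities, weights=[0.5, 0.2, 0.3])
print(consensus, label)
```

In practice the weights would typically reflect each model's validation performance; here they are fixed by hand for illustration.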
Proposed machine learning techniques for bridge structural health monitoring : a laboratory study
- Authors: Noori Hoshyar, Azadeh , Rashidi, Maria , Yu, Yang , Samali, Bijan
- Date: 2023
- Type: Text , Journal article
- Relation: Remote Sensing Vol. 15, no. 8 (2023), p.
- Full Text:
- Reviewed:
- Description: Structural health monitoring (SHM) for bridges is a crucial concern in engineering due to the degradation risks caused by defects, which can become worse over time. In this respect, the enhancement of various models that can discriminate between healthy and non-healthy states of structures has received extensive attention. These models are concerned with implementation algorithms, which operate on the feature sets to quantify the bridge’s structural health. The functional correlation between the feature set and the health state of the bridge structure is usually difficult to define. Therefore, the models are derived from machine learning techniques. The use of machine learning approaches provides the possibility of automating the SHM procedure and intelligent damage detection. In this study, we propose four classification algorithms for SHM, which use the concepts of the support vector machine (SVM) algorithm. The laboratory experiment, which was intended to validate the results, was performed at Western Sydney University (WSU). The results were compared with the basic SVM to evaluate the performance of the proposed algorithms. © 2023 by the authors.
Filter feature selection based boolean modelling for genetic network inference
- Authors: Gamage, Hasini , Chetty, Madhu , Shatte, Adrian , Hallinan, Jennifer
- Date: 2022
- Type: Text , Journal article
- Relation: BioSystems Vol. 221, no. (2022), p.
- Full Text:
- Reviewed:
- Description: The reconstruction of Gene Regulatory Networks (GRNs) from time series gene expression data is highly relevant for the discovery of complex biological interactions and dynamics. Various computational strategies have been developed for this task, but most approaches have low computational efficiency and are not able to cope with high-dimensional, low sample-number, gene expression data. In this paper, we introduce a novel combined filter feature selection approach for efficient and accurate inference of GRNs. A Boolean framework for network modelling is used to demonstrate the efficacy of the proposed approach. Using discretized microarray expression data, the genes most relevant to each target gene are first filtered using ReliefF, an instance-based feature ranking method that is here applied for the first time to GRN inference. Then, further gene selection from the filtered-gene list is done using a mutual information-based min-redundancy max-relevance criterion by eliminating irrelevant genes. This combined method is executed on resampled datasets to finalize the optimal set of regulatory genes. Building upon our previous research, a Pearson correlation coefficient-based Boolean modelling approach is utilized for the efficient identification of the optimal regulatory rules associated with selected regulatory genes. The proposed approach was evaluated using gene expression datasets from small-scale and medium-scale real gene networks, and was observed to be more effective than Linear Discriminant Analysis, performed better than the individual feature selection methods, and obtained improved Structural Accuracy with a higher number of true positives than other state-of-the-art methods, while outperforming these methods with respect to Dynamic Accuracy and efficiency. © 2022 Elsevier B.V.
A new feature selection technique for load and price forecast of electrical power systems
- Authors: Abedinia, Oveis , Amjady, Nima , Zareipour, Hamidreza
- Date: 2017
- Type: Text , Journal article
- Relation: IEEE transactions on power systems Vol. 32, no. 1 (2017), p. 62-74
- Full Text: false
- Reviewed:
- Description: Load and price forecasts are necessary for optimal operation planning in competitive electricity markets. However, most of the load and price forecast methods suffer from lack of an efficient feature selection technique with the ability of modeling the nonlinearities and interacting features of the forecast processes. In this paper, a new feature selection method is presented. An important contribution of the proposed method is modeling interaction in addition to relevancy and redundancy, based on information-theoretic criteria, for feature selection. Another main contribution of the paper is proposing a hybrid filter-wrapper approach. The filter part selects a minimum subset of the most informative features by considering relevancy, redundancy, and interaction of the candidate inputs in a coordinated manner. The wrapper part fine-tunes the settings of the composite filter.