Nonsmooth optimisation approach to data classification
- Authors: Bagirov, Adil , Soukhoroukova, Nadejda
- Date: 2001
- Type: Text , Conference paper
- Relation: Paper presented at Post-graduate ADFA Conference for Computer Science, PACCS01, Canberra, Australian Capital Territory : 14th July 2001
- Full Text:
- Description: We reduce supervised classification to solving a nonsmooth optimization problem. The proposed method can solve classification problems for databases with an arbitrary number of classes. Numerical experiments have been carried out with databases of small and medium size. We present the results and compare them with those obtained by other optimization-based classification algorithms. The results of the numerical experiments show the effectiveness of the proposed algorithms.
- Description: 2003003668
Multi label classification and drug-reaction associations using global optimization techniques
- Authors: Mammadov, Musa , Yearwood, John , Aliyea, Leyla
- Date: 2004
- Type: Text , Conference paper
- Relation: Paper presented at ICOTA6: 6th International Conference on Optimization - Techniques and Applications, Ballarat, Victoria : 9th December, 2004
- Full Text: false
- Reviewed:
- Description: E1
- Description: 2003000890
Using links to aid web classification
- Authors: Xie, Wei , Mammadov, Musa , Yearwood, John
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at 6th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2007, Melbourne, Victoria : 11th-13th July 2007 p. 981-986
- Full Text:
- Description: In this paper, we present a new approach to using link information to improve the accuracy and efficiency of web classification. Unlike other approaches, we only use the mappings between linked documents and their own class or classes. In this case, we only need to add a few features, called linked-class features, to the datasets. We apply SVM and BoosTexter for classification. We show that the classification accuracy can be improved by using mixtures of ordinary word features and out-linked-class features. We analyze and discuss the reasons for this improvement.
- Description: 2003005438
Experimental investigation of classification algorithms for ITS dataset
- Authors: Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2008
- Type: Text , Conference paper
- Relation: PKAW-08, Pacific Rim Knowledge Acquisition Workshop 2008, as part of PRICAI 2008, Tenth Pacific Rim p. 262-272
- Full Text: false
- Reviewed:
- Description: This article is devoted to experimental investigation of classification algorithms for analysis of the ITS dataset. We introduce and consider a novel k-committees algorithm for classification and compare it with the discrete k-means and nearest neighbour algorithms. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric and the sequences cannot be regarded as points in a finite dimensional space. This is why it is necessary to develop novel algorithms and adjust familiar ones. We present the results of experiments comparing the efficiency of three classification methods in their ability to achieve agreement with classes published in the biological literature before. It turns out that our algorithms are efficient and can be used to obtain biologically significant classifications. A simplified version of a synthetic dataset, where the k-committees classifier outperforms k-means and Nearest Neighbour classifiers, is also presented.
- Description: E1
Predicting trading signals of stock market indices using neural networks
- Authors: Tilakaratne, Chandima , Mammadov, Musa , Morris, Sidney
- Date: 2008
- Type: Text , Conference paper
- Relation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Auckland 1 December 2008 through 5 December 2008 Vol. 5360 LNAI, p. 522-531
- Full Text: false
- Description: The aim of this paper is to develop new neural network algorithms to predict trading signals: buy, hold and sell, of stock market indices. Most commonly used classification techniques are not suitable to predict trading signals when the distribution of the actual trading signals, among these three classes, is imbalanced. In this paper, new algorithms were developed based on the structure of feedforward neural networks and a modified Ordinary Least Squares (OLS) error function. An adjustment relating to the contribution from the historical data used for training the networks, and the penalization of incorrectly classified trading signals, were accounted for when modifying the OLS function. A global optimization algorithm was employed to train these networks. The algorithms developed in this study were employed to predict the trading signals of day (t+1) of the Australian All Ordinary Index. The algorithms with the modified error functions introduced by this study produced better predictions. © 2008 Springer Berlin Heidelberg.
A classification algorithm that derives weighted sum scores for insight into disease
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John , Hafen, Gaudenz
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at Third Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2009), Wellington, New Zealand : Vol. 97, p. 13-17
- Full Text:
- Description: Data mining is often performed with datasets associated with diseases in order to increase insights that can ultimately lead to improved prevention or treatment. Classification algorithms can achieve high levels of predictive accuracy but have limited application for facilitating the insight that leads to deeper understanding of aspects of the disease. This is because the representation of knowledge that arises from classification algorithms is too opaque, too complex or too sparse to facilitate insight. Clustering, association and visualisation approaches enable greater scope for clinicians to be engaged in a way that leads to insight; however, predictive accuracy is compromised or non-existent. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classification algorithm that provides accuracy comparable to other techniques whilst providing some insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. Clinicians are very familiar with weighted sum scoring scales, so the internal representation is intuitive and easily understood. This paper presents results from the use of the AWSum approach with data from patients suffering from Cystic Fibrosis.
An incremental approach for the construction of a piecewise linear classifier
- Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at XIIIth International Conference : Applied Stochastic Models and Data Analysis, ASMDA 2009, Vilnius, Lithuania : 30th June - 3rd July 2009 p. 507–511
- Relation: https://purl.org/au-research/grants/arc/DP0666061
- Full Text: false
- Description: In this paper the problem of finding piecewise linear boundaries between sets is considered and is applied to solving supervised data classification problems. An algorithm for the computation of piecewise linear boundaries, consisting of two main steps, is proposed. In the first step, sets are approximated by hyperboxes to find so-called “indeterminate” regions between sets. In the second step, sets are separated inside these “indeterminate” regions by piecewise linear functions. These functions are computed incrementally, starting with a linear function. Results of numerical experiments are reported. These results demonstrate that the new algorithm requires a reasonable training time and produces consistently good test set accuracy on most data sets compared with mainstream classifiers.
- Description: 2003007559
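The first step described in the abstract above (approximating sets by hyperboxes to locate "indeterminate" regions) can be illustrated with a minimal sketch. It assumes axis-aligned bounding boxes and treats a point as indeterminate when it falls inside the box of another class; the function names and the toy data are hypothetical and are not taken from the paper, whose actual construction may differ.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def bounding_box(points: Sequence[Point]) -> Box:
    """Smallest axis-aligned hyperbox containing all points (an assumption)."""
    dims = range(len(points[0]))
    lower = [min(p[d] for p in points) for d in dims]
    upper = [max(p[d] for p in points) for d in dims]
    return lower, upper


def in_box(point: Point, box: Box) -> bool:
    """True if the point lies inside the box, boundaries included."""
    lower, upper = box
    return all(lo <= x <= hi for x, lo, hi in zip(point, lower, upper))


def indeterminate_points(classes: Dict[str, Sequence[Point]]) -> List[Point]:
    """Points that lie inside the hyperbox of at least one other class."""
    boxes = {label: bounding_box(pts) for label, pts in classes.items()}
    result = []
    for label, pts in classes.items():
        for p in pts:
            if any(in_box(p, box) for other, box in boxes.items() if other != label):
                result.append(p)
    return result


# Hypothetical two-class toy data in the plane.
data = {
    "a": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    "b": [(1.5, 1.5), (3.0, 3.0), (4.0, 4.0)],
}
overlap = indeterminate_points(data)
print(overlap)
```

In this sketch only the points returned by `indeterminate_points` would be passed on to a second, more expensive separation step; points outside every other class's box are already separated by the boxes themselves.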
Experimental investigation of three machine learning algorithms for ITS dataset
- Authors: Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at First International Conference, FGIT 2009, Future Generation Information Technology, Jeju Island, Korea : 10th-12th December 2009 Vol. 5899, p. 308-316
- Full Text:
- Description: The present article is devoted to experimental investigation of the performance of three machine learning algorithms for ITS dataset in their ability to achieve agreement with classes published in the biological literature before. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric and the sequences cannot be regarded as points in a finite dimensional space. This is why it is necessary to develop novel machine learning approaches to the analysis of datasets of this sort. This paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. It turns out that all three machine learning algorithms are efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset, where the k-committees classifier outperforms k-means and Nearest Neighbour classifiers, is also presented.
- Description: 2003007844
The case for a consistent cyberscam classification framework (CCCF)
- Authors: Stabek, Amber , Brown, Simon , Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at UIC-ATC 2009 - Symposia and Workshops on Ubiquitous, Autonomic and Trusted Computing in Conjunction with the UIC'09 and ATC'09 Conferences, Brisbane : 7th-9th July 2009 p. 525-530
- Full Text:
- Description: Cyberscam classification schemes developed by international statistical reporting bodies, including the Bureau of Statistics (Australia), the Internet Crime Complaint Center (US), and the Environics Research Group (Canada), are diverse and largely incompatible. This makes comparisons of cyberscam incidence across jurisdictions very difficult. This paper argues that the critical first step towards the development of an inter-jurisdictional and global approach to identify and intercept cyberscams - and prosecute scammers - is a uniform classification system. © 2009 IEEE.
From convex to nonconvex: A loss function analysis for binary classification
- Authors: Zhao, Lei , Mammadov, Musa , Yearwood, John
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 p. 1281-1288
- Full Text:
- Reviewed:
- Description: Problems of data classification can be studied in the framework of regularization theory as ill-posed problems. In this framework, loss functions play an important role in the application of regularization theory to classification. In this paper, we review some important convex loss functions, including hinge loss, square loss, modified square loss, exponential loss, logistic regression loss, as well as some non-convex loss functions, such as sigmoid loss, ø-loss, ramp loss, normalized sigmoid loss, and the loss function of a 2-layer neural network. Based on the analysis of these loss functions, we propose a new differentiable non-convex loss function, called the smoothed 0-1 loss function, which is a natural approximation of the 0-1 loss function. To compare the performance of different loss functions, we propose two binary classification algorithms, one for convex loss functions and the other for non-convex loss functions. A set of experiments is run on several binary data sets from the UCI repository. The results show that the proposed smoothed 0-1 loss function is robust, especially for noisy data sets with many outliers. © 2010 IEEE.
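Several of the loss functions named in the abstract above have commonly used textbook forms as functions of the margin z = y·f(x). The sketch below writes these out under that assumption; scaling conventions vary between authors, the ramp loss is taken here as the hinge loss clipped at 1, and the paper's own smoothed 0-1 loss is not reproduced because the abstract does not define it.

```python
import math


def zero_one(z: float) -> float:
    """0-1 loss: 1 for a misclassified point (non-positive margin), else 0."""
    return 1.0 if z <= 0 else 0.0


def hinge(z: float) -> float:
    return max(0.0, 1.0 - z)


def square(z: float) -> float:
    return (1.0 - z) ** 2


def modified_square(z: float) -> float:
    return max(0.0, 1.0 - z) ** 2


def exponential(z: float) -> float:
    return math.exp(-z)


def logistic(z: float) -> float:
    return math.log1p(math.exp(-z))


def sigmoid(z: float) -> float:
    """Non-convex and bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(z))


def ramp(z: float) -> float:
    """Hinge loss clipped at 1 (one common convention, assumed here)."""
    return min(1.0, hinge(z))


# Loss assigned to a badly misclassified point (margin -5), e.g. an outlier.
for loss in (zero_one, hinge, square, exponential, sigmoid, ramp):
    print(f"{loss.__name__:12s} {loss(-5.0):10.4f}")
```

The convex losses grow without bound as the margin becomes more negative, while the bounded non-convex ones stay at or below 1, which is one intuition for why non-convex losses can be less sensitive to outliers.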
RBACS : Rootkit behavioral analysis and classification system
- Authors: Lobo, Desmond , Watters, Paul , Wu, Xinwen
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 3rd International Conference on Knowledge Discovery and Data Mining, WKDD 2010, Phuket : 9th-10th January 2010 p. 75-80
- Full Text:
- Description: In this paper, we focus on rootkits, a special type of malicious software (malware) that operates in an obfuscated and stealthy mode to evade detection. Categorizing these rootkits will help in detecting future attacks against the business community. We first developed a theoretical framework for classifying rootkits. Based on our theoretical framework, we then proposed a new rootkit classification system and tested our system on a sample of rootkits that use inline function hooking. Our experimental results showed that our system could successfully categorize the sample using unsupervised clustering. © 2010 IEEE.
An application of novel clustering technique for information security
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2011
- Type: Text , Conference paper
- Relation: Applications and Techniques in Information Security Workshop p. 5-11
- Full Text: false
- Reviewed:
- Description: This article presents experimental results devoted to a new application of the novel clustering technique introduced by the authors recently. Our aim is to facilitate the application of robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, for intrusion detection. Here we concentrate on the particular case of application to profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The Silhouette index is used to determine the number of clusters. Second, we use a consensus function to combine these independent clusterings into one consensus clustering. Feature ranking is used to select a subset of features for the consensus function. Third, we train fast supervised classification algorithms on the resulting consensus clustering in order to enable them to process the whole large data set as well as new data. The precision and recall of classifiers at the final stage of this scheme are critical for effectiveness of the whole procedure. We investigated various combinations of three consensus functions, Cluster-Based Graph Formulation (CBGF), Hybrid Bipartite Graph Formulation (HBGF), and Instance-Based Graph Formulation (IBGF), and a variety of supervised classification algorithms. The best precision and recall have been obtained by the combination of the HBGF consensus function and the SMO classifier with the polynomial kernel.
- Description: 2003009195
On low-rank regularized least squares for scalable nonlinear classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Conference paper
- Relation: International Conference on Neural Information Processing p. 490-499
- Full Text: false
- Reviewed:
- Description: In this paper, we revisited the classical technique of Regularized Least Squares (RLS) for the classification of large-scale nonlinear data. Specifically, we focus on a low-rank formulation of RLS and show that it has linear time complexity in the data size only and does not rely on the number of labels and features for problems with moderate feature dimension. This makes low-rank RLS particularly suitable for classification with large data sets. Moreover, we have proposed a general theorem for the closed-form solutions to the Leave-One-Out Cross Validation (LOOCV) estimation problem in empirical risk minimization which encompasses all types of RLS classifiers as special cases. This eliminates the reliance on cross validation, a computationally expensive process for parameter selection, and greatly accelerates the training process of RLS classifiers. Experimental results on real and synthetic large-scale benchmark data sets have shown that low-rank RLS achieves comparable classification performance while being much more efficient than standard kernel SVM for nonlinear classification. The improvement in efficiency is more evident for data sets with higher dimensions.
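The idea of a closed-form leave-one-out estimate can be illustrated with the classical identity for ridge regression, in which each leave-one-out residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the diagonal of the hat matrix. The sketch below checks this for the simplest possible case, a single feature with no intercept; it is not the paper's low-rank formulation or its general theorem, and the function names and data are hypothetical.

```python
from typing import List, Sequence


def ridge_fit(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """One-feature ridge regression without intercept: w = Sxy / (Sxx + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def loo_closed_form(xs: Sequence[float], ys: Sequence[float], lam: float) -> List[float]:
    """Leave-one-out residuals from a single fit: r_i / (1 - h_ii)."""
    denom = sum(x * x for x in xs) + lam
    w = ridge_fit(xs, ys, lam)
    # h_ii = x_i^2 / (Sxx + lam) is the i-th diagonal entry of the hat matrix.
    return [(y - w * x) / (1.0 - x * x / denom) for x, y in zip(xs, ys)]


def loo_brute_force(xs: Sequence[float], ys: Sequence[float], lam: float) -> List[float]:
    """Leave-one-out residuals obtained by refitting n times."""
    residuals = []
    for i in range(len(xs)):
        w = ridge_fit(list(xs[:i]) + list(xs[i + 1:]),
                      list(ys[:i]) + list(ys[i + 1:]), lam)
        residuals.append(ys[i] - w * xs[i])
    return residuals


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
fast = loo_closed_form(xs, ys, lam=0.5)
slow = loo_brute_force(xs, ys, lam=0.5)
print(max(abs(a - b) for a, b in zip(fast, slow)))
```

The closed form needs one fit instead of n, which is the kind of saving that makes parameter selection cheap compared with explicit cross validation.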
Learning sparse kernel classifiers in the primal
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2012
- Type: Text , Conference paper
- Relation: Joint IAPR International Workshop, SSPR&SPR 2012; Hiroshima, Japan; 7th-9th November 2012; published in Structural, Syntactic, and Statistical Pattern Recognition (part of the Lecture Notes in Computer Science) Vol. 7626, p. 60-69
- Full Text: false
- Reviewed:
- Description: The increasing number of classification applications in large data sets demands that efficient classifiers be designed not only in training but also for prediction. In this paper, we address the problem of learning kernel classifiers with reduced complexity and improved efficiency for prediction in comparison to those trained by standard methods. A single optimisation problem is formulated for classifier learning which optimises both classifier weights and eXpansion Vectors (XVs) that define the classification function in a joint fashion. Unlike the existing approach of Wu et al., which performs optimisation in the dual formulation, our approach solves the primal problem directly. The primal problem is much more efficient to solve, as it can be converted to the training of a linear classifier in each iteration, which scales linearly with the size of the data set and the number of expansions. This makes our primal approach highly desirable for large-scale applications, where the dual approach is inadequate and prohibitively slow due to the solution of cubic-time kernel SVM involved in each iteration. Experimental results have demonstrated the efficiency and effectiveness of the proposed primal approach for learning sparse kernel classifiers that clearly outperform the alternatives.
An efficient classification using support vector machines
- Authors: Ruan, Ning , Chen, Yi , Gao, David
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings of 2013 Science and Information Conference, SAI 2013 p. 585-589
- Full Text: false
- Reviewed:
- Description: Support vector machine (SVM) is a popular method for classification in data mining. The canonical duality theory provides a unified analytic solution to a wide range of discrete and continuous problems in global optimization. This paper presents a canonical duality approach for solving the support vector machine problem. It is shown that, by canonical duality, these nonconvex and integer optimization problems are equivalent to a unified concave maximization problem over a convex set and hence can be solved efficiently by existing optimization techniques. © 2013 The Science and Information Organization.
Analysis of Classifiers for Prediction of Type II Diabetes Mellitus
- Authors: Barhate, Rahul , Kulkarni, Pradnya
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 4th International Conference on Computing, Communication Control and Automation, ICCUBEA 2018
- Full Text:
- Reviewed:
- Description: Diabetes mellitus is a chronic disease and a health challenge worldwide. According to the International Diabetes Federation, 451 million people across the globe have diabetes, with this number anticipated to rise to 693 million people by 2045. It has been shown that 80% of the complications arising from type II diabetes can be prevented or delayed by early identification of the people who are at risk. Diabetes is difficult to diagnose in the early stages as its symptoms grow subtly and gradually. In a majority of the cases, the patients remain undiagnosed until they are admitted for a heart attack or begin to lose their sight. This paper analyzes different classification algorithms based on a patient's health history to help doctors identify the presence of diabetes and to promote early diagnosis and treatment. The experiments were conducted on the Pima Indian Diabetes data set. The classifiers used include K Nearest Neighbors, Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, Support Vector Machine and Neural Network. Results demonstrate that Random Forests performed well on the data set, giving an accuracy of 79.7%. © 2018 IEEE.
- Description: E1
Multi-source cyber-attacks detection using machine learning
- Authors: Taheri, Sona , Gondal, Iqbal , Bagirov, Adil , Harkness, Greg , Brown, Simon , Chi, Chihung
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne, Australia; 13th-15th February 2019 Vol. 2019-February, p. 1167-1172
- Full Text:
- Reviewed:
- Description: The Internet of Things (IoT) has significantly increased the number of devices connected to the Internet ranging from sensors to multi-source data information. As the IoT continues to evolve with new technologies, the number of threats and attacks against IoT devices is on the increase. Analyzing and detecting these attacks originating from different sources needs machine learning models. These models provide proactive solutions for detecting attacks and their sources. In this paper, we propose to apply a supervised machine learning classification technique to identify cyber-attacks from each source. More precisely, we apply the incremental piecewise linear classifier that constructs the boundary between sources/classes incrementally, starting with one hyperplane and adding more hyperplanes at each iteration. The algorithm terminates when no further significant improvement of the separation of sources/classes is possible. The construction and usage of piecewise linear boundaries allows us to avoid any possible overfitting. We apply the incremental piecewise linear classifier on the multi-source real world cyber security data set to identify cyber-attacks and their sources.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
Enhancing linear time complexity time series classification with hybrid bag-of-patterns
- Authors: Liang, Shen , Zhang, Yanchun , Ma, Jiangang
- Date: 2020
- Type: Text , Conference paper
- Relation: 25th International Conference on Database Systems for Advanced Applications, DASFAA 2020 Vol. 12112 LNCS, p. 717-735
- Full Text: false
- Reviewed:
- Description: In time series classification, one of the most popular models is Bag-Of-Patterns (BOP). Most BOP methods run in super-linear time. A recent work proposed a linear time BOP model, yet it has limited accuracy. In this work, we present Hybrid Bag-Of-Patterns (HBOP), which can greatly enhance accuracy while maintaining linear complexity. Concretely, we first propose a novel time series discretization method called SLA, which can retain more information than the classic SAX. We use a hybrid of SLA and SAX to expressively and compactly represent subsequences, which is our most important design feature. Moreover, we develop an efficient time series transformation method that is key to achieving linear complexity. We also propose a novel X-means clustering subroutine to handle subclasses. Extensive experiments on over 100 datasets demonstrate the effectiveness and efficiency of our method. © 2020, Springer Nature Switzerland AG.
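The abstract above contrasts its SLA discretization with the classic SAX. As background, a minimal sketch of SAX as it is usually described follows: z-normalise the series, average it over equal-length segments (piecewise aggregate approximation), and map each segment mean to a symbol using breakpoints that split the standard normal distribution into equiprobable regions. The sketch assumes the series length is divisible by the number of segments and uses the population standard deviation; the function name and example are hypothetical, and SLA itself is not shown because the abstract does not describe it.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
from typing import Sequence


def sax_word(series: Sequence[float], n_segments: int, alphabet: str = "abcd") -> str:
    """Discretise a time series into a SAX word (simplified sketch)."""
    if len(series) % n_segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by n_segments")
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("constant series cannot be z-normalised")
    z = [(v - mu) / sigma for v in series]

    # Piecewise aggregate approximation: mean of each equal-length segment.
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # Breakpoints dividing N(0, 1) into len(alphabet) equiprobable regions.
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]

    # A value below the first breakpoint gets the first symbol, and so on.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


word = sax_word([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
print(word)
```

A steadily increasing series maps to an increasing sequence of symbols, so subsequences with similar shapes share the same word, which is what a bag-of-patterns model counts.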