Dynamical systems based on a fuzzy derivative and its applications to data classification
- Authors: Mammadov, Musa , Rubinov, Alex , Yearwood, John
- Date: 2003
- Type: Text , Conference paper
- Relation: Paper presented at the Industrial Optimisation 2003 Conference, Perth : 30th September, 2002
- Full Text: false
- Reviewed:
- Description: E1
- Description: 2003000339
A semantic method to information extraction for decision support systems
- Authors: Ofoghi, Bahadorreza , Yearwood, John , Ghosh, Ranadhir
- Date: 2006
- Type: Text , Conference proceedings
- Full Text: false
- Description: In this paper, we describe a novel schema for a more semantic text mining process that supports more comprehensive decision making by decision support systems (DSS) by providing more effective and accurate textual information. Using two semantic lexical resources, FrameNet and WordNet, to extract the required text snippets from unstructured free text yields a more accurate information extraction process, which delivers more precise information either to a DSS or to a decision maker. We explain how these lexical resources can improve a focused text mining process that could be applied to an information provider system in a decision support paradigm. Preliminary results from an initial experiment show that the hybrid information extraction schema performs well in some semantic failure situations.
- Description: 2003010644
Derivative free stochastic discrete gradient method with adaptive mutation
- Authors: Ghosh, Ranadhir , Ghosh, Moumita , Bagirov, Adil
- Date: 2006
- Type: Text , Journal article
- Relation: Advances in Data Mining Vol. 4065, no. (2006), p. 264-278
- Full Text: false
- Reviewed:
- Description: In data mining we encounter many problems, such as function optimization or parameter estimation for classifiers, for which an effective search algorithm is essential. In this paper we propose a stochastic, derivative-free algorithm for unconstrained optimization problems. Many derivative-based local search methods exist, but they usually get stuck in local solutions of non-convex optimization problems. On the other hand, global search methods are very time consuming and work only for a limited number of variables. In this paper we investigate a derivative-free, multi-search, gradient-based method that overcomes the problem of local minima and finds global solutions in less time. We have tested the proposed method on many benchmark datasets from the literature and compared the results with those of other existing algorithms. The results are very promising.
- Description: C1
- Description: 2003001541
Optimization in data mining
- Authors: Karasozen, Bulent , Rubinov, Alex , Weber, Gerhard-Wilhelm
- Date: 2006
- Type: Text , Journal article
- Relation: European Journal of Operational Research Vol. 173, no. 3 (2006), p. 701-704
- Full Text: false
- Reviewed:
- Description: C1
AWSum - Combining classification with knowledge acquisition
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John , Hafen, Gaudenz , Jelinek, Herbert
- Date: 2008
- Type: Text , Journal article
- Relation: International Journal of Software and Informatics Vol. 2, no. 2 (2008), p. 199-214
- Full Text: false
- Reviewed:
- Description: Many classifiers achieve high levels of accuracy but have limited applicability in real-world situations because they do not lead to a greater understanding of, or insight into, the way features influence the classification. In areas such as health informatics, a classifier that clearly identifies the influences on classification can be used to direct research and formulate interventions. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classifier that provides accuracy comparable to other techniques whilst providing insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. The merits of this approach in classification and insight are evaluated on Cystic Fibrosis and diabetes datasets, with positive results.
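The core idea described above, one weight per feature value that is summed to reach a decision, can be illustrated with a minimal sketch for a binary class labelled 0/1. The weight formula used here, P(class 1 | value) - P(class 0 | value) estimated from counts, and the zero decision threshold are assumptions for illustration; they are not necessarily the formulas used by AWSum.

```python
from collections import defaultdict


def feature_value_weights(rows, labels):
    """Map each (feature index, value) pair to a weight in [-1, 1].

    Illustrative weight (an assumption, not AWSum's published formula):
    P(class=1 | value) - P(class=0 | value), estimated from counts.
    Labels are assumed to be 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # [count with label 0, count with label 1]
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            counts[(index, value)][label] += 1
    return {
        key: (pos - neg) / (pos + neg)
        for key, (neg, pos) in counts.items()
    }


def weighted_sum_classify(row, weights, threshold=0.0):
    """Sum the weights of the row's feature values; unseen values count as 0."""
    score = sum(weights.get((index, value), 0.0) for index, value in enumerate(row))
    return 1 if score > threshold else 0


# Hypothetical toy data: two categorical features, binary class.
rows = [("high", "yes"), ("high", "no"), ("low", "no"), ("low", "no")]
labels = [1, 1, 0, 0]
weights = feature_value_weights(rows, labels)
print(weights)
print(weighted_sum_classify(("high", "no"), weights))
```

Because each weight is attached to a single feature value, the weight table itself can be read directly to see which values push towards which class, which is the kind of insight the abstract emphasises.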
Experimental investigation of classification algorithms for ITS dataset
- Authors: Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2008
- Type: Text , Conference paper
- Relation: PKAW-08, Pacific Rim Knowledge Acquisition Workshop 2008, as part of PRICAI 2008, Tenth Pacific Rim p. 262-272
- Full Text: false
- Reviewed:
- Description: This article is devoted to an experimental investigation of classification algorithms for the analysis of the ITS dataset. We introduce a novel k-committees algorithm for classification and compare it with the discrete k-means and nearest neighbour algorithms. The ITS dataset consists of nuclear ribosomal DNA sequences, for which rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric, and the sequences cannot be regarded as points in a finite-dimensional space. This is why it is necessary to develop novel algorithms and adjust familiar ones. We present the results of experiments comparing the efficiency of the three classification methods in their ability to agree with classes previously published in the biological literature. It turns out that our algorithms are efficient and can be used to obtain biologically significant classifications. A simplified version of a synthetic dataset, on which the k-committees classifier outperforms the k-means and nearest neighbour classifiers, is also presented.
- Description: E1
Isolation forest
- Authors: Liu, Fei , Ting, Kaiming , Zhou, Zhi-Hua
- Date: 2008
- Type: Text , Conference paper
- Relation: Proceedings of the Eighth IEEE International Conference on Data Mining p. 413-422
- Full Text: false
- Reviewed:
- Description: Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies instead of profiling normal points. To the best of our knowledge, the concept of isolation has not been explored in the current literature. The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm that has linear time complexity with a low constant and a low memory requirement. Our empirical evaluation shows that iForest compares favourably with ORCA (a near-linear time complexity distance-based method), LOF and random forests in terms of AUC and processing time, especially on large data sets. iForest also works well in high-dimensional problems that have a large number of irrelevant attributes, and in situations where the training set does not contain any anomalies.
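The isolation idea can be illustrated with a compact, pure-Python sketch: random axis-parallel splits are applied to sub-samples until points are isolated, and points with a short average path length E[h(x)] receive scores close to 1. The score s = 2^(-E[h(x)] / c(psi)) and the normalising term c(n) follow the iForest formulation as we understand it; the helper names, the default tree count and sub-sample size, and the toy data are assumptions, and this is not the authors' implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """Normalising term c(n): average path length of an unsuccessful BST search."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until isolation."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all remaining points share this feature value
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Edges traversed to reach x's leaf, plus c(size) for unresolved leaves."""
    if node[0] == "leaf":
        return depth + avg_path_length(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] for every point; values near 1 suggest anomalies."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))  # assumes len(data) >= 2
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / avg_path_length(psi)))
    return scores


# Hypothetical 1-D data: 50 evenly spaced inliers and one distant point.
data = [(i * 0.1,) for i in range(50)] + [(100.0,)]
scores = isolation_scores(data)
print(max(range(len(data)), key=lambda i: scores[i]))
```

The distant point is usually separated by the very first random split, so its average path length stays near 1, whereas inliers need several splits; sub-sampling keeps each tree small, which is what makes the approach cheap.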
Issues of grid-cluster retrievals in swarm-based clustering
- Authors: Tan, Swee , Ting, Kaiming , Teng, Shyh
- Date: 2008
- Type: Text , Conference paper
- Relation: Proceedings of the 2008 IEEE World Congress on Computational Intelligence p. 511-518
- Full Text: false
- Reviewed:
- Description: One common approach in swarm-based clustering is to use agents to create a set of clusters on a two-dimensional grid, and then use an existing clustering method to retrieve the clusters on the grid. The second step, which we call grid-cluster retrieval, is an essential step to obtain an explicit partitioning of data. In this study, we highlight the issues in grid-cluster retrievals commonly neglected by researchers, and demonstrate the non-trivial difficulties involved. To tackle the issues, we then evaluate three methods: K-means, hierarchical clustering (Weighted Single-link) and density-based clustering (DBScan). Among the three methods, DBScan is the only method which has not been previously used for grid-cluster retrievals, yet it is shown to be the most suitable method in terms of effectiveness and efficiency.
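To illustrate grid-cluster retrieval with density-based clustering, the sketch below runs a minimal DBSCAN over the occupied cells of a two-dimensional grid, treating cells within Chebyshev distance `eps` (the 8-neighbourhood when `eps=1`) as neighbours. The parameter values, the neighbourhood definition and the toy grid are assumptions for illustration; the abstract does not state the DBScan settings used in the study.

```python
def dbscan_grid(cells, eps=1, min_pts=3):
    """Minimal DBSCAN over occupied (row, col) grid cells.

    Returns a dict mapping each cell to a cluster id, or -1 for noise.
    Neighbours are cells within Chebyshev distance eps (including the cell itself).
    """
    cells = list(cells)

    def neighbours(c):
        return [d for d in cells
                if max(abs(c[0] - d[0]), abs(c[1] - d[1])) <= eps]

    labels = {}
    cluster_id = 0
    for cell in cells:
        if cell in labels:
            continue
        seeds = neighbours(cell)
        if len(seeds) < min_pts:
            labels[cell] = -1  # provisionally noise; may become a border cell later
            continue
        labels[cell] = cluster_id
        queue = list(seeds)
        while queue:
            current = queue.pop()
            if labels.get(current) == -1:
                labels[current] = cluster_id  # former noise becomes a border cell
            if current in labels:
                continue
            labels[current] = cluster_id
            current_neighbours = neighbours(current)
            if len(current_neighbours) >= min_pts:  # core cell: keep expanding
                queue.extend(current_neighbours)
        cluster_id += 1
    return labels


# Hypothetical grid after the agents have finished: two blobs and a stray cell.
blob_a = [(r, c) for r in range(3) for c in range(3)]
blob_b = [(r, c) for r in range(10, 13) for c in range(10, 13)]
labels = dbscan_grid(blob_a + blob_b + [(6, 0)])
print(labels)
```

Unlike K-means, this retrieval step does not need the number of clusters in advance, and isolated cells left behind by the agents are reported as noise rather than forced into a cluster.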
An algorithm for the optimization of multiple classifiers in data mining based on graphs
- Authors: Kelarev, Andrei , Ryan, Joe , Yearwood, John
- Date: 2009
- Type: Text , Journal article
- Relation: The Journal of Combinatorial Mathematics and Combinatorial Computing Vol. 71, no. (2009), p. 65-85
- Full Text: false
- Reviewed:
- Description: This article develops an efficient combinatorial algorithm, based on labeled directed graphs and motivated by applications in data mining, for designing multiple classifiers. Our method originates from the standard approach described in [37]. It defines a representation of a multiclass classifier in terms of several binary classifiers. We use labeled graphs to introduce additional structure on the classifier. Representations of this sort are known to have significant advantages. An important property of these representations is their ability to correct errors of individual binary classifiers and produce correct combined output. For every such representation we develop a combinatorial algorithm with quadratic running time to compute the largest number of errors of individual binary classifiers that can be corrected by the combined multiple classifier. In addition, we consider the question of optimizing classifiers of this type and find all optimal representations for these multiple classifiers.
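A familiar way to see why such representations can correct errors is the error-correcting output code view: each class is assigned a codeword of binary-classifier outputs, the combined classifier picks the class with the nearest codeword, and it is then guaranteed to correct up to floor((d - 1) / 2) binary errors, where d is the minimum Hamming distance between codewords. The sketch below computes that bound for a hypothetical code matrix. It is a standard coding-theory illustration only; it does not implement the labelled-graph algorithm of this article.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def correctable_errors(code_matrix):
    """Errors that nearest-codeword decoding is guaranteed to correct."""
    d_min = min(hamming(a, b) for a, b in combinations(code_matrix, 2))
    return (d_min - 1) // 2


def decode(outputs, code_matrix):
    """Index of the class whose codeword is nearest to the binary outputs."""
    return min(range(len(code_matrix)),
               key=lambda k: hamming(outputs, code_matrix[k]))


# Hypothetical 4-class code built from 7 binary classifiers (one row per class).
code = [
    (0, 0, 0, 0, 0, 0, 0),
    (0, 1, 1, 0, 0, 1, 1),
    (1, 0, 1, 1, 1, 0, 1),
    (1, 1, 0, 1, 1, 1, 0),
]
print(correctable_errors(code))
print(decode((1, 0, 1, 1, 1, 0, 0), code))  # class 2 with its last bit flipped
```

Computing d_min by comparing every pair of codewords is quadratic in the number of classes, which is one simple reason why bounds of this kind are cheap to evaluate.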
- Description: 2003007563
Application of optimisation-based data mining techniques to tobacco control dataset
- Authors: Dzalilov, Zari , Zhang, J , Bagirov, Adil , Mammadov, Musa
- Date: 2010
- Type: Text , Journal article
- Relation: International Journal of Lean Thinking Vol. 1, no. 1 (2010), p. 27-41
- Full Text: false
- Reviewed:
- Description: Tobacco smoking is one of the leading causes of death around the world. Consequently, control of tobacco use is an important global public health issue. Tobacco control may be aided by the development of theoretical and methodological frameworks for describing and understanding complex tobacco control systems. Linear regression and logistic regression are currently very popular statistical techniques for modeling and analyzing complex data in tobacco control systems. However, in tobacco markets, numerous interrelated factors interact nontrivially with tobacco control policies, such that policies and control outcomes are nonlinearly related.
Internet security applications of Grobner-Shirshov bases
- Authors: Kelarev, Andrei , Yearwood, John , Watters, Paul
- Date: 2010
- Type: Text , Journal article
- Relation: Asian-European Journal of Mathematics Vol. 3, no. 3 (2010), p. 435-442
- Relation: http://purl.org/au-research/grants/arc/DP0211866
- Full Text: false
- Reviewed:
A Grobner-Shirshov Algorithm for Applications in Internet Security
- Authors: Kelarev, Andrei , Yearwood, John , Watters, Paul , Wu, Xinwen , Ma, Liping , Abawajy, Jemal , Pan, L.
- Date: 2011
- Type: Text , Journal article
- Relation: Southeast Asian Bulletin of Mathematics Vol. 35, no. (2011), p. 807-820
- Full Text: false
- Reviewed:
- Description: The design of multiple classification and clustering systems for the detection of malware is an important problem in internet security. Grobner-Shirshov bases have been used recently by Dazeley et al. [15] to develop an algorithm for constructions with certain restrictions on the sandwich-matrices. We develop a new Grobner-Shirshov algorithm which applies to a larger variety of constructions based on combinatorial Rees matrix semigroups without any restrictions on the sandwich-matrices.
A novel piecewise linear classifier based on polyhedral conic and max-min separabilities
- Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean , Ozturk, Gurkan , Kasimbeyli, Refail
- Date: 2011
- Type: Text , Journal article
- Relation: TOP Vol. 21, no. 1 (2011), p. 1-22
- Full Text: false
- Reviewed:
- Description: In this paper, an algorithm for finding piecewise linear boundaries between pattern classes is developed. This algorithm consists of two main stages. In the first stage, a polyhedral conic set is used to identify data points that lie inside their classes; in the second stage, those points are excluded and a piecewise linear boundary is computed using the remaining data points. Piecewise linear boundaries are computed incrementally, starting with one hyperplane. Such an approach allows one to significantly reduce the computational effort on many large data sets. Results of numerical experiments are reported. These results demonstrate that the new algorithm consistently produces good test set accuracy on most data sets compared with a number of other mainstream classifiers. © 2011 Sociedad de Estadística e Investigación Operativa.
An efficient algorithm for the incremental construction of a piecewise linear classifier
- Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean
- Date: 2011
- Type: Text , Journal article
- Relation: Information Systems Vol. 36, no. 4 (2011), p. 782-790
- Relation: http://purl.org/au-research/grants/arc/DP0666061
- Full Text: false
- Reviewed:
- Description: In this paper the problem of finding piecewise linear boundaries between sets is considered and applied to solving supervised data classification problems. An algorithm for computing piecewise linear boundaries, consisting of two main steps, is proposed. In the first step, sets are approximated by hyperboxes to find so-called "indeterminate" regions between sets. In the second step, sets are separated inside these "indeterminate" regions by piecewise linear functions. These functions are computed incrementally, starting with a linear function. Results of numerical experiments are reported. These results demonstrate that the new algorithm requires a reasonable training time and produces consistently good test set accuracy on most data sets compared with mainstream classifiers. © 2010 Elsevier B.V. All rights reserved.
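The first step, approximating each set by a hyperbox and locating the "indeterminate" region between sets, can be sketched for two classes as follows. Using the axis-aligned bounding box of each class and taking the indeterminate region to be the intersection of the two boxes is a simplifying assumption; the paper's own hyperbox construction may differ. Only the points returned here would be handed to the second, piecewise linear step.

```python
def bounding_box(points):
    """Axis-aligned hyperbox (lower corner, upper corner) enclosing the points."""
    dims = range(len(points[0]))
    lower = tuple(min(p[i] for p in points) for i in dims)
    upper = tuple(max(p[i] for p in points) for i in dims)
    return lower, upper


def box_intersection(box_a, box_b):
    """Intersection of two hyperboxes, or None if they do not overlap."""
    lower = tuple(max(a, b) for a, b in zip(box_a[0], box_b[0]))
    upper = tuple(min(a, b) for a, b in zip(box_a[1], box_b[1]))
    if any(lo > hi for lo, hi in zip(lower, upper)):
        return None
    return lower, upper


def in_box(point, box):
    """True if the point lies inside the (closed) hyperbox."""
    lower, upper = box
    return all(lo <= x <= hi for x, lo, hi in zip(point, lower, upper))


def indeterminate_points(class_a, class_b):
    """Points of each class lying in the overlap of the two class boxes."""
    overlap = box_intersection(bounding_box(class_a), bounding_box(class_b))
    if overlap is None:
        return [], []  # the boxes alone already separate the classes
    return ([p for p in class_a if in_box(p, overlap)],
            [p for p in class_b if in_box(p, overlap)])


# Hypothetical 2-D classes whose boxes overlap in a small region.
class_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (2.5, 2.0)]
class_b = [(2.0, 1.5), (3.0, 3.0), (4.0, 2.5), (3.5, 1.0)]
print(indeterminate_points(class_a, class_b))
```

In this toy case only one point from each class falls in the overlap, which illustrates why restricting the expensive separation step to the indeterminate region can keep training time reasonable.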
Application of SVM in citation information extraction
- Authors: Liang, Jiguang , Layton, Robert , Wang, Wei
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: Support Vector Machines are an effective binary classification algorithm. To make better use of text structural features for information extraction, whose use is greatly restricted in the Hidden Markov Model (HMM), this paper proposes a multi-class support vector machine classifier based on Markov properties to extract information from a citation database. The proposed model extracts symbol characteristics as features and builds a binary tree from the transition probabilities. Experiments show that the proposed method outperforms HMM and basic SVM methods. © 2011 IEEE.
Classification through incremental max-min separability
- Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean , Karasozen, Bulent
- Date: 2011
- Type: Text , Journal article
- Relation: Pattern Analysis and Applications Vol. 14, no. 2 (2011), p. 165-174
- Relation: http://purl.org/au-research/grants/arc/DP0666061
- Full Text: false
- Reviewed:
- Description: Piecewise linear functions can be used to approximate non-linear decision boundaries between pattern classes. Piecewise linear boundaries are known to provide efficient real-time classifiers. However, they require a long training time. Finding piecewise linear boundaries between sets is a difficult optimization problem. Most approaches use heuristics to avoid solving this problem, which may lead to suboptimal piecewise linear boundaries. In this paper, we propose an algorithm for globally training hyperplanes using an incremental approach. Such an approach allows one to find a near global minimizer of the classification error function and to compute as few hyperplanes as needed for separating sets. We apply this algorithm to solve supervised data classification problems and report the results of numerical experiments on real-world data sets. These results demonstrate that the new algorithm requires a reasonable training time and its test set accuracy is consistently good on most data sets compared with mainstream classifiers. © 2010 Springer-Verlag London Limited.
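A max-min piecewise linear boundary can be written as f(x) = max_i min_{j in J_i} (<w_ij, x> - b_ij), with the sign of f deciding the class. The sketch below only evaluates such a function for hand-picked hyperplanes, to show how a few linear pieces can describe a non-convex region. The grouping, the weights and the sign convention are illustrative assumptions, and the incremental training procedure described in the abstract is not implemented.

```python
def affine(w, b, x):
    """Value of the affine function <w, x> - b."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def max_min(groups, x):
    """f(x) = max over groups of the min over that group's affine functions."""
    return max(min(affine(w, b, x) for w, b in group) for group in groups)


def classify(groups, x):
    """Assumed convention: class +1 where f(x) > 0, class -1 otherwise."""
    return 1 if max_min(groups, x) > 0 else -1


# Hypothetical pieces: f(x) > 0 on the union of {x > 1 and y > 1}
# and {x < -1 and y < -1}, a region no single hyperplane can describe.
groups = [
    [((1.0, 0.0), 1.0), ((0.0, 1.0), 1.0)],    # min(x - 1, y - 1)
    [((-1.0, 0.0), 1.0), ((0.0, -1.0), 1.0)],  # min(-x - 1, -y - 1)
]
for point in [(2.0, 3.0), (-2.0, -2.0), (2.0, -2.0), (0.0, 0.0)]:
    print(point, classify(groups, point))
```

Each inner min describes a convex polyhedral piece and the outer max takes their union, so adding hyperplanes one at a time (as in an incremental scheme) refines the boundary without changing its basic form.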
Determining the influence of visual training on EEG activity patterns using association rule mining
- Authors: Yan, Fangang , Watters, Paul , Wang, Wei
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: To confirm, using association rule mining, that visual training can change EEG patterns, we first collected EEG recordings from people undergoing long-term professional visual training (the visual training group) and from novices (the control group) while they performed specific mental tasks. Second, we determined the difference in brain electrical activity between the two groups using machine learning methods. Third, we discovered distinct patterns using an association rule algorithm, finding that the two groups were separable based on their completion of professional visual cognitive tasks. In the beta band, the visual training group showed a specific and significant association pattern involving electrodes FP1 and C4. The results indicate that the EEG patterns were modified by professional visual training. We further discuss the impact of long-term professional visual training on the EEG. © 2011 IEEE.
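Association rule mining scores a rule A => B by its support (the fraction of records containing both A and B) and its confidence (the fraction of records containing A that also contain B). A minimal sketch of these two measures follows; the item labels such as `FP1_high` and the toy records are hypothetical and only echo the electrode names in the abstract, not the study's data or its discretisation of the EEG signal.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and B) / support(A)."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / base


# Hypothetical discretised EEG records (one set of items per recording).
records = [
    {"FP1_high", "C4_high", "group_trained"},
    {"FP1_high", "C4_high", "group_trained"},
    {"FP1_high", "group_control"},
    {"C4_high", "group_control"},
]
print(support(records, {"FP1_high", "C4_high"}))
print(confidence(records, {"FP1_high", "C4_high"}, {"group_trained"}))
```

A full miner (for example Apriori) would enumerate candidate itemsets and keep rules above chosen support and confidence thresholds; this sketch only shows how a single rule is scored.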
Fake file detection in P2P networks by consensus and reputation
- Authors: Watters, Paul , Layton, Robert
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: Previous research [1] has indicated that reputation scores can be used as the basis for trust computation in P2P networks. In this paper, we use reputation scores calculated from P2P search engine rating sites to determine whether a torrent is likely to be linked to a fake file (or not). Our results indicate clear separability between files which are fake and which are genuine, assuming the integrity of the "community" ratings provided by specific subcultural groups [2]. Suggestions for more sophisticated reputation-based scoring are also provided. © 2011 Crown.
K-AP clustering algorithm for large scale dataset
- Authors: Liu, Chao , Hay, Rosemary , Wang, Wei
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: The affinity propagation (AP) clustering algorithm is widely valued in science and engineering because it does not require the number of clusters to be specified in advance, and because of its robustness and good generalization. However, the algorithm takes the initial similarities (the distance between every pair of points) as input, and computing them requires a great deal of time and storage space, which limits its application to large amounts of data. To solve this problem, this paper proposes the K-AP clustering algorithm, which integrates the k-means algorithm into the AP algorithm to reduce its time and space requirements. The results show that the K-AP algorithm is faster than the original algorithm, can cluster large amounts of data, and achieves better results. © 2011 IEEE.
Parameter optimization for Support Vector Machine Classifier with IO-GA
- Authors: Zhou, Jing , Maruatona, Omaru , Wang, Wei
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: The Support Vector Machine (SVM) method has good learning and generalization ability. Unfortunately, there is no comprehensive theory to guide the selection of SVM parameters, which greatly limits its application. Researchers have tried a variety of methods to obtain the optimal parameters automatically, and using genetic algorithms to optimize the parameters of an SVM classifier has become one of the most popular in recent years. In this paper, we explain how the Standard Genetic Algorithm (SGA) suffers from premature convergence, which limits the accuracy of the SVM. We also propose a new genetic algorithm with improved genetic operators (IO-GA) to optimize the SVM classifier's parameters. Experimental results show that the parameters obtained by this method can greatly improve the classification performance of the SVM. We therefore conclude that this method is effective. © 2011 IEEE.