Dynamical systems based on a fuzzy derivative and its applications to data classification
- Authors: Mammadov, Musa , Rubinov, Alex , Yearwood, John
- Date: 2003
- Type: Text , Conference paper
- Relation: Paper presented at the Industrial Optimisation 2003 Conference, Perth : 30th September, 2002
- Full Text: false
- Reviewed:
- Description: E1
- Description: 2003000339
A semantic method to information extraction for decision support systems
- Authors: Ofoghi, Bahadorreza , Yearwood, John , Ghosh, Ranadhir
- Date: 2006
- Type: Text , Conference proceedings
- Full Text: false
- Description: In this paper, we describe a novel schema for a more semantic text mining process, which results in more comprehensive decision making by decision support systems (DSS) by providing more effective and accurate textual information. Using two semantic lexical resources, FrameNet and WordNet, to extract the required text snippets from unstructured free text yields a better and more accurate information extraction process that delivers more precise information either to a DSS or to a decision maker. We explain how these lexical resources could improve a focused text mining process that could be applied to an information provider system in a decision support paradigm. The preliminary results of an initial experiment show that the hybrid information extraction schema performs well in some semantic failure situations.
- Description: 2003010644
Classification for accuracy and insight : A weighted sum approach
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at Sixth Australasian Data Mining Conference, AusDM 2007, Gold Coast, Queensland, Victoria : 3rd-4th December 2007 p. 203-208
- Full Text:
- Description: This research presents a classifier that aims to provide insight into a dataset in addition to achieving classification accuracies comparable to other algorithms. The classifier, called Automated Weighted Sum (AWSum), uses a weighted sum approach where feature values are assigned weights that are summed and compared to a threshold in order to classify an example. Though naive, this approach is scalable, achieves accurate classifications on standard datasets and also provides a degree of insight. By insight we mean that the technique provides an appreciation of the influence a feature value has on class values, relative to each other. AWSum provides a focus on the feature value space that allows the technique to identify feature values and combinations of feature values that are sensitive and important for a classification. This is particularly useful in fields such as medicine where this sort of micro-focus and understanding is critical in classification.
- Description: 2003005504
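The weighted sum idea in the abstract above can be illustrated with a minimal sketch. It assumes hand-picked weights, feature names and a threshold of zero, all of which are hypothetical; the abstract does not say how AWSum derives its weights, so this is not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical weights for (feature, value) pairs, hand-picked for illustration.
# How AWSum actually computes weights is not described in the abstract.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("smoker", "yes"): 0.6,
    ("smoker", "no"): -0.4,
    ("age", "over_60"): 0.5,
    ("age", "under_60"): -0.3,
}
THRESHOLD = 0.0  # assumed decision threshold


def weighted_sum(example: Dict[str, str]) -> float:
    """Sum the weights of the feature values present in the example."""
    return sum(WEIGHTS.get((f, v), 0.0) for f, v in example.items())


def classify(example: Dict[str, str]) -> str:
    """Compare the weighted sum to the threshold to pick a class."""
    return "positive" if weighted_sum(example) > THRESHOLD else "negative"


def influences(example: Dict[str, str]) -> List[Tuple[str, str, float]]:
    """List feature values by the size of their weight, largest first."""
    items = [(f, v, WEIGHTS.get((f, v), 0.0)) for f, v in example.items()]
    return sorted(items, key=lambda item: abs(item[2]), reverse=True)
```

Reading the sorted weights alongside the predicted class is one way to get the kind of insight the abstract refers to, since each weight shows how strongly a feature value pushes the sum towards one class.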
Visual tools for analysing evolution, emergence, and error in data streams
- Authors: Hart, Sol , Yearwood, John , Bagirov, Adil
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at 6th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2007, Melbourne, Victoria : 11th-13th July 2007 p. 987-992
- Full Text:
- Description: The relatively new field of stream mining has necessitated the development of robust drift-aware algorithms that provide accurate, real time, data handling capabilities. Tools are needed to assess and diagnose important trends and investigate drift evolution parameters. In this paper, we present two novel visualisation techniques, Pixie and Luna graphs, which incorporate salient group statistics coupled with intuitive visual representations of multidimensional groupings over time. Through the representations presented here, spatial interactions between temporal divisions can be diagnosed and overall distribution patterns identified. They provide a means of evaluating, in a non-constrained capacity, commonly constrained evolutionary problems.
- Description: 2003005432
AWSum - applying data mining in a health care scenario
- Authors: Quinn, Anthony , Jelinek, Herbert , Stranieri, Andrew , Yearwood, John
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at International Conference on Intelligent Sensors, Sensor Networks and Information Processing, ISSNIP 2008, Sydney, New South Wales : 15th-18th December 2008 p. 291-296
- Full Text:
- Description: This paper investigates the application of a new data mining algorithm called Automated Weighted Sum (AWSum) to diabetes screening data, to explore its use in providing researchers with new insight into the disease and, secondarily, to explore the potential the algorithm has for the generation of prognostic models for clinical use. There are many data mining classifiers that produce high levels of predictive accuracy, but their application to health research and clinical applications is limited because they are complex, produce results that are difficult to interpret and are difficult to integrate with current knowledge and practices. This is because most focus on accuracy at the expense of informing the user as to the influences that lead to their classification results. By providing this information on influences, a researcher can be pointed to new, potentially interesting avenues for investigation. AWSum measures influence by calculating a weight for each feature value that represents its influence on a class value relative to other class values. The results produced, although on limited data, indicated the approach has potential uses for research and has some characteristics that may be useful in the future development of prognostic models.
- Description: 2003006660
AWSum - Combining classification with knowledge acquisition
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John , Hafen, Gaudenz , Jelinek, Herbert
- Date: 2008
- Type: Text , Journal article
- Relation: International Journal of Software and Informatics Vol. 2, no. 2 (2008), p. 199-214
- Full Text: false
- Reviewed:
- Description: Many classifiers achieve high levels of accuracy but have limited applicability in real world situations because they do not lead to a greater understanding or insight into the way features influence the classification. In areas such as health informatics a classifier that clearly identifies the influences on classification can be used to direct research and formulate interventions. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classifier that provides accuracy comparable to other techniques whilst providing insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. The merits of this approach in classification and insight are evaluated on Cystic Fibrosis and diabetes datasets with positive results.
Experimental investigation of classification algorithms for ITS dataset
- Authors: Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2008
- Type: Text , Conference paper
- Relation: PKAW-08, Pacific Rim Knowledge Acquisition Workshop 2008, as part of PRICAI 2008, Tenth Pacific Rim p. 262-272
- Full Text: false
- Reviewed:
- Description: This article is devoted to experimental investigation of classification algorithms for analysis of the ITS dataset. We introduce and consider a novel k-committees algorithm for classification and compare it with the discrete k-means and nearest neighbour algorithms. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric and the sequences cannot be regarded as points in a finite dimensional space. This is why it is necessary to develop novel algorithms and adjust familiar ones. We present the results of experiments comparing the efficiency of three classification methods in their ability to achieve agreement with classes previously published in the biological literature. It turns out that our algorithms are efficient and can be used to obtain biologically significant classifications. A simplified version of a synthetic dataset, where the k-committees classifier outperforms the k-means and nearest neighbour classifiers, is also presented.
- Description: E1
A classification algorithm that derives weighted sum scores for insight into disease
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John , Hafen, Gaudenz
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at Third Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2009), Wellington, New Zealand : Vol. 97, p. 13-17
- Full Text:
- Description: Data mining is often performed with datasets associated with diseases in order to increase insights that can ultimately lead to improved prevention or treatment. Classification algorithms can achieve high levels of predictive accuracy but have limited application for facilitating the insight that leads to deeper understanding of aspects of the disease. This is because the representation of knowledge that arises from classification algorithms is too opaque, too complex or too sparse to facilitate insight. Clustering, association and visualisation approaches enable greater scope for clinicians to be engaged in a way that leads to insight; however, predictive accuracy is compromised or non-existent. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classification algorithm that provides accuracy comparable to other techniques whilst providing some insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. Clinicians are very familiar with weighted sum scoring scales, so the internal representation is intuitive and easily understood. This paper presents results from the use of the AWSum approach with data from patients suffering from Cystic Fibrosis.
A formula for multiple classifiers in data mining based on Brandt semigroups
- Authors: Kelarev, Andrei , Yearwood, John , Mammadov, Musa
- Date: 2009
- Type: Text , Journal article
- Relation: Semigroup Forum Vol. 78, no. 2 (2009), p. 293-309
- Full Text:
- Reviewed:
- Description: A general approach to designing multiple classifiers represents them as a combination of several binary classifiers in order to enable correction of classification errors and increase reliability. This method is explained, for example, in Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques, 2005, Sect. 7.5). The aim of this paper is to investigate representations of this sort based on Brandt semigroups. We give a formula for the maximum number of errors of binary classifiers, which can be corrected by a multiple classifier of this type. Examples show that our formula does not carry over to larger classes of semigroups. © 2008 Springer Science+Business Media, LLC.
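The general idea named in the abstract above, combining several binary classifiers so that some of their errors can be corrected, can be sketched with ordinary error-correcting output codes. This is the standard coding-theory view only, not the Brandt semigroup construction or the formula of the paper; the code words and class labels below are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

# Hypothetical code words: one row per class, one column per binary classifier.
CODE_WORDS: Dict[str, Tuple[int, ...]] = {
    "a": (1, 1, 1, 1, 1, 1, 1),
    "b": (0, 0, 0, 0, 1, 1, 1),
    "c": (0, 0, 1, 1, 0, 0, 1),
    "d": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of positions where two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(u, v))


def correctable_errors(code_words: Dict[str, Tuple[int, ...]]) -> int:
    """Errors guaranteed correctable: floor((d - 1) / 2) for minimum distance d."""
    d = min(hamming(u, v) for u, v in combinations(code_words.values(), 2))
    return (d - 1) // 2


def decode(outputs: Sequence[int], code_words: Dict[str, Tuple[int, ...]]) -> str:
    """Pick the class whose code word is nearest to the binary classifier outputs."""
    return min(code_words, key=lambda label: hamming(code_words[label], outputs))
```

With these code words every pair differs in four positions, so one wrong binary output is always corrected: flipping the last bit of the code word for "c" still decodes to "c".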
An algorithm for the optimization of multiple classifiers in data mining based on graphs
- Authors: Kelarev, Andrei , Ryan, Joe , Yearwood, John
- Date: 2009
- Type: Text , Journal article
- Relation: The Journal of Combinatorial Mathematics and Combinatorial Computing Vol. 71, no. (2009), p. 65-85
- Full Text: false
- Reviewed:
- Description: This article develops an efficient combinatorial algorithm based on labeled directed graphs and motivated by applications in data mining for designing multiple classifiers. Our method originates from the standard approach described in [37]. It defines a representation of a multiclass classifier in terms of several binary classifiers. We are using labeled graphs to introduce additional structure on the classifier. Representations of this sort are known to have serious advantages. An important property of these representations is their ability to correct errors of individual binary classifiers and produce correct combined output. For every representation like this we develop a combinatorial algorithm with quadratic running time to compute the largest number of errors of individual binary classifiers which can be corrected by the combined multiple classifier. In addition, we consider the question of optimizing the classifiers of this type and find all optimal representations for these multiple classifiers.
- Description: 2003007563
Rees matrix constructions for clustering of data
- Authors: Kelarev, Andrei , Watters, Paul , Yearwood, John
- Date: 2009
- Type: Journal article
- Relation: Journal of the Australian Mathematical Society Vol. 87, no. 3 (2009), p. 377-393
- Relation: http://purl.org/au-research/grants/arc/DP0211866
- Full Text:
- Reviewed:
- Description: This paper continues the investigation of semigroup constructions motivated by applications in data mining. We give a complete description of the error-correcting capabilities of a large family of clusterers based on Rees matrix semigroups well known in semigroup theory. This result strengthens and complements previous formulas recently obtained in the literature. Examples show that our theorems do not generalize to other classes of semigroups.
From convex to nonconvex: A loss function analysis for binary classification
- Authors: Zhao, Lei , Mammadov, Musa , Yearwood, John
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 p. 1281-1288
- Full Text:
- Reviewed:
- Description: Problems of data classification can be studied in the framework of regularization theory as ill-posed problems. In this framework, loss functions play an important role in the application of regularization theory to classification. In this paper, we review some important convex loss functions, including hinge loss, square loss, modified square loss, exponential loss, logistic regression loss, as well as some non-convex loss functions, such as sigmoid loss, ø-loss, ramp loss, normalized sigmoid loss, and the loss function of a 2-layer neural network. Based on the analysis of these loss functions, we propose a new differentiable non-convex loss function, called the smoothed 0-1 loss function, which is a natural approximation of the 0-1 loss function. To compare the performance of different loss functions, we propose two algorithms for binary classification, one for convex loss functions, the other for non-convex loss functions. A set of experiments is conducted on several binary data sets from the UCI repository. The results show that the proposed smoothed 0-1 loss function is robust, especially for noisy data sets with many outliers. © 2010 IEEE.
Internet security applications of Grobner-Shirshov bases
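Several of the loss functions named in the abstract above have common textbook forms, sketched here as functions of the margin m = y * f(x) with labels y in {-1, +1}. The scalings used in the paper may differ, and the proposed smoothed 0-1 loss is not shown because the abstract does not define it.

```python
import math


def zero_one_loss(m: float) -> float:
    """0-1 loss: 1 for a non-positive margin, 0 otherwise."""
    return 1.0 if m <= 0 else 0.0


def hinge_loss(m: float) -> float:
    """Hinge loss (convex): max(0, 1 - m)."""
    return max(0.0, 1.0 - m)


def square_loss(m: float) -> float:
    """Square loss (convex): (1 - m)^2."""
    return (1.0 - m) ** 2


def exponential_loss(m: float) -> float:
    """Exponential loss (convex): exp(-m)."""
    return math.exp(-m)


def logistic_loss(m: float) -> float:
    """Logistic regression loss (convex): log(1 + exp(-m))."""
    return math.log1p(math.exp(-m))


def ramp_loss(m: float) -> float:
    """Ramp loss (non-convex): hinge loss clipped at 1."""
    return min(1.0, hinge_loss(m))
```

The convex losses grow without bound as the margin becomes very negative, whereas the ramp loss, like the 0-1 loss, is capped at 1. That cap limits the influence of any single badly misclassified point, which is the usual motivation for non-convex losses on data with outliers.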
- Authors: Kelarev, Andrei , Yearwood, John , Watters, Paul
- Date: 2010
- Type: Text , Journal article
- Relation: Asian-European Journal of Mathematics Vol. 3, no. 3 (2010), p. 435-442
- Relation: http://purl.org/au-research/grants/arc/DP0211866
- Full Text: false
- Reviewed:
A Grobner-Shirshov Algorithm for Applications in Internet Security
- Authors: Kelarev, Andrei , Yearwood, John , Watters, Paul , Wu, Xinwen , Ma, Liping , Abawajy, Jemal , Pan, L.
- Date: 2011
- Type: Text , Journal article
- Relation: Southeast Asian Bulletin of Mathematics Vol. 35, no. (2011), p. 807-820
- Full Text: false
- Reviewed:
- Description: The design of multiple classification and clustering systems for the detection of malware is an important problem in internet security. Grobner-Shirshov bases have been used recently by Dazeley et al. [15] to develop an algorithm for constructions with certain restrictions on the sandwich-matrices. We develop a new Grobner-Shirshov algorithm which applies to a larger variety of constructions based on combinatorial Rees matrix semigroups without any restrictions on the sandwich-matrices.
Optimization of classifiers for data mining based on combinatorial semigroups
- Authors: Kelarev, Andrei , Yearwood, John , Watters, Paul
- Date: 2011
- Type: Text , Journal article
- Relation: Semigroup Forum Vol. 82, no. 2 (2011), p. 1-10
- Full Text:
- Reviewed:
- Description: The aim of the present article is to obtain a theoretical result essential for applications of combinatorial semigroups for the design of multiple classification systems in data mining. We consider a novel construction of multiple classification systems, or classifiers, combining several binary classifiers. The construction is based on combinatorial Rees matrix semigroups without any restrictions on the sandwich-matrix. Our main theorem gives a complete description of all optimal classifiers in this novel construction. © 2011 Springer Science+Business Media, LLC.
Empirical investigation of consensus clustering for large ECG data sets
- Authors: Kelarev, Andrei , Stranieri, Andrew , Yearwood, John , Jelinek, Herbert
- Date: 2012
- Type: Text , Conference proceedings
- Full Text: false
- Description: This article investigates a novel machine learning approach applying consensus clustering in conjunction with classification for the data mining of very large and highly dimensional ECG data sets. To obtain robust and stable clusterings, consensus functions can be applied for clustering ensembles combining a multitude of independent initial clusterings. Direct applications of consensus functions to highly dimensional ECG data sets remain computationally expensive and impracticable. We introduce a multistage scheme including various procedures for dimensionality reduction, consensus clustering of randomized samples, followed by the use of a fast supervised classification algorithm. Applying the Hybrid Bipartite Graph Formulation combined with rank ordering and SMO we obtained an area under the receiver operating curve of 0.987. The performance of the classification algorithm at the final stage is crucial for the effectiveness of this technique. It can be regarded as an indication of the reliability, quality and stability of the combined consensus clustering. © 2012 IEEE.
Rule-based classifiers and meta classifiers for identification of cardiac autonomic neuropathy progression
- Authors: Jelinek, Herbert , Kelarev, Andrei , Stranieri, Andrew , Yearwood, John
- Date: 2012
- Type: Text , Journal article
- Relation: International Journal of Information Science and Computer Mathematics Vol. 5, no. 2 (2012), p. 49-53
- Full Text:
- Reviewed:
- Description: We investigate and compare several rule-based classifiers and meta classifiers in their ability to obtain multi-class classifications of cardiac autonomic neuropathy (CAN) and its progression. The best results obtained in our experiments are significantly better than the outcomes published previously in the literature for analogous CAN identification tasks or simpler binary classification tasks.
A data mining application of the incidence semirings
- Authors: Abawajy, Jemal , Kelarev, Andrei , Yearwood, John , Turville, Christopher
- Date: 2013
- Type: Text , Journal article
- Relation: Houston Journal of Mathematics Vol. 39, no. 4 (2013), p. 1083-1093
- Relation: http://purl.org/au-research/grants/arc/LP0990908
- Full Text: false
- Reviewed:
- Description: This paper is devoted to a combinatorial problem for incidence semirings, which can be viewed as sets of polynomials over graphs, where the edges are the unknowns and the coefficients are taken from a semiring. The construction of incidence rings is very well known and has many useful applications. The present article is devoted to a novel application of the more general incidence semirings. Recent research on data mining has motivated the investigation of the sets of centroids that have largest weights in semiring constructions. These sets are valuable for the design of centroid-based classification systems, or classifiers, as well as for the design of multiple classifiers combining several individual classifiers. Our article gives a complete description of all sets of centroids with the largest weight in incidence semirings.
A new loss function for robust classification
- Authors: Zhao, Lei , Mammadov, Musa , Yearwood, John
- Date: 2014
- Type: Text , Journal article
- Relation: Intelligent Data Analysis Vol. 18, no. 4 (2014), p. 697-715
- Full Text: false
- Reviewed:
- Description: The loss function plays an important role in data classification. Many loss functions have been proposed and applied to different classification problems. This paper proposes a new, so-called smoothed 0-1 loss function, which can be considered an approximation of the classical 0-1 loss function. Due to the non-convexity of the proposed loss function, global optimization methods are required to solve the corresponding optimization problems. Together with the proposed loss function, we compare the performance of several existing loss functions in the classification of noisy data sets. In this comparison, different optimization problems are considered with regard to the convexity and smoothness of the different loss functions. The experimental results show that the proposed smoothed 0-1 loss function works better on data sets with noisy labels, noisy features, and outliers. © 2014 - IOS Press and the authors. All rights reserved.