List of Titles

A data mining application of the incidence semirings

Authors: Abawajy, Jemal , Kelarev, Andrei , Yearwood, John , Turville, Christopher
Date: 2013
Type: Text , Journal article
Relation: Houston Journal of Mathematics Vol. 39, no. 4 (2013), p. 1083-1093
Relation: http://purl.org/au-research/grants/arc/LP0990908
Full Text: false
Reviewed:
Description: This paper is devoted to a combinatorial problem for incidence semirings, which can be viewed as sets of polynomials over graphs, where the edges are the unknowns and the coefficients are taken from a semiring. The construction of incidence rings is very well known and has many useful applications. The present article is devoted to a novel application of the more general incidence semirings. Recent research on data mining has motivated the investigation of the sets of centroids that have largest weights in semiring constructions. These sets are valuable for the design of centroid-based classification systems, or classifiers, as well as for the design of multiple classifiers combining several individual classifiers. Our article gives a complete description of all sets of centroids with the largest weight in incidence semirings.

Application of rank correlation, clustering and classification in information security

Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
Date: 2012
Type: Text , Journal article
Relation: Journal of Networks Vol. 7, no. 6 (2012), p. 935-945
Full Text:
Reviewed:
Description: This article is devoted to experimental investigation of a novel application of a clustering technique recently introduced by the authors. The aim is to use robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, for intrusion detection. Here we concentrate on a particular case of application to profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The Silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson Linear Correlation Coefficient, the Spearman Rank Correlation Coefficient and the Goodman-Kruskal Correlation Coefficient in this application. Third, we use a consensus function to combine independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering in order to enable them to process the whole large data set as well as new data. The precision and recall of classifiers at the final stage of this scheme are critical for effectiveness of the whole procedure. We investigated various combinations of several correlation coefficients, consensus functions, and a variety of supervised classification algorithms. © 2012 Academy Publisher.
Description: 2003010277
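
Note: The feature-selection step in the description above relies on rank correlation. A minimal sketch of the Spearman coefficient, computed as the Pearson coefficient of average ranks, is given below. The helper names and the choice to score each feature against a reference labelling are assumptions made for illustration; the abstract does not state what the features are correlated with, and this is not the authors' implementation.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def top_features(columns, reference, k):
    """Hypothetical selector: keep the k feature names whose absolute
    Spearman correlation with the reference vector is largest."""
    scored = sorted(
        columns,
        key=lambda name: abs(spearman(columns[name], reference)),
        reverse=True,
    )
    return scored[:k]
```

Ranking by absolute value keeps strongly negatively correlated features as well as positively correlated ones.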

Rule-based classifiers and meta classifiers for identification of cardiac autonomic neuropathy progression

Authors: Jelinek, Herbert , Kelarev, Andrei , Stranieri, Andrew , Yearwood, John
Date: 2012
Type: Text , Journal article
Relation: International Journal of Information Science and Computer Mathematics Vol. 5, no. 2 (2012), p. 49-53
Full Text:
Reviewed:
Description: We investigate and compare several rule-based classifiers and meta classifiers in their ability to obtain multi-class classifications of cardiac autonomic neuropathy (CAN) and its progression. The best results obtained in our experiments are significantly better than the outcomes published previously in the literature for analogous CAN identification tasks or simpler binary classification tasks.

An application of novel clustering technique for information security

Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
Date: 2011
Type: Text , Conference paper
Relation: Applications and Techniques in Information Security Workshop p. 5-11
Full Text: false
Reviewed:
Description: This article presents experimental results devoted to a new application of the novel clustering technique introduced by the authors recently. Our aim is to facilitate the application of robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, for intrusion detection. Here we concentrate on the particular case of application to profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The Silhouette index is used to determine the number of clusters. Second, we use a consensus function to combine these independent clusterings into one consensus clustering. Feature ranking is used to select a subset of features for the consensus function. Third, we train fast supervised classification algorithms on the resulting consensus clustering in order to enable them to process the whole large data set as well as new data. The precision and recall of classifiers at the final stage of this scheme are critical for effectiveness of the whole procedure. We investigated various combinations of three consensus functions, namely Cluster-Based Graph Formulation (CBGF), Hybrid Bipartite Graph Formulation (HBGF), and Instance-Based Graph Formulation (IBGF), and a variety of supervised classification algorithms. The best precision and recall have been obtained by the combination of the HBGF consensus function and the SMO classifier with the polynomial kernel.
Description: 2003009195

A formula for multiple classifiers in data mining based on Brandt semigroups

Authors: Kelarev, Andrei , Yearwood, John , Mammadov, Musa
Date: 2009
Type: Text , Journal article
Relation: Semigroup Forum Vol. 78, no. 2 (2009), p. 293-309
Full Text:
Reviewed:
Description: A general approach to designing multiple classifiers represents them as a combination of several binary classifiers in order to enable correction of classification errors and increase reliability. This method is explained, for example, in Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques, 2005, Sect. 7.5). The aim of this paper is to investigate representations of this sort based on Brandt semigroups. We give a formula for the maximum number of errors of binary classifiers, which can be corrected by a multiple classifier of this type. Examples show that our formula does not carry over to larger classes of semigroups. © 2008 Springer Science+Business Media, LLC.
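
Note: The general idea mentioned in the description above, representing a multiple classifier as a combination of binary classifiers so that some of their errors can be corrected, can be sketched with a plain error-correcting output code decoded by Hamming distance. The code table and function names below are illustrative assumptions; this sketch does not implement the Brandt semigroup construction or the formula of the paper.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(codewords):
    """Smallest Hamming distance between any two distinct class codewords."""
    words = list(codewords.values())
    return min(
        hamming(words[i], words[j])
        for i in range(len(words))
        for j in range(i + 1, len(words))
    )


def correctable_errors(codewords):
    """A code with minimum distance d corrects floor((d - 1) / 2) errors."""
    return (min_distance(codewords) - 1) // 2


def decode(outputs, codewords):
    """Assign the class whose codeword is closest to the binary outputs."""
    return min(codewords, key=lambda label: hamming(outputs, codewords[label]))


# Illustrative table: each class gets a 7-bit codeword, and bit i is the
# target output of the i-th binary classifier for that class.
CODES = {
    "a": "1111111",
    "b": "0000111",
    "c": "0011001",
    "d": "0101010",
}

# One binary classifier (the second bit) is wrong for an instance of class "a".
predicted = decode("1011111", CODES)
```

With this table every pair of codewords differs in four positions, so a single wrong binary classifier is still decoded to the correct class.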

A polynomial ring construction for the classification of data

Authors: Kelarev, Andrei , Yearwood, John , Vamplew, Peter
Date: 2009
Type: Text , Journal article
Relation: Bulletin of the Australian Mathematical Society Vol. 79, no. 2 (2009), p. 213-225
Full Text:
Reviewed:
Description: Drensky and Lakatos (Lecture Notes in Computer Science, 357 (Springer, Berlin, 1989), pp. 181-188) have established a convenient property of certain ideals in polynomial quotient rings, which can now be used to determine error-correcting capabilities of combined multiple classifiers following a standard approach explained in the well-known monograph by Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques (Elsevier, Amsterdam, 2005)). We strengthen and generalise the result of Drensky and Lakatos by demonstrating that the corresponding nice property remains valid in a much larger variety of constructions and applies to more general types of ideals. Examples show that our theorems do not extend to larger classes of ring constructions and cannot be simplified or generalised.

Cayley graphs as classifiers for data mining : The influence of asymmetries

Authors: Kelarev, Andrei , Ryan, Joe , Yearwood, John
Date: 2009
Type: Text , Journal article
Relation: Discrete Mathematics Vol. 309, no. 17 (2009), p. 5360-5369
Relation: http://purl.org/au-research/grants/arc/DP0211866
Full Text:
Reviewed:
Description: The endomorphism monoids of graphs have been actively investigated. They are convenient tools expressing asymmetries of the graphs. One of the most important classes of graphs considered in this framework is that of Cayley graphs. Our paper proposes a new method of using Cayley graphs for classification of data. We give a survey of recent results devoted to the Cayley graphs also involving their endomorphism monoids. © 2008 Elsevier B.V. All rights reserved.

Experimental investigation of three machine learning algorithms for ITS dataset

Authors: Yearwood, John , Kang, Byeongho , Kelarev, Andrei
Date: 2009
Type: Text , Conference paper
Relation: Paper presented at First International Conference, FGIT 2009, Future Generation Information Technology, Jeju Island, Korea : 10th-12th December 2009 Vol. 5899, p. 308-316
Full Text:
Description: The present article is devoted to experimental investigation of the performance of three machine learning algorithms for ITS dataset in their ability to achieve agreement with classes published in the biological literature before. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric and the sequences cannot be regarded as points in a finite dimensional space. This is why it is necessary to develop novel machine learning approaches to the analysis of datasets of this sort. This paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. It turns out that all three machine learning algorithms are efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset, where the k-committees classifier outperforms k-means and Nearest Neighbour classifiers, is also presented.
Description: 2003007844
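
Note: The description above points out that the sequences cannot be treated as points in a finite dimensional space, so a classifier can only rely on pairwise scores. A minimal sketch of a Nearest Neighbour classifier that needs nothing but a pairwise dissimilarity function follows. The function `toy_dissimilarity`, the sequences and the class labels are hypothetical stand-ins; the alignment scores used in the paper are more sophisticated and are not reproduced here.

```python
def toy_dissimilarity(a, b):
    """Hypothetical stand-in for an alignment score: position-wise
    mismatches over the shared prefix plus the difference in length."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def nearest_neighbour(query, training, dissimilarity):
    """Return the label of the training sequence least dissimilar to query.

    `training` is a list of (sequence, label) pairs. No coordinates are
    needed, only the pairwise dissimilarity function.
    """
    _, best_label = min(training, key=lambda pair: dissimilarity(query, pair[0]))
    return best_label


# Made-up sequences and class labels, for illustration only.
TRAINING = [
    ("ACGTAC", "class_1"),
    ("ACGTTC", "class_1"),
    ("TTGACA", "class_2"),
]

label = nearest_neighbour("ACGAAC", TRAINING, toy_dissimilarity)
```

Because the classifier only calls the dissimilarity function, any other pairwise score could be substituted without changing the rest of the code.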

Experimental investigation of classification algorithms for ITS dataset

Authors: Yearwood, John , Kang, Byeongho , Kelarev, Andrei
Date: 2008
Type: Text , Conference paper
Relation: PKAW-08, Pacific Rim Knowledge Acquisition Workshop 2008, as part of PRICAI 2008, Tenth Pacific Rim p. 262-272
Full Text: false
Reviewed:
Description: This article is devoted to experimental investigation of classification algorithms for analysis of ITS dataset. We introduce and consider a novel k-committees algorithm for classification and compare it with the discrete k-means and nearest neighbour algorithms. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric and the sequences cannot be regarded as points in a finite dimensional space. This is why it is necessary to develop novel algorithms and adjust familiar ones. We present the results of experiments comparing the efficiency of three classification methods in their ability to achieve agreement with classes published in the biological literature before. It turns out that our algorithms are efficient and can be used to obtain biologically significant classifications. A simplified version of a synthetic dataset, where the k-committees classifier outperforms k-means and Nearest Neighbour classifiers, is also presented.
Description: E1
