Languages recognized by two-sided automata of graphs
- Authors: Miller, Mirka , Kelarev, Andrei , Sokratova, Olga
- Date: 2005
- Type: Text , Journal article
- Relation: Proceedings of the Estonian Academy of Sciences, Physics Mathematic Vol. 51, no. 1 (2005), p. 46-54
- Full Text: false
- Reviewed:
- Description: We introduce two-sided automata defined by directed graphs and describe all languages recognized by these automata.
- Description: C1
- Description: 2003001399
A description of incidence rings of group automata
- Authors: Kelarev, Andrei , Passman, D.S.
- Date: 2008
- Type: Text , Book chapter
- Relation: Contemporary Mathematics : Noncommutative Rings, Group Rings, Diagram Algebras and Their Applications : International Conference December 18-22, 2006, University of Madras, Chennai, India Chapter p. 27-33
- Full Text:
- Description: Group automata occur in the Krohn-Rhodes Decomposition Theorem and have been extensively investigated in the literature. The incidence rings of group automata were introduced by the first author in analogy with group rings and incidence rings of graphs. The main theorem of the present paper gives a complete description of the structure of incidence rings of group automata in terms of matrix rings over group rings and their natural modules. As a consequence, when the ground ring is a field, we can use known group algebra results to determine when the incidence algebra is prime, semiprime, Artinian or semisimple. We also offer sufficient conditions for the algebra to be semiprimitive.
- Description: 2003006588
An algorithm for BCH codes extended with finite state automata
- Authors: Kelarev, Andrei
- Date: 2008
- Type: Text , Journal article
- Relation: Fundamenta Informaticae Vol. 84, no. 1 (2008), p. 51-60
- Full Text: false
- Reviewed:
- Description: This articles develops a combinatorial algorithm for a class of codes extending BCH codes and constructed with finite state automata. Ou algorithm computes the largest number of errors that the extended codes can correct, and finds a generator for each optimal code in this class of extensions. The question of finding codes with largest possible information rates remains open.
- Description: C1
Experimental investigation of clasification algorithms for ITS dataset
- Authors: Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2008
- Type: Text , Conference paper
- Relation: PKAW-08, Pacific Rim Knowledge Acquisition Workshop 2008, as part of PRICAI 2008, Tenth Pacific Rim p. 262-272
- Full Text: false
- Reviewed:
- Description: This article is devoted to experimental investigation of classification algorithms for analysis of ITS dataset. We introduce and consider a novel k-committees alogorithm for classification and compare it with the discrete k- means and nearest neighbour algorithms. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form Minkowski metric and the sequences cannot be regarded as points in a finite dimensional space. This is why it is necessary to develop novel algorithms and adjust familiar ones. We present the results of experiments comparing the efficiency of three classification methods in their ability to achieve agreement with classes published in the biological literature before. It turns out that our algorithms are efficient and can be used to obtain biologically significant classifications. A simplified version of a synthetic dataset, where the k-committees classifier out performs k-means and Nearest Neighbour classifiers, is also presented.
- Description: E1
A formula for multiple classifiers in data mining based on Brandt semigroups
- Authors: Kelarev, Andrei , Yearwood, John , Mammadov, Musa
- Date: 2009
- Type: Text , Journal article
- Relation: Semigroup Forum Vol. 78, no. 2 (2009), p. 293-309
- Full Text:
- Reviewed:
- Description: A general approach to designing multiple classifiers represents them as a combination of several binary classifiers in order to enable correction of classification errors and increase reliability. This method is explained, for example, in Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques, 2005, Sect. 7.5). The aim of this paper is to investigate representations of this sort based on Brandt semigroups. We give a formula for the maximum number of errors of binary classifiers, which can be corrected by a multiple classifier of this type. Examples show that our formula does not carry over to larger classes of semigroups. © 2008 Springer Science+Business Media, LLC.
A polynomial ring construction for the classification of data
- Authors: Kelarev, Andrei , Yearwood, John , Vamplew, Peter
- Date: 2009
- Type: Text , Journal article
- Relation: Bulletin of the Australian Mathematical Society Vol. 79, no. 2 (2009), p. 213-225
- Full Text:
- Reviewed:
- Description: Drensky and Lakatos (Lecture Notes in Computer Science, 357 (Springer, Berlin, 1989), pp. 181-188) have established a convenient property of certain ideals in polynomial quotient rings, which can now be used to determine error-correcting capabilities of combined multiple classifiers following a standard approach explained in the well-known monograph by Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques (Elsevier, Amsterdam, 2005)). We strengthen and generalise the result of Drensky and Lakatos by demonstrating that the corresponding nice property remains valid in a much larger variety of constructions and applies to more general types of ideals. Examples show that our theorems do not extend to larger classes of ring constructions and cannot be simplified or generalised.
An algorithm for the optimization of multiple classifers in data mining based on graphs
- Authors: Kelarev, Andrei , Ryan, Joe , Yearwood, John
- Date: 2009
- Type: Text , Journal article
- Relation: The Journal of Combinatorial Mathematics and Combinatorial Computing Vol. 71, no. (2009), p. 65-85
- Full Text: false
- Reviewed:
- Description: This article develops an efficient combinatorial algorithm based on labeled directed graphs and motivated by applications in data mining for designing multiple classifiers. Our method originates from the standard approach described in [37]. It defines a representation of a multiclass classifier in terms of several binary classifiers. We are using labeled graphs to introduce additional structure on the classifier. Representations of this sort are known to have serious advantages. An important property of these representations is their ability to correct errors of individual binary classifiers and produce correct combined output. For every representation like this we develop a combinatorial algorithm with quadratic running time to compute the largest number of errors of individual binary classifiers which can be corrected by the combined multiple classifier. In addition, we consider the question of optimizing the classifiers of this type and find all optimal representations for these multiple classifiers.
- Description: 2003007563
Applying clustering and ensemble clustering approaches to phishing profiling
- Authors: Webb, Dean , Yearwood, John , Vamplew, Peter , Ma, Liping , Ofoghi, Bahadorreza , Kelarev, Andrei
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at Eighth Australasian Data Mining Conference, AusDM 2009, University of Melbourne, Melbourne, Victoria : 1st–4th December 2009
- Full Text:
- Description: 2003007911
Cayley graphs as classifiers for data mining : The influence of asymmetries
- Authors: Kelarev, Andrei , Ryan, Joe , Yearwood, John
- Date: 2009
- Type: Text , Journal article
- Relation: Discrete Mathematics Vol. 309, no. 17 (2009), p. 5360-5369
- Relation: http://purl.org/au-research/grants/arc/DP0211866
- Full Text:
- Reviewed:
- Description: The endomorphism monoids of graphs have been actively investigated. They are convenient tools expressing asymmetries of the graphs. One of the most important classes of graphs considered in this framework is that of Cayley graphs. Our paper proposes a new method of using Cayley graphs for classification of data. We give a survey of recent results devoted to the Cayley graphs also involving their endomorphism monoids. © 2008 Elsevier B.V. All rights reserved.
Constructing stochastic mixture policies for episodic multiobjective reinforcement learning tasks
- Authors: Vamplew, Peter , Dazeley, Richard , Barker, Ewan , Kelarev, Andrei
- Date: 2009
- Type: Text , Book chapter
- Relation: AI 2009 : Advances in Artificial Intelligence : 22nd Australasian Joint Conference, Melbourne, Australia, December 1-4, 2009. Proceedings Chapter p. 340-349
- Full Text:
- Description: Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages gained from applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobjective tasks from deterministic base policies found via scalarised reinforcement learning. It is shown that these approaches are an efficient means of identifying solutions which offer a superior match to the user’s preferences than can be achieved by methods based strictly on deterministic policies.
- Description: 2003007906
Experimental investigation of three machine learning algorithms for ITS dataset
- Authors: Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at First International Conference, FGIT 2009, Future Generation Information Technology, Jeju Island, Korea : 10th-12th December 2009 Vol. 5899, p. 308-316
- Full Text:
- Description: The present article is devoted to experimental investigation of the performance of three machine learning algorithms for ITS dataset in their ability to achieve agreement with classes published in the biologi cal literature before. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric and the sequences cannot be regarded as points in a finite dimensional space. This is why it is necessary to develop novel machine learning ap proaches to the analysis of datasets of this sort. This paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. It turns out that all three machine learning algorithms are efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset, where the k-committees classifier outperforms k-means and Nearest Neighbour classifiers, is also presented.
- Description: 2003007844
Optimization methods and the k-committees algorithm for clustering of sequence data
- Authors: Yearwood, John , Bagirov, Adil , Kelarev, Andrei
- Date: 2009
- Type: Text , Journal article
- Relation: Applied and Computational Mathematics Vol. 8, no. 1 (2009), p. 92-101
- Relation: http://purl.org/au-research/grants/arc/DP0211866
- Relation: http://purl.org/au-research/grants/arc/DP0666061
- Full Text: false
- Description: The present paper is devoted to new algorithms for unsupervised clustering based on the optimization approaches due to [2], [3] and [4]. We consider a novel situation, where the datasets consist of nucleotide or protein sequences and rather sophisticated biologically significant alignment scores have to be used as a measure of distance. Sequences of this kind cannot be regarded as points in a finite dimensional space. Besides, the alignment scores do not satisfy properties of Minkowski metrics. Nevertheless the optimization approaches have made it possible to introduce a new k-committees algorithm and compare its performance with previous algorithms for two datasets. Our experimental results show that the k-committees algorithms achieves intermediate accuracy for a dataset of ITS sequences, and it can perform better than the discrete k-means and Nearest Neighbour algorithms for certain datasets. All three algorithms achieve good agreement with clusters published in the biological literature before and can be used to obtain biologically significant clusterings.
Optimization of multiple classifiers in data mining based on string rewriting systems
- Authors: Dazeley, Richard , Kelarev, Andrei , Yearwood, John , Mammadov, Musa
- Date: 2009
- Type: Text , Journal article
- Relation: Asian-European Journal of Mathematics Vol. 2, no. 1 (2009), p. 41-56
- Relation: https://purl.org/au-research/grants/arc/DP0211866
- Relation: https://purl.org/au-research/grants/arc/LP0669752
- Full Text:
- Description: Optimization of multiple classifiers is an important problem in data mining. We introduce additional structure on the class sets of the classifiers using string rewriting systems with a convenient matrix representation. The aim of the present paper is to develop an efficient algorithm for the optimization of the number of errors of individual classifiers, which can be corrected by these multiple classifiers.
Rees matrix constructions for clustering of data
- Authors: Kelarev, Andrei , Watters, Paul , Yearwood, John
- Date: 2009
- Type: Journal article
- Relation: Journal of the Australian Mathematical Society Vol. 87, no. 3 (2009), p. 377-393
- Relation: http://purl.org/au-research/grants/arc/DP0211866
- Full Text:
- Reviewed:
- Description: This paper continues the investigation of semigroup constructions motivated by applications in data mining. We give a complete description of the error-correcting capabilities of a large family of clusterers based on Rees matrix semigroups well known in semigroup theory. This result strengthens and complements previous formulas recently obtained in the literature. Examples show that our theorems do not generalize to other classes of semigroups.
An application of consensus clustering for DDoS attacks detection
- Authors: Zi, Lifang , Yearwood, John , Kelarev, Andrei
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: The detection of Distributed Denial of Service (DDos) attacks is very important for maintaining the security of networks and the Internet. This paper introduces a novel iterative consensus process based on Hybrid Bipartite Graph Formulation (HGBF) consensus function for DDos attacks detection. First, the features are extracted during feature extraction process based on the analysis of network traffic. Second, several clustering algorithms are applied in combination with the silhouette index to obtain a collection of independent initial clusterings. Third, the HGBF consensus function and silhouette index are used to find an appropriate consensus clustering of the initial clusterings. Fourth, this new consensus clustering is added to the pool of initial clusterings replacing another clustering with the worst Silhouette index. Fifth, the process continues iteratively until the Silhouette index of the resulting consensus clusterings stabilizes. This iterative consensus clustering process can improve the quality of the clusters. The experimental results demonstrate that our iterative consensus process is effective and can be used in practice for detecting the separate phased of DDos attacks.
Consensus clustering and supervised classification for profiling phishing emails in internet commerce security
- Authors: Dazeley, Richard , Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 11th International Workshop on Knowledge Management and Acquisition for Smart Systems and Services, PKAW 2010 Vol. 6232 LNAI, p. 235-246
- Full Text:
- Reviewed:
- Description: This article investigates internet commerce security applications of a novel combined method, which uses unsupervised consensus clustering algorithms in combination with supervised classification methods. First, a variety of independent clustering algorithms are applied to a randomized sample of data. Second, several consensus functions and sophisticated algorithms are used to combine these independent clusterings into one final consensus clustering. Third, the consensus clustering of the randomized sample is used as a training set to train several fast supervised classification algorithms. Finally, these fast classification algorithms are used to classify the whole large data set. One of the advantages of this approach is in its ability to facilitate the inclusion of contributions from domain experts in order to adjust the training set created by consensus clustering. We apply this approach to profiling phishing emails selected from a very large data set supplied by the industry partners of the Centre for Informatics and Applied Optimization. Our experiments compare the performance of several classification algorithms incorporated in this scheme. © 2010 Springer-Verlag Berlin Heidelberg.
Internet security applications of Grobner-Shirvov bases
- Authors: Kelarev, Andrei , Yearwood, John , Watters, Paul
- Date: 2010
- Type: Text , Journal article
- Relation: Asian-European Journal of Mathematics Vol. 3, no. 3 (2010), p. 435-442
- Relation: http://purl.org/au-research/grants/arc/DP0211866
- Full Text: false
- Reviewed:
Internet security applications of the Munn rings
- Authors: Kelarev, Andrei , Yearwood, John , Watters, Paul , Wu, Xinwen , Abawajy, Jemal , Pan, L.
- Date: 2010
- Type: Text , Journal article
- Relation: Semigroup Forum Vol. 81, no. 1 (2010), p. 162-171
- Full Text:
- Reviewed:
- Description: Effective multiple clustering systems, or clusterers, have important applications in information security. The aim of the present article is to introduce a new method of designing multiple clusterers based on the Munn rings and describe a class of optimal clusterers which can be obtained in this construction.
A Grobner-Shirshov Algorithm for Applications in Internet Security
- Authors: Kelarev, Andrei , Yearwood, John , Watters, Paul , Wu, Xinwen , Ma, Liping , Abawajy, Jemal , Pan, L.
- Date: 2011
- Type: Text , Journal article
- Relation: Southeast Asian Bulletin of Mathematics Vol. 35, no. (2011), p. 807-820
- Full Text: false
- Reviewed:
- Description: The design of multiple classication and clustering systems for the detection of malware is an important problem in internet security. Grobner-Shirshov bases have been used recently by Dazeley et al. [15] to develop an algorithm for constructions with certain restrictions on the sandwich-matrices. We develop a new Grobner-Shirshov algorithm which applies to a larger variety of constructions based on combinatorial Rees matrix semigroups without any restrictions on the sandwich-matrices.
An application of novel clustering technique for information security
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2011
- Type: Text , Conference paper
- Relation: Applications and Techniques in Information Security Workshop p. 5-11
- Full Text: false
- Reviewed:
- Description: This article presents experimental results devoted to a new application of the novel clustering technique introduced by the authors recently. Our aim is to facilitate the application of robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as it is required, for example, for intrusion detection. Here we concentrate on the particular case of application to profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. Silhouette index is used to determine the number of clusters. Second, we use a consensus function to combine these independent clusterings into one consensus clustering . Feature ranking is used to select a subset of features for the consensus function. Third, we train fast supervised classification algorithms on the resulting consensus clustering in order to enable them to process the whole large data set as well as new data. The precision and recall of classifiers at the final stage of this scheme are critical for effectiveness of the whole procedure. We investigated various combinations of three consensus functions, Cluster-Based Graph Formulation (CBGF), Hybrid Bipartite Graph Formulation (HBGF), and Instance-Based Graph Formulation (IBGF) and a variety of supervised classification algorithms. The best precision and recall have been obtained by the combination of the HBGF consensus function and the SMO classifier with the polynomial kernel.
- Description: 2003009195