Internet security applications of the Munn rings
- Authors: Kelarev, Andrei , Yearwood, John , Watters, Paul , Wu, Xinwen , Abawajy, Jemal , Pan, L.
- Date: 2010
- Type: Text , Journal article
- Relation: Semigroup Forum Vol. 81, no. 1 (2010), p. 162-171
- Full Text:
- Reviewed:
- Description: Effective multiple clustering systems, or clusterers, have important applications in information security. The aim of the present article is to introduce a new method of designing multiple clusterers based on the Munn rings and describe a class of optimal clusterers which can be obtained in this construction.
Classification systems based on combinatorial semigroups
- Authors: Abawajy, Jemal , Kelarev, Andrei
- Date: 2013
- Type: Text , Journal article
- Relation: Semigroup forum Vol. 86, no. 3 (2013), p. 603-612
- Full Text: false
- Reviewed:
- Description: The present article continues the investigation of constructions essential for applications of combinatorial semigroups to the design of multiple classification systems in data mining. Our main theorem gives a complete description of all optimal classification systems defined by one-sided ideals in a construction based on combinatorial Rees matrix semigroups. It strengthens and generalizes previous results, which handled the more narrow case of two-sided ideals.
Predicting cardiac autonomic neuropathy category for diabetic data with missing values
- Authors: Abawajy, Jemal , Kelarev, Andrei , Chowdhury, Morshed , Stranieri, Andrew , Jelinek, Herbert
- Date: 2013
- Type: Text , Journal article
- Relation: Computers in Biology and Medicine Vol. 43, no. 10 (2013), p. 1328-1333
- Full Text:
- Reviewed:
- Description: Cardiovascular autonomic neuropathy (CAN) is a serious and well known complication of diabetes. Previous articles circumvented the problem of missing values in CAN data by deleting all records and fields with missing values and applying classifiers trained on different sets of features that were complete. Most of them also added alternative features to compensate for the deleted ones. Here we introduce and investigate a new method for classifying CAN data with missing values. In contrast to all previous papers, our new method does not delete attributes with missing values, does not use classifiers, and does not add features. Instead it is based on regression and meta-regression combined with the Ewing formula for identifying the classes of CAN. This is the first article using the Ewing formula and regression to classify CAN. We carried out extensive experiments to determine the best combination of regression and meta-regression techniques for classifying CAN data with missing values. The best outcomes have been obtained by the additive regression meta-learner based on M5Rules and combined with the Ewing formula. It has achieved the best accuracy of 99.78% for two classes of CAN, and 98.98% for three classes of CAN. These outcomes are substantially better than previous results obtained in the literature by deleting all missing attributes and applying traditional classifiers to different sets of features without regression. Another advantage of our method is that it does not require practitioners to perform more tests collecting additional alternative features. © 2013 Elsevier Ltd.
- Description: C1
A multi-tier ensemble construction of classifiers for phishing email detection and filtering
- Authors: Abawajy, Jemal , Kelarev, Andrei
- Date: 2012
- Type: Text , Conference paper
- Relation: 4th International Symposium on Cyberspace Safety and Security, CSS 2012 Vol. 7672 LNCS, p. 48-56
- Full Text: false
- Reviewed:
- Description: This paper is devoted to multi-tier ensemble classifiers for the detection and filtering of phishing emails. We introduce a new construction of ensemble classifiers, based on the well known and productive multi-tier approach. Our experiments evaluate their performance for the detection and filtering of phishing emails. The multi-tier constructions are well known and have been used to design effective classifiers for email classification and other applications previously. We investigate new multi-tier ensemble classifiers, where diverse ensemble methods are combined in a unified system by incorporating different ensembles at a lower tier as an integral part of another ensemble at the top tier. Our novel contribution is to investigate the possibility and effectiveness of combining diverse ensemble methods into one large multi-tier ensemble for the example of detection and filtering of phishing emails. Our study handled a few essential ensemble methods and more recent approaches incorporated into a combined multi-tier ensemble classifier. The results show that new large multi-tier ensemble classifiers achieved better performance compared with the outcomes of the base classifiers and ensemble classifiers incorporated in the multi-tier system. This demonstrates that the new method of combining diverse ensembles into one unified multi-tier ensemble can be applied to increase the performance of classifiers if diverse ensembles are incorporated in the system. © 2012 Springer-Verlag.
- Description: 2003010675
Performance evaluation of multi-tier ensemble classifiers for phishing websites
- Authors: Abawajy, Jemal , Beliakov, Gleb , Kelarev, Andrei , Yearwood, John
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: This article is devoted to large multi-tier ensemble classifiers generated as ensembles of ensembles and applied to phishing websites. Our new ensemble construction is a special case of the general and productive multi-tier approach well known in information security. Many efficient multi-tier classifiers have been considered in the literature. Our new contribution is in generating new large systems as ensembles of ensembles by linking a top-tier ensemble to another middletier ensemble instead of a base classifier so that the toptier ensemble can generate the whole system. This automatic generation capability includes many large ensemble classifiers in two tiers simultaneously and automatically combines them into one hierarchical unified system so that one ensemble is an integral part of another one. This new construction makes it easy to set up and run such large systems. The present article concentrates on the investigation of performance of these new multi-tier ensembles for the example of detection of phishing websites. We carried out systematic experiments evaluating several essential ensemble techniques as well as more recent approaches and studying their performance as parts of multi-level ensembles with three tiers. The results presented here demonstrate that new three-tier ensemble classifiers performed better than the base classifiers and standard ensembles included in the system. This example of application to the classification of phishing websites shows that the new method of combining diverse ensemble techniques into a unified hierarchical three-tier ensemble can be applied to increase the performance of classifiers in situations where data can be processed on a large computer.
A Grobner-Shirshov Algorithm for Applications in Internet Security
- Authors: Kelarev, Andrei , Yearwood, John , Watters, Paul , Wu, Xinwen , Ma, Liping , Abawajy, Jemal , Pan, L.
- Date: 2011
- Type: Text , Journal article
- Relation: Southeast Asian Bulletin of Mathematics Vol. 35, no. (2011), p. 807-820
- Full Text: false
- Reviewed:
- Description: The design of multiple classication and clustering systems for the detection of malware is an important problem in internet security. Grobner-Shirshov bases have been used recently by Dazeley et al. [15] to develop an algorithm for constructions with certain restrictions on the sandwich-matrices. We develop a new Grobner-Shirshov algorithm which applies to a larger variety of constructions based on combinatorial Rees matrix semigroups without any restrictions on the sandwich-matrices.
An approach for Ewing test selection to support the clinical assessment of cardiac autonomic neuropathy
- Authors: Stranieri, Andrew , Abawajy, Jemal , Kelarev, Andrei , Huda, Shamsul , Chowdhury, Morshed , Jelinek, Herbert
- Date: 2013
- Type: Text , Journal article
- Relation: Artificial Intelligence in Medicine Vol. 58, no. 3 (2013), p. 185-193
- Full Text:
- Reviewed:
- Description: Objective: This article addresses the problem of determining optimal sequences of tests for the clinical assessment of cardiac autonomic neuropathy (CAN) We investigate the accuracy of using only one of the recommended Ewing tests to classify CAN and the additional accuracy obtained by adding the remaining tests of the Ewing battery This is important as not all five Ewing tests can always be applied in each situation in practice Methods and material: We used new and unique database of the diabetes screening research initiative project, which is more than ten times larger than the data set used by Ewing in his original investigation of CAN We utilized decision trees and the optimal decision path finder (ODPF) procedure for identifying optimal sequences of tests Results: We present experimental results on the accuracy of using each one of the recommended Ewing tests to classify CAN and the additional accuracy that can be achieved by adding the remaining tests of the Ewing battery We found the best sequences of tests for cost-function equal to the number of tests The accuracies achieved by the initial segments of the optimal sequences for 2, 3 and 4 categories of CAN are 80.80, 91.33, 93.97 and 94.14, and respectively, 79.86, 89.29, 91.16 and 91.76, and 78.90, 86.21, 88.15 and 88.93 They show significant improvement compared to the sequence considered previously in the literature and the mathematical expectations of the accuracies of a random sequence of tests The complete outcomes obtained for all subsets of the Ewing features are required for determining optimal sequences of tests for any cost-function with the use of the ODPF procedure We have also found two most significant additional features that can increase the accuracy when some of the Ewing attributes cannot be obtained Conclusions: The outcomes obtained can be used to determine the optimal sequences of tests for each individual cost-function by following the ODPF procedure The results show that the best single Ewing test for diagnosing CAN is the deep breathing heart rate variation test Optimal sequences found for the cost-function equal to the number of tests guarantee that the best accuracy is achieved after any number of tests and provide an improvement in comparison with the previous ordering of tests or a random sequence © 2013 Elsevier B.V.
- Description: 2003011130
Empirical investigation of multi-tier ensembles for the detection of cardiac autonomic neuropathy using subsets of the Ewing features
- Authors: Abawajy, Jemal , Kelarev, Andrei , Stranieri, Andrew , Jelinek, Herbert
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: This article is devoted to an empirical investigation of performance of several new large multi-tier ensembles for the detection of cardiac autonomic neuropathy (CAN) in diabetes patients using sub-sets of the Ewing features. We used new data collected by the diabetes screening research initiative (DiScRi) project, which is more than ten times larger than the data set originally used by Ewing in the investigation of CAN. The results show that new multi-tier ensembles achieved better performance compared with the outcomes published in the literature previously. The best accuracy 97.74% of the detection of CAN has been achieved by the novel multi-tier combination of AdaBoost and Bagging, where AdaBoost is used at the top tier and Bagging is used at the middle tier, for the set consisting of the following four Ewing features: the deep breathing heart rate change, the Valsalva manoeuvre heart rate change, the hand grip blood pressure change and the lying to standing blood pressure change.
Optimization and matrix constructions for classification of data
- Authors: Kelarev, Andrei , Yearwood, John , Vamplew, Peter , Abawajy, Jemal , Chowdhury, Morshed
- Date: 2011
- Type: Journal article
- Relation: New Zealand Journal of Mathematics Vol. 41, no. 2011 (2011), p. 65-73
- Full Text:
- Reviewed:
- Description: Max-plus alegbras and more general semirings have many useful applications and have been actively investigated. On the other hand, structural matrix rings are also well known and have been considered by many authors. The main theorem of this article completely describes all optimal ideas in the more general structural matrix semirings. Originally, our investigation of these ideals was motivated by applications in data mining for the design of multiple classification systems combining several individual classifiers.
Empirical investigation of decision tree ensembles for monitoring cardiac complications of diabetes
- Authors: Kelarev, Andrei , Abawajy, Jemal , Stranieri, Andrew , Jelinek, Herbert
- Date: 2013
- Type: Text , Journal article
- Relation: International Journal of Data Warehousing and mining Vol. 9, no. 4 (2013), p. 1-18
- Full Text: false
- Reviewed:
- Description: Cardiac complications of diabetes require continuous monitoring since they may lead to increased morbidity or sudden death of patients. In order to monitor clinical complications of diabetes using wearable sensors, a small set of features have to be identified and effective algorithms for their processing need to be investigated. This article focuses on detecting and monitoring cardiac autonomic neuropathy (CAN) in diabetes patients. The authors investigate and compare the effectiveness of classifiers based on the following decision trees: ADTree, J48, NBTree, RandomTree, REPTree, and SimpleCart. The authors perform a thorough study comparing these decision trees as well as several decision tree ensembles created by applying the following ensemble methods: AdaBoost, Bagging, Dagging, Decorate, Grading, MultiBoost, Stacking, and two multi-level combinations of AdaBoost and MultiBoost with Bagging for the processing of data from diabetes patients for pervasive health monitoring of CAN. This paper concentrates on the particular task of applying decision tree ensembles for the detection and monitoring of cardiac autonomic neuropathy using these features. Experimental outcomes presented here show that the authors' application of the decision tree ensembles for the detection and monitoring of CAN in diabetes patients achieved better performance parameters compared with the results obtained previously in the literature.
Classification systems based on combinatorial semigroups
- Authors: Abawajy, Jemal , Kelarev, Andrei
- Date: 2013
- Type: Text , Journal article
- Relation: Semigroup Forum Vol. 86, no. 3 (2013), p. 603-612
- Full Text:
- Reviewed:
- Description: The present article continues the investigation of constructions essential for applications of combinatorial semigroups to the design of multiple classification systems in data mining. Our main theorem gives a complete description of all optimal classification systems defined by one-sided ideals in a construction based on combinatorial Rees matrix semigroups. It strengthens and generalizes previous results, which handled the more narrow case of two-sided ideals. © 2012 Springer Science+Business Media New York.
- Description: 2003011021
A data mining application of the incidence semirings
- Authors: Abawajy, Jemal , Kelarev, Andrei , Yearwood, John , Turville, Christopher
- Date: 2013
- Type: Text , Journal article
- Relation: Houston Journal of Mathematics Vol. 39, no. 4 (2013), p. 1083-1093
- Relation: http://purl.org/au-research/grants/arc/LP0990908
- Full Text: false
- Reviewed:
- Description: This paper is devoted to a combinatorial problem for incidence semirings, which can be viewed as sets of polynomials over graphs, where the edges are the unknowns and the coefficients are taken from a semiring. The construction of incidence rings is very well known and has many useful applications. The present article is devoted to a novel application of the more general incidence semirings. Recent research on data mining has motivated the investigation of the sets of centroids that have largest weights in semiring constructions. These sets are valuable for the design of centroid-based classification systems, or classifiers, as well as for the design of multiple classifiers combining several individual classifiers. Our article gives a complete description of all sets of centroids with the largest weight in incidence semirings.
Improving classifications for cardiac autonomic neuropathy using multi-level ensemble classifiers and feature selection based on random forest
- Authors: Kelarev, Andrei , Stranieri, Andrew , Abawajy, Jemal , Yearwood, John , Jelinek, Herbert
- Date: 2012
- Type: Text , Conference paper
- Relation: Tenth Australasian Data Mining Conference Vol. 134, p. 93-101
- Full Text: false
- Reviewed:
- Description: This paper is devoted to empirical investigation of novel multi-level ensemble meta classifiers for the detection and monitoring of progression of cardiac autonomic neuropathy, CAN, in diabetes patients. Our experiments relied on an extensive database and concentrated on ensembles of ensembles, or multi-level meta classifiers, for the classification of cardiac autonomic neuropathy progression. First, we carried out a thorough investigation comparing the performance of various base classifiers for several known sets of the most essential features in this database and determined that Random Forest significantly and consistently outperforms all other base classifiers in this new application. Second, we used feature selection and ranking implemented in Random Forest. It was able to identify a new set of features, which has turned out better than all other sets considered for this large and well-known database previously. Random Forest remained the very best classifier for the new set of features too. Third, we investigated meta classifiers and new multi-level meta classifiers based on Random Forest, which have improved its performance. The results obtained show that novel multi-level meta classifiers achieved further improvement and obtained new outcomes that are significantly better compared with the outcomes published in the literature previously for cardiac autonomic neuropathy.