Capped K-NN Editing in definition lacking environments
- Authors: Stranieri, Andrew , Yatsko, Andrew , Golden, Isaac , Mammadov, Musa , Bagirov, Adil
- Date: 2013
- Type: Text , Journal article
- Relation: Journal of Pattern Recognition Research Vol. 8, no. 1 (2013), p. 39-58
- Full Text: false
- Description: While any input may contribute to noise, imprecise specification of the class in data subdivided into classes is a rather common source of it. The misrepresentation may be characteristic of the data or may be caused by forcing a regression problem into a classification problem. Examples of this nature are considered, and an alternative is proposed. The approach is based mainly on a well-known k-NN technique for treating noise in data. The paper advances an editing technique designed around the idea of a variable number of authenticating instances. Test runs performed on publicly available and proprietary data demonstrate the high retention ability of the new procedure without loss of classification accuracy. Noise reduction methods in the broader classification context are extensively surveyed.
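The abstract builds on k-NN noise editing without spelling out the procedure. As a reference point only, the sketch below implements Wilson's edited nearest neighbour (ENN) rule, a classic k-NN editing technique: an instance is kept only if the majority class among its k nearest neighbours matches its own label. It is an assumption that this is the "well-known technique" the abstract refers to, and it is not the capped, variable-k procedure the paper proposes; `wilson_edit` and its `k` parameter are hypothetical names.

```python
import math
from collections import Counter


def wilson_edit(X, y, k=3):
    """Wilson's ENN rule (not the paper's capped variant).

    Returns the indices of instances whose label agrees with the
    majority label of their k nearest neighbours (leave-one-out).
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Euclidean distance from instance i to every other instance
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority == yi:
            keep.append(i)  # label is "authenticated" by its neighbours
    return keep


X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (0.5, 0.5)]                      # last point sits inside class 0
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]       # ...but is labelled 1 (noise)
print(wilson_edit(X, y, k=3))
```

With a fixed k every instance needs the same number of agreeing neighbours; the paper's idea of a variable number of authenticating instances relaxes exactly that choice. This brute-force version costs O(n²) distance computations.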
Application of optimisation-based data mining techniques to medical data sets: A comparative analysis
- Authors: Dzalilov, Zari , Bagirov, Adil , Mammadov, Musa
- Date: 2012
- Type: Text , Conference paper
- Relation: IMMM 2012: The Second International Conference on Advances in Information Mining and Management, p. 41-46
- Full Text: false
- Description: Computational methods have become an important tool in the analysis of medical data sets. In this paper, we apply three optimisation-based data mining methods to two data sets: (i) a cystic fibrosis data set and (ii) a tobacco control data set. The three algorithms used in the analysis are a modified linear least squares fit, an optimisation-based heuristic algorithm for feature selection, and an optimisation-based clustering algorithm. All three methods explore the relationship between features and classes, with the aim of determining the contribution of specific features to the class outcome. However, the three algorithms are based on completely different approaches. We apply these methods to solve feature selection and classification problems, and we present a comparative analysis of the algorithms using computational results. The results obtained confirm that these algorithms may be effectively applied to the analysis of other (bio)medical data sets.
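The abstract does not describe how its least squares fit is modified. As an unmodified baseline, the sketch below fits an ordinary linear least squares model of the class value on the features by solving the normal equations, so that each fitted weight can be read as a rough measure of that feature's contribution. Treating the weights as contribution scores is an assumption, and `least_squares_fit` is a hypothetical name.

```python
def least_squares_fit(X, y):
    """Plain OLS: solve (A^T A) w = A^T y with A = [1 | X].

    Returns [intercept, w_1, ..., w_p]. This is NOT the paper's
    modified fit, only the standard method it starts from.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # intercept column
    p = len(A[0])
    # Normal equations: M = A^T A, b = A^T y
    M = [[sum(a[r] * a[c] for a in A) for c in range(p)] for r in range(p)]
    b = [sum(a[r] * t for a, t in zip(A, y)) for r in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system
    w = [0.0] * p
    for r in reversed(range(p)):
        w[r] = (b[r] - sum(M[r][c] * w[c] for c in range(r + 1, p))) / M[r][r]
    return w


# Toy data generated exactly by y = 1 + 2*x1 - 3*x2
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, -2, 0, 2]
print(least_squares_fit(X, y))
```

Weights are only comparable as contributions when the features share a common scale, so in practice the features would be standardised before fitting.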
Application of optimisation-based data mining techniques to tobacco control dataset
- Authors: Dzalilov, Zari , Zhang, J , Bagirov, Adil , Mammadov, Musa
- Date: 2010
- Type: Text , Journal article
- Relation: International Journal of Lean Thinking Vol. 1, no. 1 (2010), p. 27-41
- Full Text: false
- Description: Tobacco smoking is one of the leading causes of death around the world. Consequently, control of tobacco use is an important global public health issue. Tobacco control may be aided by development of theoretical and methodological frameworks for describing and understanding complex tobacco control systems. Linear regression and logistic regression are currently very popular statistical techniques for modeling and analyzing complex data in tobacco control systems. However, in tobacco markets, numerous interrelated factors nontrivially interact with tobacco control policies, such that policies and control outcomes are nonlinearly related.
Improving Naive Bayes classifier using conditional probabilities
- Authors: Taheri, Sona , Mammadov, Musa , Bagirov, Adil
- Date: 2010
- Type: Text , Conference proceedings
- Description: The Naive Bayes classifier is the simplest of the Bayesian network classifiers. It has been shown to be very efficient on a variety of data classification problems. However, the strong assumption that all features are conditionally independent given the class is often violated in real-world applications. Therefore, improving the Naive Bayes classifier by alleviating the feature independence assumption has attracted much attention. In this paper, we develop a new version of the Naive Bayes classifier that does not assume independence of features. The proposed algorithm approximates the interactions between features by using conditional probabilities. We present results of numerical experiments on several real-world data sets, where continuous features are discretized by applying two different methods. These results demonstrate that the proposed algorithm significantly improves the performance of the Naive Bayes classifier while maintaining its robustness. © 2011, Australian Computer Society, Inc.
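For context on the independence assumption the abstract sets out to relax, the sketch below is a standard Naive Bayes classifier over discretized (categorical) features: the class score is log P(c) plus a sum of per-feature log P(x_j | c) terms, each estimated independently. It is the baseline, not the authors' conditional-probability extension; the Laplace smoothing parameter `alpha` and the names `train_nb` and `predict_nb` are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(X, y, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    classes = Counter(y)
    counts = defaultdict(Counter)              # (class, feature idx) -> value counts
    values = [set() for _ in range(len(X[0]))]  # distinct values seen per feature
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[(c, j)][v] += 1
            values[j].add(v)
    return classes, counts, values, alpha


def predict_nb(model, row):
    """Pick the class maximising log P(c) + sum_j log P(x_j | c)."""
    classes, counts, values, alpha = model
    total = sum(classes.values())
    best, best_score = None, -math.inf
    for c, n_c in classes.items():
        score = math.log(n_c / total)
        for j, v in enumerate(row):
            # Laplace-smoothed estimate; features are treated as
            # independent given the class (the assumption being relaxed)
            score += math.log((counts[(c, j)][v] + alpha)
                              / (n_c + alpha * len(values[j])))
        if score > best_score:
            best, best_score = c, score
    return best


X = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "y")]
y = ["pos", "pos", "neg", "neg"]
model = train_nb(X, y)
print(predict_nb(model, ("a", "x")), predict_nb(model, ("b", "y")))
```

Because each feature contributes its own term, interactions between features are invisible to this model; approximating such interactions with conditional probabilities is the direction the paper takes.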