Cluster based rule discovery model for enhancement of government's tobacco control strategy
- Huda, Shamsul, Yearwood, John, Borland, Ron
- Authors: Huda, Shamsul , Yearwood, John , Borland, Ron
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Discovery of interesting rules describing the behavioural patterns of smokers' quitting intentions is an important task in the determination of an effective tobacco control strategy. In this paper, we investigate a compact and simplified rule discovery process for predicting smokers' quitting behaviour that can provide feedback to build a scientific, evidence-based adaptive tobacco control policy. Standard decision tree (SDT) based rule discovery depends on decision boundaries in the feature space which are orthogonal to the axis of the feature of a particular decision node. This may limit the ability of SDT to learn intermediate concepts for high-dimensional large datasets such as tobacco control. In this paper, we propose a cluster-based rule discovery model (CRDM) for generation of more compact and simplified rules for the enhancement of tobacco control policy. The cluster-based approach builds conceptual groups from which a set of decision trees (a decision forest) is constructed. Experimental results on the tobacco control data set show that decision rules from the decision forest constructed by CRDM are simpler and can predict smokers' quitting intention more accurately than a single decision tree. © 2010 IEEE.
Empirical investigation of decision tree ensembles for monitoring cardiac complications of diabetes
- Kelarev, Andrei, Abawajy, Jemal, Stranieri, Andrew, Jelinek, Herbert
- Authors: Kelarev, Andrei , Abawajy, Jemal , Stranieri, Andrew , Jelinek, Herbert
- Date: 2013
- Type: Text , Journal article
- Relation: International Journal of Data Warehousing and Mining Vol. 9, no. 4 (2013), p. 1-18
- Full Text: false
- Reviewed:
- Description: Cardiac complications of diabetes require continuous monitoring since they may lead to increased morbidity or sudden death of patients. In order to monitor clinical complications of diabetes using wearable sensors, a small set of features has to be identified and effective algorithms for their processing need to be investigated. This article focuses on detecting and monitoring cardiac autonomic neuropathy (CAN) in diabetes patients. The authors investigate and compare the effectiveness of classifiers based on the following decision trees: ADTree, J48, NBTree, RandomTree, REPTree, and SimpleCart. The authors perform a thorough study comparing these decision trees as well as several decision tree ensembles created by applying the following ensemble methods: AdaBoost, Bagging, Dagging, Decorate, Grading, MultiBoost, Stacking, and two multi-level combinations of AdaBoost and MultiBoost with Bagging, for the processing of data from diabetes patients for pervasive health monitoring of CAN. This paper concentrates on the particular task of applying decision tree ensembles to the detection and monitoring of cardiac autonomic neuropathy using these features. Experimental outcomes presented here show that the authors' application of decision tree ensembles to the detection and monitoring of CAN in diabetes patients achieved better performance parameters than the results obtained previously in the literature.
Automatically generating classifier for phishing email prediction
- Ma, Liping, Torney, Rosemary, Watters, Paul, Brown, Simon
- Authors: Ma, Liping , Torney, Rosemary , Watters, Paul , Brown, Simon
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at I-SPAN 2009 - The 10th International Symposium on Pervasive Systems, Algorithms, and Networks, Kaohsiung, Taiwan : 14th-16th December 2009 p. 779-783
- Full Text:
- Description: Phishing is a form of online identity theft that employs both social engineering and technical subterfuge to steal consumers' personal identity data and financial account credentials. Phishing email prediction has drawn a lot of attention from many researchers. According to current anti-phishing research, a classifier generated by a decision tree produces the most accurate predictions. However, there does not appear to be any open source software available to transfer such a decision tree to an implementable classifier. The work presented in this paper builds a decision tree parser which automatically translates a decision tree into an implementable programming language so that the decision tree is useful in real-world applications. Experimental results show that the parser performs as well as the original decision tree. © 2009 IEEE.
- Description: 2003007989
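The idea of translating a decision tree into program code, as described in the abstract above, can be illustrated with a minimal Python sketch. This is not the authors' parser: the nested-dict tree format, the feature names (`num_links`, `has_ip_url`), the thresholds and the helper names `tree_to_source` and `walk` are all hypothetical and chosen only for illustration. The sketch renders a tree as the source of an if/else function, compiles it, and checks that the generated function agrees with walking the tree directly.

```python
def tree_to_source(node, name="classify", indent="    "):
    """Render a nested-dict decision tree as Python source for one function."""
    lines = [f"def {name}(x):"]

    def emit(n, depth):
        pad = indent * depth
        if "label" in n:  # leaf node: emit the predicted class
            lines.append(f"{pad}return {n['label']!r}")
            return
        # internal node: test one feature against a threshold
        lines.append(f"{pad}if x[{n['feature']!r}] <= {n['threshold']!r}:")
        emit(n["left"], depth + 1)
        lines.append(f"{pad}else:")
        emit(n["right"], depth + 1)

    emit(node, 1)
    return "\n".join(lines)


def walk(node, x):
    """Reference prediction obtained by walking the tree structure directly."""
    while "label" not in node:
        if x[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


# Hypothetical toy tree; features and thresholds are invented for this sketch.
TREE = {
    "feature": "num_links", "threshold": 3,
    "left": {"label": "ham"},
    "right": {
        "feature": "has_ip_url", "threshold": 0,
        "left": {"label": "ham"},
        "right": {"label": "phishing"},
    },
}

source = tree_to_source(TREE)
namespace = {}
exec(source, namespace)  # compile the generated source into a callable
classify = namespace["classify"]

samples = [
    {"num_links": 1, "has_ip_url": 1},
    {"num_links": 5, "has_ip_url": 0},
    {"num_links": 5, "has_ip_url": 1},
]
# The generated code should reproduce the tree's own decisions.
agree = all(classify(s) == walk(TREE, s) for s in samples)
```

Comparing `classify` with `walk` on the same inputs mirrors, in a very reduced form, the abstract's check that the translated program performs as well as the original tree.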
Detecting phishing emails using hybrid features
- Ma, Liping, Ofoghi, Bahadorreza, Watters, Paul, Brown, Simon
- Authors: Ma, Liping , Ofoghi, Bahadorreza , Watters, Paul , Brown, Simon
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 2009 Symposia and Workshops on Ubiquitous, Autonomic and Trusted Computing, UIC-ATC '09, Brisbane, Queensland : 7th-9th July 2009 p. 493-497
- Full Text:
- Description: Phishing emails have been widely used to defraud financial organizations and their customers. Phishing email detection has drawn a lot of attention from many researchers, and malicious detection devices are installed in email servers. However, phishing has become increasingly complicated and sophisticated, and attacks can bypass the filters set by anti-phishing techniques. In this paper, we present a method to build a robust classifier to detect phishing emails using hybrid features and to select features using information gain. We experiment with 10-fold cross-validation to build an initial classifier which performs well. The experiment also analyses the quality of each feature using information gain, and the best feature set is selected after a recursive learning process. Experimental results show that the selected features perform as well as the original features. Finally, we test five machine learning algorithms and compare the performance of each. The results show that the decision tree builds the best classifier.
- Description: 2003007857
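Ranking features by information gain, the selection criterion named in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions: the four toy emails, the binary features `has_ip_url` and `has_greeting`, and the labels are invented, and the paper's actual hybrid feature set and recursive selection procedure are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Label entropy minus the weighted entropy after splitting on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Invented toy data: has_ip_url separates the classes, has_greeting does not.
ROWS = [
    {"has_ip_url": 1, "has_greeting": 1},
    {"has_ip_url": 1, "has_greeting": 0},
    {"has_ip_url": 0, "has_greeting": 1},
    {"has_ip_url": 0, "has_greeting": 0},
]
LABELS = ["phishing", "phishing", "ham", "ham"]

# Features ordered from most to least informative.
ranked = sorted(
    ROWS[0],
    key=lambda f: information_gain(ROWS, LABELS, f),
    reverse=True,
)
```

On this toy data the feature that perfectly separates the two classes has a gain of one bit and the uninformative feature has a gain of zero, so a selection step that keeps only the top-ranked features would retain `has_ip_url`.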
An approach for Ewing test selection to support the clinical assessment of cardiac autonomic neuropathy
- Stranieri, Andrew, Abawajy, Jemal, Kelarev, Andrei, Huda, Shamsul, Chowdhury, Morshed, Jelinek, Herbert
- Authors: Stranieri, Andrew , Abawajy, Jemal , Kelarev, Andrei , Huda, Shamsul , Chowdhury, Morshed , Jelinek, Herbert
- Date: 2013
- Type: Text , Journal article
- Relation: Artificial Intelligence in Medicine Vol. 58, no. 3 (2013), p. 185-193
- Full Text:
- Reviewed:
- Description: Objective: This article addresses the problem of determining optimal sequences of tests for the clinical assessment of cardiac autonomic neuropathy (CAN). We investigate the accuracy of using only one of the recommended Ewing tests to classify CAN and the additional accuracy obtained by adding the remaining tests of the Ewing battery. This is important as not all five Ewing tests can always be applied in each situation in practice. Methods and material: We used a new and unique database of the diabetes screening research initiative project, which is more than ten times larger than the data set used by Ewing in his original investigation of CAN. We utilized decision trees and the optimal decision path finder (ODPF) procedure for identifying optimal sequences of tests. Results: We present experimental results on the accuracy of using each one of the recommended Ewing tests to classify CAN and the additional accuracy that can be achieved by adding the remaining tests of the Ewing battery. We found the best sequences of tests for the cost-function equal to the number of tests. The accuracies achieved by the initial segments of the optimal sequences for 2, 3 and 4 categories of CAN are, respectively, 80.80, 91.33, 93.97 and 94.14; 79.86, 89.29, 91.16 and 91.76; and 78.90, 86.21, 88.15 and 88.93. They show significant improvement compared to the sequence considered previously in the literature and the mathematical expectations of the accuracies of a random sequence of tests. The complete outcomes obtained for all subsets of the Ewing features are required for determining optimal sequences of tests for any cost-function with the use of the ODPF procedure. We have also found the two most significant additional features that can increase the accuracy when some of the Ewing attributes cannot be obtained. Conclusions: The outcomes obtained can be used to determine the optimal sequences of tests for each individual cost-function by following the ODPF procedure. The results show that the best single Ewing test for diagnosing CAN is the deep breathing heart rate variation test. Optimal sequences found for the cost-function equal to the number of tests guarantee that the best accuracy is achieved after any number of tests and provide an improvement in comparison with the previous ordering of tests or a random sequence. © 2013 Elsevier B.V.
- Description: 2003011130