A classification algorithm that derives weighted sum scores for insight into disease
- Quinn, Anthony, Stranieri, Andrew, Yearwood, John, Hafen, Gaudenz
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John , Hafen, Gaudenz
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at Third Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2009), Wellington, New Zealand : Vol. 97, p. 13-17
- Full Text:
- Description: Data mining is often performed with datasets associated with diseases in order to increase insights that can ultimately lead to improved prevention or treatment. Classification algorithms can achieve high levels of predictive accuracy but have limited application for facilitating the insight that leads to deeper understanding of aspects of the disease. This is because the representation of knowledge that arises from classification algorithms is too opaque, too complex or too sparse to facilitate insight. Clustering, association and visualisation approaches give clinicians greater scope to engage in a way that leads to insight; however, predictive accuracy is compromised or non-existent. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classification algorithm that provides accuracy comparable to other techniques whilst providing some insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. Clinicians are very familiar with weighted sum scoring scales, so the internal representation is intuitive and easily understood. This paper presents results from applying the AWSum approach to data from patients with Cystic Fibrosis.
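The abstract describes a weight for each feature value that reflects its influence on the class value, combined into a weighted sum score. The sketch below illustrates only that general idea. It assumes a binary class and uses a hypothetical influence measure, P(positive | value) minus P(negative | value). It is not the published AWSum weighting, and all function names are illustrative.

```python
from collections import Counter, defaultdict


def feature_value_weights(rows, labels, positive):
    """Hypothetical influence weight for each (feature index, value) pair.

    Assumption (not the paper's formula): weight = P(positive | value) - P(negative | value),
    which lies in [-1, 1].
    """
    counts = defaultdict(Counter)  # (feature index, value) -> Counter of class labels
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            counts[(i, value)][label] += 1
    weights = {}
    for key, label_counts in counts.items():
        total = sum(label_counts.values())
        pos = label_counts[positive]
        weights[key] = (pos - (total - pos)) / total
    return weights


def weighted_sum_score(row, weights):
    """Sum the weights of the observed feature values; unseen values contribute 0."""
    return sum(weights.get((i, value), 0.0) for i, value in enumerate(row))


def classify(row, weights, threshold=0.0):
    """Predict the positive class when the score exceeds a (hypothetical) threshold."""
    return weighted_sum_score(row, weights) > threshold


# Toy data (not from the paper): feature 0 separates the classes, feature 1 does not.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [1, 1, 0, 0]
w = feature_value_weights(rows, labels, positive=1)
print(w[(0, "a")], w[(1, "x")])  # 1.0 0.0
print(classify(("a", "y"), w))   # True
```

Because each weight is attached to a single feature value, the contribution of each finding to the total score can be read off directly. This is the kind of transparency the abstract associates with familiar clinical scoring scales.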
A comparison of machine learning algorithms for multilabel classification of CAN
- Kelarev, Andrei, Stranieri, Andrew, Yearwood, John, Jelinek, Herbert
- Authors: Kelarev, Andrei , Stranieri, Andrew , Yearwood, John , Jelinek, Herbert
- Date: 2012
- Type: Text , Journal article
- Relation: Advances in Computer Science and Engineering Vol. 9, no. 1 (2012), p. 1-4
- Full Text:
- Reviewed:
- Description: This article is devoted to the investigation and comparison of several important machine learning algorithms in their ability to obtain multilabel classifications of the stages of cardiac autonomic neuropathy (CAN). Data was collected by the Diabetes Complications Screening Research Initiative at Charles Sturt University. Our experiments have achieved better results than those published previously in the literature for similar CAN identification tasks.
A constraint-based evolutionary learning approach to the expectation maximization for optimal estimation of the hidden Markov model for speech signal modeling
- Huda, Shamsul, Yearwood, John, Togneri, Roberto
- Authors: Huda, Shamsul , Yearwood, John , Togneri, Roberto
- Date: 2009
- Type: Text , Journal article
- Relation: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics Vol. 39, no. 1 (2009), p. 182-197
- Full Text:
- Reviewed:
- Description: This paper attempts to overcome the tendency of the expectation-maximization (EM) algorithm to locate a local rather than global maximum when applied to estimate hidden Markov model (HMM) parameters in speech signal modeling. We propose a hybrid algorithm, CEL-EM, for estimating the HMM in automatic speech recognition (ASR) using a constraint-based evolutionary algorithm (EA) and EM. The novelty of CEL-EM is that it is applicable to the estimation of constraint-based models with many constraints and large numbers of parameters that are normally estimated with EM, such as the HMM. Two constraint-based versions of CEL-EM with different fusion strategies are proposed, using a constraint-based EA and EM for better estimation of the HMM in ASR. The first uses a traditional EA constraint-handling mechanism. The second transforms the constrained optimization problem into an unconstrained problem using Lagrange multipliers. The fusion strategies for CEL-EM use a staged-fusion approach in which EM is plugged into the EA periodically, after the EA has run for a specific period, so that the hybrid algorithm retains the global sampling capabilities of the EA. A variable initialization approach (VIA), based on variable segmentation, is proposed to provide a better initialization for the EA in CEL-EM. Experimental results on the TIMIT speech corpus show that CEL-EM obtains higher recognition accuracies than the traditional EM algorithm as well as a top-standard EM (VIA-EM, constructed by applying the VIA to EM). © 2008 IEEE.
A formal description of ontology change in OWL
- Authors: Avery, John , Yearwood, John
- Date: 2005
- Type: Text , Conference paper
- Relation: Paper presented at the Third International Conference on Information Technology and Applications, ICITA 2005, Sydney : 4th - 7th July, 2005
- Full Text:
- Reviewed:
- Description: There are three main activities involved in managing ontology change. Firstly we need to identify changes, secondly describe these identified changes, and finally describe and handle the ramifications of the changes. In previous work we have presented a language (DOWL) for describing ontology change and in this paper we demonstrate how changes described in this language can be represented in the RDF abstract syntax which enables us to describe the ramifications of a change in a formal manner. This formalism can provide the basis for an automated ontology change management system.
- Description: E1
- Description: 2003001448
A formula for multiple classifiers in data mining based on Brandt semigroups
- Kelarev, Andrei, Yearwood, John, Mammadov, Musa
- Authors: Kelarev, Andrei , Yearwood, John , Mammadov, Musa
- Date: 2009
- Type: Text , Journal article
- Relation: Semigroup Forum Vol. 78, no. 2 (2009), p. 293-309
- Full Text:
- Reviewed:
- Description: A general approach to designing multiple classifiers represents them as a combination of several binary classifiers in order to enable correction of classification errors and increase reliability. This method is explained, for example, in Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques, 2005, Sect. 7.5). The aim of this paper is to investigate representations of this sort based on Brandt semigroups. We give a formula for the maximum number of errors of binary classifiers, which can be corrected by a multiple classifier of this type. Examples show that our formula does not carry over to larger classes of semigroups. © 2008 Springer Science+Business Media, LLC.
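The error-correcting scheme referred to assigns each class a binary codeword, trains one binary classifier per bit, and predicts the class whose codeword is nearest in Hamming distance to the classifiers' outputs. For an ordinary binary code, up to floor((d - 1) / 2) classifier errors can always be corrected, where d is the minimum Hamming distance between codewords. The sketch below shows only this standard Hamming-distance bound and nearest-codeword decoding. It does not implement the paper's Brandt-semigroup construction, and the exhaustive 4-class code is an illustrative choice.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions in which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def correctable_errors(codewords):
    """Classifier errors guaranteed correctable: floor((d - 1) / 2), d = min pairwise distance."""
    d = min(hamming(a, b) for a, b in combinations(codewords, 2))
    return (d - 1) // 2


def decode(outputs, codewords):
    """Index of the class whose codeword is nearest to the binary classifiers' outputs."""
    return min(range(len(codewords)), key=lambda k: hamming(outputs, codewords[k]))


# Exhaustive 7-bit code for four classes (one row per class, one column per binary classifier).
CODE = [
    (1, 1, 1, 1, 1, 1, 1),
    (0, 0, 0, 0, 1, 1, 1),
    (0, 0, 1, 1, 0, 0, 1),
    (0, 1, 0, 1, 0, 1, 0),
]

print(correctable_errors(CODE))             # 1
print(decode((1, 1, 1, 1, 1, 1, 0), CODE))  # 0: one flipped bit is corrected
```

Every pair of rows in this code differs in exactly 4 positions, so any single misclassifying binary classifier is corrected, but two simultaneous errors may not be.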
A fully automated CAD system using multi-category feature selection with restricted recombination
- Ghosh, Ranadhir, Ghosh, Moumita, Yearwood, John, Mukherjee, Subhasis
- Authors: Ghosh, Ranadhir , Ghosh, Moumita , Yearwood, John , Mukherjee, Subhasis
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at 6th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2007, Melbourne, Victoria : 11th-13th July 2007 p. 106-111
- Full Text:
- Description: In pattern recognition problems, features play an important role in classification results: both which features are used and how many are used in the classification process matter. Most real-life classification problems use different categories of features, and it is desirable to find the combination of features that optimises classifier performance. Various feature selection frameworks exist, but most do not incorporate the impact of one category of features on another, and those that do produce conflicts between the categories. In this paper we propose a restricted-crossover selection framework that incorporates the impact of different categories on each other while restricting the search within each category as it searches the global region of the search space. The results obtained with the proposed framework are promising.
- Description: 2003005429
A global optimisation approach to classification in medical diagnosis and prognosis
- Bagirov, Adil, Rubinov, Alex, Yearwood, John, Stranieri, Andrew
- Authors: Bagirov, Adil , Rubinov, Alex , Yearwood, John , Stranieri, Andrew
- Date: 2001
- Type: Text , Conference paper
- Relation: Paper presented at 34th Hawaii International Conference on System Sciences, HICSS-34, Maui, Hawaii, USA : 3rd-6th January 2001
- Full Text:
- Description: In this paper global optimisation-based techniques are studied in order to increase the accuracy of medical diagnosis and prognosis with FNA image data from the Wisconsin Diagnostic and Prognostic Breast Cancer databases. First we discuss the problem of determining the most informative features for the classification of cancerous cases in the databases under consideration. Then we apply a technique based on convex and global optimisation to breast cancer diagnosis. It allows benign and malignant cases to be classified and patients to be diagnosed with very high accuracy. The third application of this technique is a method that calculates cluster centres to predict when breast cancer is likely to recur in patients whose cancer has been removed. The technique achieves higher accuracy with these databases than reported elsewhere in the literature.
- Description: 2003003950
A modular framework for multi category feature selection in digital mammography
- Ghosh, Ranadhir, Ghosh, Moumita, Yearwood, John
- Authors: Ghosh, Ranadhir , Ghosh, Moumita , Yearwood, John
- Date: 2004
- Type: Text , Conference paper
- Relation: Paper presented at ESANN 2004 Proceedings: European Symposium on Artificial Neural Networks, Bruges, Belgium : 28/04/2004 Vol. Elsevier, p. 175-180
- Full Text:
- Reviewed:
- Description: Existing research has used many different approaches for recognition in digital mammography, based on various ANN classifier-modeling techniques and different feature extraction techniques. It has been observed that, beyond a certain point, including additional features leads to worse rather than better performance. Moreover, the choice of features used to represent the patterns affects several aspects of the pattern recognition problem, such as accuracy, required learning time and the necessary number of samples. A common problem in multi-category feature classification is conflict between the categories: no feasible solution is simultaneously optimal for all categories. To find an optimal solution, the search space can be divided into sub-regions based on individual categories, with the results finally merged through a decision support system. In this paper we propose a modular feature selection approach based on a canonical GA combined with a standard MLP.
- Description: E1
- Description: 2003000872
A new scoring system in Cystic Fibrosis : Statistical tools for database analysis - A preliminary report
- Hafen, Gaudenz, Hurst, Cameron, Yearwood, John, Smith, Julie, Dzalilov, Zari, Robinson, P. J.
- Authors: Hafen, Gaudenz , Hurst, Cameron , Yearwood, John , Smith, Julie , Dzalilov, Zari , Robinson, P. J.
- Date: 2008
- Type: Text , Journal article
- Relation: BMC Medical Informatics and Decision Making Vol. 8, no. 44 (2008), p.1-11
- Full Text:
- Reviewed:
- Description: Background. Cystic fibrosis is the most common fatal genetic disorder in the Caucasian population. Scoring systems for assessing Cystic fibrosis disease severity have been used for almost 50 years without being adapted to the milder phenotype of the disease in the 21st century. The aim of this project is to develop a new scoring system using a database and employing various statistical tools. This study protocol reports the development of the statistical tools needed to create such a scoring system. Methods. The evaluation is based on the Cystic Fibrosis database from the cohort at the Royal Children's Hospital in Melbourne. Initially, unsupervised clustering of all data records was performed using a range of clustering algorithms, in particular incremental clustering algorithms. The clusters obtained were characterised using rules from decision trees, and the results were examined by clinicians. To obtain a clearer definition of classes, expert opinion of each individual's clinical severity was sought. After data preparation, including expert opinion of each individual's clinical severity on a 3-point scale (mild, moderate and severe disease), two multivariate techniques were used throughout the analysis to establish which method would be more successful in feature selection and model derivation: 'Canonical Analysis of Principal Coordinates' (CAP) and 'Linear Discriminant Analysis' (DA). A 3-step procedure was performed: (1) selection of features, (2) extraction of 5 severity classes from the 3 severity classes defined by expert opinion and (3) establishment of calibration datasets. Results. (1) Feature selection: CAP has a more effective "modelling" focus than DA.
(2) Extraction of 5 severity classes: after variables were identified as important in discriminating contiguous CF severity groups on the 3-point scale (mild/moderate and moderate/severe), the Discriminant Function (DF) was used to determine the new groups: mild, intermediate moderate, moderate, intermediate severe and severe disease. (3) The generated confusion tables showed a misclassification rate of 19.1% for males and 16.5% for females, with most misallocations falling into adjacent severity classes, particularly for males. Conclusion. Our preliminary data show that using CAP to select features and Linear DA to derive the actual model in a CF database might be helpful in developing a scoring system. However, there are several limitations: in particular, more data entry points are needed to finalize a score, and the statistical tools need further refinement and validation by re-running the statistical methods on the larger dataset. © 2008 Hafen et al; licensee BioMed Central Ltd.
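A misclassification rate from a confusion table is the share of off-diagonal cases. When severity classes are ordinal, an "adjacent" misallocation is one that lands one class away from the true class. A minimal sketch, assuming a square confusion matrix with rows as true classes and columns as predicted classes, both in severity order. The function name and the matrix are hypothetical toy data, not the study's results.

```python
def misclassification_summary(confusion):
    """Return (misclassification rate, share of errors falling in adjacent classes).

    confusion[i][j] counts cases of true class i predicted as class j,
    with classes listed in ordinal severity order.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    errors = total - correct
    # Off-diagonal cells exactly one step from the diagonal are adjacent misallocations.
    adjacent = sum(confusion[i][j] for i in range(k) for j in range(k) if abs(i - j) == 1)
    rate = errors / total
    adjacent_share = adjacent / errors if errors else 0.0
    return rate, adjacent_share


# Toy 3-class example (mild, moderate, severe); not data from the study.
toy = [
    [9, 1, 0],
    [1, 8, 1],
    [1, 0, 9],
]
rate, share = misclassification_summary(toy)
print(round(rate, 3), share)  # 0.133 0.75
```

Here 4 of 30 cases are misclassified, and 3 of those 4 errors fall into a neighbouring severity class.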
A new supervised term ranking method for text categorization
- Mammadov, Musa, Yearwood, John, Zhao, Lei
- Authors: Mammadov, Musa , Yearwood, John , Zhao, Lei
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 23rd Australasian Joint Conference on Artificial Intelligence, AI 2010 Vol. 6464 LNAI, p. 102-111
- Full Text:
- Reviewed:
- Description: In text categorization, various supervised term weighting methods, for example Information Gain, the χ2 statistic and Odds Ratio, have been applied to improve classification performance by weighting terms with respect to different categories. The literature offers three term ranking methods for summarizing the term weights of different categories in multi-class text categorization: Summation, Average and Maximum. In this paper we present a new term ranking method for summarizing term weights, Maximum Gap. Using two term weighting methods, information gain and the χ2 statistic, we set up controlled experiments for the different term ranking methods. The Reuters-21578 text corpus is used as the dataset. Two popular classification algorithms, SVM and BoosTexter, are used to evaluate the performance of the different term ranking methods. Experimental results show that the new term ranking method performs better. © 2010 Springer-Verlag.
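A supervised weighting such as the χ2 statistic gives a term one weight per category. The Summation, Average and Maximum methods collapse those per-category weights into a single ranking score. A minimal sketch, assuming the usual 2x2 contingency form of χ2 and an Average weighted by category priors (uniform by default). The paper's Maximum Gap method is not reproduced here because the abstract does not define it, and all names and counts are illustrative.

```python
def chi_square(a, b, c, d):
    """2x2 chi-square score of a term for one category.

    a: in-category docs containing the term; b: out-of-category docs containing it;
    c: in-category docs without it; d: out-of-category docs without it.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def summarize(weights, priors=None):
    """Collapse per-category term weights into Summation, Average and Maximum scores.

    Average is weighted by category priors (uniform if none are given).
    """
    if priors is None:
        priors = [1 / len(weights)] * len(weights)
    return {
        "sum": sum(weights),
        "avg": sum(p * w for p, w in zip(priors, weights)),
        "max": max(weights),
    }


# Toy contingency counts for one term in two categories (illustrative, not Reuters-21578 data).
per_category = [chi_square(10, 0, 0, 10), chi_square(5, 5, 5, 5)]
print(per_category)             # [20.0, 0.0]
print(summarize(per_category))  # {'sum': 20.0, 'avg': 10.0, 'max': 20.0}
```

Terms are then ranked by whichever summary score is chosen, and the top-ranked terms are kept as features for the classifier.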
A polynomial ring construction for the classification of data
- Kelarev, Andrei, Yearwood, John, Vamplew, Peter
- Authors: Kelarev, Andrei , Yearwood, John , Vamplew, Peter
- Date: 2009
- Type: Text , Journal article
- Relation: Bulletin of the Australian Mathematical Society Vol. 79, no. 2 (2009), p. 213-225
- Full Text:
- Reviewed:
- Description: Drensky and Lakatos (Lecture Notes in Computer Science, 357 (Springer, Berlin, 1989), pp. 181-188) have established a convenient property of certain ideals in polynomial quotient rings, which can now be used to determine error-correcting capabilities of combined multiple classifiers following a standard approach explained in the well-known monograph by Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques (Elsevier, Amsterdam, 2005)). We strengthen and generalise the result of Drensky and Lakatos by demonstrating that the corresponding nice property remains valid in a much larger variety of constructions and applies to more general types of ideals. Examples show that our theorems do not extend to larger classes of ring constructions and cannot be simplified or generalised.
A semantic approach to boost passage retrieval effectiveness for question answering
- Ofoghi, Bahadorreza, Yearwood, John, Ghosh, Ranadhir
- Authors: Ofoghi, Bahadorreza , Yearwood, John , Ghosh, Ranadhir
- Date: 2006
- Type: Text , Conference paper
- Relation: Paper presented at Computer Science 2006 Twenty-Ninth Australian Computer Science Conference, Hobart : 16th January, 2006 p. 95-101
- Full Text:
- Reviewed:
- Description: Given the rapid growth of information resources and the huge number of requests that users submit to existing information retrieval systems, Question Answering systems have recently attracted more attention as a way to meet information needs by providing users with more precise and focused retrieval units. Since one of the most challenging and important processes in such systems is retrieving the text excerpts most relevant to the questions, we propose a novel approach that exploits not only the syntax of the natural language of the questions and texts but also the semantics conveyed beneath them, via a semantic question rewriting and passage retrieval task. The semantic structure used to address the surface mismatch between semantically related passages and queries is FrameNet, a lexical resource for English based on frame semantics. We have run our proposed approach on a subset of the TREC 2004 factoid questions to retrieve passages containing correct answers from the AQUAINT collection and have obtained promising results.
- Description: E1
- Description: 2003001803
A technique for ranking friendship closeness in social networking services
- Sun, Zhaohao, Yearwood, John, Firmin, Sally
- Authors: Sun, Zhaohao , Yearwood, John , Firmin, Sally
- Date: 2013
- Type: Text , Conference paper
- Relation: 24th Australasian Conference on Information Systems (ACIS) p. 1-9
- Full Text:
- Reviewed:
- Description: The concepts of friend and friendship are critical to both theoretical and empirical studies of social relations, social media and social networks. Measuring the closeness among friends is a major issue for developing online social networking services (SNS) such as Facebook. This paper addresses this issue by proposing a technique for ranking friendship closeness in SNS. The technique consists of an algorithm for ranking need-driven friendship closeness and an algorithm for ranking behaviour-based friendship closeness in online social networking sites. The former is based on Maslow’s hierarchy of needs, while the latter is based on users' behaviours on Facebook and on TOPSIS. The examples provided illustrate the viability of the proposed algorithms. The research in this paper shows that ranking friendship closeness will facilitate understanding of the needs and behaviours of friends and of friendships in SNS. The proposed approach will facilitate research and development of social media, social commerce, social networks, and SN
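The abstract names TOPSIS as the basis of the behaviour-based closeness ranking. A minimal, generic TOPSIS sketch in Python follows, assuming each friend is scored on a few numeric behaviour criteria; the criteria, weights and counts in the usage lines are hypothetical, and this is not the authors' algorithm.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives (rows) against criteria (columns) with TOPSIS.

    matrix  -- one row of numeric criterion values per alternative
    weights -- one non-negative weight per criterion
    benefit -- True where larger values are better for that criterion
    Returns the relative closeness of each alternative to the ideal (0..1).
    """
    n_crit = len(weights)
    # Vector-normalise each column so criteria on different scales are comparable.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal (best) and anti-ideal (worst) value of every weighted criterion.
    columns = list(zip(*weighted))
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)    # Euclidean distance to the ideal
        d_worst = math.dist(row, worst)  # Euclidean distance to the anti-ideal
        total = d_best + d_worst
        # A row equal to both ideal and anti-ideal (all alternatives identical) gets 0.5.
        scores.append(d_worst / total if total else 0.5)
    return scores


# Hypothetical behaviour data per friend: messages/week, comments, shared photos.
friends = {"A": [12, 5, 3], "B": [2, 1, 0], "C": [7, 9, 4]}
closeness = dict(zip(friends, topsis(list(friends.values()), [0.5, 0.3, 0.2], [True] * 3)))
ranking = sorted(closeness, key=closeness.get, reverse=True)
print(ranking)  # ['A', 'C', 'B']
```

Vector normalisation lets criteria measured on different scales contribute comparably; the closeness score is 1 for an alternative that coincides with the ideal on every criterion and 0 for one that coincides with the anti-ideal, as friend B does here.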
A web-based Narrative construction environment
- Yearwood, John, Stranieri, Andrew, Osman, Deanna
- Authors: Yearwood, John , Stranieri, Andrew , Osman, Deanna
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at NILE 2008: 5th International Conference on Narrative and Interactive Learning Environments, Edinburgh, Scotland : 6th-8th August 2008 p. 78-81
- Full Text:
- Description: This paper describes a web-based environment for constructing narrative from story snippets contributed by a community of interest. The underlying model uses an argument-based structure to infer the next event in the narrative sequence. The approach makes use of both events and higher-level story elements derived from Polti’s dramatic situations. The dramatic situations used are consistent with a theme, and events are generally constrained by the dramatic situation. The generated narrative is a function of the event history, the dramatic situations chosen and the plausible inferences about next events contributed by a community of interest in the theme. At this stage, a player’s actions are simulated by random selection from a set, combined with a nonsense filter. Example outputs from the system are provided and discussed.
- Description: 2003006499
Adaptive clustering with feature ranking for DDoS attacks detection
- Zi, Lifang, Yearwood, John, Wu, Xin
- Authors: Zi, Lifang , Yearwood, John , Wu, Xin
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Distributed Denial of Service (DDoS) attacks pose an increasing threat to the current Internet. The detection of such attacks plays an important role in maintaining the security of networks. In this paper, we propose a novel adaptive clustering method combined with feature ranking for DDoS attack detection. First, based on an analysis of network traffic, preliminary variables are selected. Second, the Modified Global K-means algorithm (MGKM) is used as the basic incremental clustering algorithm to identify the cluster structure of the target data. Third, the linear correlation coefficient is used for feature ranking. Lastly, the feature ranking result is used to inform and recalculate the clusters. This adaptive process can make worthwhile adjustments to the working feature vector according to different patterns of DDoS attacks, and can improve the quality of the clusters and the effectiveness of the clustering algorithm. The experimental results demonstrate that our method is effective and adaptive in detecting the separate phases of DDoS attacks. © 2010 IEEE.
An application of consensus clustering for DDoS attacks detection
- Zi, Lifang, Yearwood, John, Kelarev, Andrei
- Authors: Zi, Lifang , Yearwood, John , Kelarev, Andrei
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: The detection of Distributed Denial of Service (DDoS) attacks is very important for maintaining the security of networks and the Internet. This paper introduces a novel iterative consensus process based on the Hybrid Bipartite Graph Formulation (HBGF) consensus function for DDoS attack detection. First, features are extracted based on an analysis of network traffic. Second, several clustering algorithms are applied in combination with the silhouette index to obtain a collection of independent initial clusterings. Third, the HBGF consensus function and the silhouette index are used to find an appropriate consensus clustering of the initial clusterings. Fourth, this new consensus clustering is added to the pool of initial clusterings, replacing the clustering with the worst silhouette index. Fifth, the process continues iteratively until the silhouette index of the resulting consensus clusterings stabilizes. This iterative consensus clustering process can improve the quality of the clusters. The experimental results demonstrate that our iterative consensus process is effective and can be used in practice for detecting the separate phases of DDoS attacks.
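The abstract relies on the silhouette index both to assess clusterings and to decide which pool member the new consensus clustering replaces. The Python sketch below, assuming Euclidean distance and plain label lists, computes the mean silhouette index and performs that replacement step. The HBGF consensus function itself is not shown, so the consensus labels are taken as given, and the helper `replace_worst` and the toy data are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette index of a clustering, using Euclidean distance.

    For each point, a = mean distance to the other members of its cluster and
    b = smallest mean distance to the members of any other cluster; the point's
    value is (b - a) / max(a, b). Points in singleton clusters count as 0.
    At least two clusters are required.
    """
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    total = 0.0
    for p, c in zip(points, labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # singleton: contributes 0
        # p is in `own` at distance 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for k, members in clusters.items() if k != c)
        m = max(a, b)
        total += (b - a) / m if m else 0.0
    return total / len(points)


def replace_worst(pool, consensus, points):
    """One pool update (hypothetical helper): the new consensus clustering
    replaces the pool member with the lowest silhouette index."""
    scores = [silhouette(points, labels) for labels in pool]
    worst = scores.index(min(scores))
    return pool[:worst] + [consensus] + pool[worst + 1:]


points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
print(round(silhouette(points, good), 2), round(silhouette(points, bad), 2))  # 0.86 -0.43
pool = replace_worst([bad, good], [1, 1, 0, 0], points)
print(pool)  # [[1, 1, 0, 0], [0, 0, 1, 1]]
```

Repeating this update with a fresh consensus clustering each round, and stopping once the consensus silhouette value stops changing, mirrors the iteration described in the abstract.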
An experiment in task decomposition and ensembling for a modular artificial neural network
- Ferguson, Brent, Ghosh, Ranadhir, Yearwood, John
- Authors: Ferguson, Brent , Ghosh, Ranadhir , Yearwood, John
- Date: 2004
- Type: Text , Conference paper
- Relation: Paper presented at Innovations in Applied Artificial Intelligence: 17th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, Ottawa, Canada : 17th May, 2004
- Full Text:
- Reviewed:
- Description: Modular neural networks have the potential to overcome common scalability and interference problems experienced by fully connected neural networks when applied to large databases. In this paper we trial an approach to constructing modular ANNs for a very large handwritten-character classification problem from CEDAR. In our approach, we apply progressive task decomposition methods based on clustering and regression techniques to find modules. We then test methods for combining the modules into ensembles and compare their structural characteristics and classification performance with those of an ANN with a fully connected topology. The results reveal improvements in classification rates as well as in network topologies for this problem.
- Description: E1
- Description: 2003000852
An interaction framework for scenario-based three dimensional environments
- Macfadyen, Alyx, Stranieri, Andrew, Yearwood, John
- Authors: Macfadyen, Alyx , Stranieri, Andrew , Yearwood, John
- Date: 2006
- Type: Text , Conference paper
- Relation: Paper presented at IE 2006, the 3rd Australasian Conference on Interactive Entertainment, Perth : 4th December, 2006
- Full Text:
- Reviewed:
- Description: Although popular and engaging, three-dimensional environments are rarely deployed to depict strong narratives involving complex characters engaged in reasoning. The design of three-dimensional environments rich in narrative and character depth can be facilitated by a detailed representation of interactions between characters. However, the representation of interaction in current 3D development environments such as game engines is quite basic. This work advances a scheme for representing interactions that integrates FrameNet, a representation of semantics from linguistics, with conceptualizations of drama and narrative by Georges Polti and Joseph Campbell. The resulting interaction frame facilitates the design of 3D environments by providing designers with rich yet standard elements, including spatial and temporal data, with which to represent complex interactions in 3D environments. This has applications in the authoring of dynamically generated interactive narrative environments.
- Description: E1
- Description: 2003001839
Application of rank correlation, clustering and classification in information security
- Beliakov, Gleb, Yearwood, John, Kelarev, Andrei
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2012
- Type: Text , Journal article
- Relation: Journal of Networks Vol. 7, no. 6 (2012), p. 935-945
- Full Text:
- Reviewed:
- Description: This article is devoted to an experimental investigation of a novel application of a clustering technique recently introduced by the authors, in order to use robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, for intrusion detection. Here we concentrate on the particular case of profiling phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson Linear Correlation Coefficient, the Spearman Rank Correlation Coefficient and the Goodman-Kruskal Correlation Coefficient in this application. Third, we use a consensus function to combine the independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering so that they can process the whole large data set as well as new data. The precision and recall of the classifiers at the final stage of this scheme are critical for the effectiveness of the whole procedure. We investigated various combinations of several correlation coefficients, consensus functions, and a variety of supervised classification algorithms. © 2012 Academy Publisher.
- Description: 2003010277
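The three coefficients compared in the abstract can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' code: ties are handled by average ranks for Spearman and are skipped for Goodman-Kruskal gamma, and because the abstract does not say how the coefficients drive feature selection, that step is omitted.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson linear correlation coefficient (undefined for constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def goodman_kruskal_gamma(x, y):
    """(concordant - discordant) / (concordant + discordant); tied pairs skipped."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


x = [1, 2, 3, 4, 5]
y = [v * v for v in x]  # monotone but non-linear in x
print(round(pearson(x, y), 3), round(spearman(x, y), 3), goodman_kruskal_gamma(x, y))
# 0.981 1.0 1.0
```

On a monotone but non-linear relationship such as this one, the two rank-based coefficients reach 1 while the linear coefficient stays below it, so the choice of coefficient can change which features look strongly related.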
Applying anatomical therapeutic chemical (ATC) and critical term ontologies to Australian drug safety data for association rules and adverse event signalling
- Saunders, Gary, Ivkovic, Sasha, Ghosh, Ranadhir, Yearwood, John
- Authors: Saunders, Gary , Ivkovic, Sasha , Ghosh, Ranadhir , Yearwood, John
- Date: 2005
- Type: Text , Journal article
- Relation: Conferences in Research and Practice in Information Technology, Advances in Ontologies 2005: Proceedings of the Australasian Ontology Workshop AOW 2005 Vol. 58, no. (2005), p. 93-98
- Full Text:
- Reviewed:
- Description: C1
- Description: 2003001450