Modified global k-means algorithm for clustering in gene expression data sets
- Bagirov, Adil, Mardaneh, Karim
- Authors: Bagirov, Adil , Mardaneh, Karim
- Date: 2006
- Type: Text , Conference paper
- Relation: Paper presented at Intelligent Systems for Bioinformatics 2006, proceedings of the AI 2006 Workshop on Intelligent Systems of Bioinformatics, Hobart, Tasmania : 4th December, 2006
- Description: Clustering in gene expression data sets is a challenging problem. Different algorithms for clustering of genes have been proposed. However, due to the large number of genes, only a few algorithms can be applied for the clustering of samples. The k-means algorithm and its variations are among those algorithms. But these algorithms in general can converge only to local minima, and these local minima differ significantly from global solutions as the number of clusters increases. Over the last several years, different approaches have been proposed to improve the global search properties of the k-means algorithm and its performance on large data sets. One of them is the global k-means algorithm. In this paper we develop a new version of the global k-means algorithm: the modified global k-means algorithm, which is effective for solving clustering problems in gene expression data sets. We present preliminary computational results using gene expression data sets which demonstrate that the modified global k-means algorithm improves, sometimes significantly, on the results obtained by the k-means and global k-means algorithms.
- Description: E1
- Description: 2003001713
A semantic approach to boost passage retrieval effectiveness for question answering
- Ofoghi, Bahadorreza, Yearwood, John, Ghosh, Ranadhir
- Authors: Ofoghi, Bahadorreza , Yearwood, John , Ghosh, Ranadhir
- Date: 2006
- Type: Text , Conference paper
- Relation: Paper presented at Computer Science 2006 Twenty-Ninth Australian Computer Science Conference, Hobart : 16th January, 2006 p. 95-101
- Description: With the rapid growth of information resources and the huge number of requests submitted by users to existing information retrieval systems, Question Answering systems have recently attracted more attention as a way to meet information needs by providing users with more precise and focused retrieval units. Because one of the most challenging and important processes of such systems is retrieving the text excerpts most relevant to the questions, we propose a novel approach that exploits not only the syntax of the natural language of the questions and texts but also the semantics beneath them, via a semantic question rewriting and passage retrieval task. The semantic structure used to address the surface mismatch between semantically related passages and queries is FrameNet, a lexical resource for English based on frame semantics. We ran our proposed approach on a subset of the TREC 2004 factoid questions to retrieve passages containing correct answers from the AQUAINT collection, and obtained promising results.
- Description: E1
- Description: 2003001803
Using corpus analysis to inform research into opinion detection in blogs
- Osman, Deanna, Yearwood, John, Vamplew, Peter
- Authors: Osman, Deanna , Yearwood, John , Vamplew, Peter
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at Sixth Australasian Data Mining Conference, AusDM 2007, Gold Coast, Queensland : 3rd-4th December 2007 p. 65-75
- Description: Opinion detection research relies on labeled documents for training data, obtained either through assumptions based on the document's origin or by using human assessors to categorise the documents. In recent years, blogs have become a source for opinion identification research (TREC Blog06). This study analyses the part-of-speech proportions and the words used within various corpora, determining key differences and similarities that are useful when preparing for opinion identification research. The resulting comparisons between the characteristics of the various corpora are detailed and discussed. In particular, opinion-bearing and non-opinion Blog06 documents were found to display a high level of similarity, indicating that blog documents assessed at the document level cannot be used as training data in opinion identification research.
- Description: 2003004892
Opinion search in web logs
- Osman, Deanna, Yearwood, John
- Authors: Osman, Deanna , Yearwood, John
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at Eighteenth Australasian Database Conference, ADC 2007, Ballarat, Victoria : 29th January-2nd February 2007 p. 133-139
- Description: Web logs (blogs) are a fast-growing forum for people of all ages to express their feelings and opinions on topics of interest. The entries are often written in informal language without the structure found in newswire or published articles. One blog entry may contain many topics, each of which may express an opinion or a fact. This research is in contrast to work on opinion detection which has been carried out on more formally authored texts and on segments that are either whole documents or sentences. Whole web logs are divided into topics using a simple text segmentation approach. Similarity scores are used to distinguish where topic changes occur. The results are compared to human-evaluated topic changes and the most accurate algorithm is used in the remainder of the research. Words within each topic-block are allocated weightings depending on their opinion-bearing strength. Two approaches to using these weights, the sum and the maximum, are used to determine whether the topic-block is opinion-bearing or non-opinion-bearing. The topic-blocks identified as opinion-bearing are rated by human evaluators as either opinion-bearing or non-opinion-bearing, with a precision of 67% for approach A and 70% for approach B. These results are compared with two approaches on published text to identify the difference between web logs and published articles.
- Description: 2003004895
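The "Opinion search in web logs" description above mentions two ways of combining per-word opinion weights within a topic-block: the sum and the maximum. The following is a minimal sketch of that idea only, not the authors' implementation. The word weights, the threshold, and the names `OPINION_WEIGHTS`, `block_score` and `is_opinion_bearing` are all hypothetical, and the description does not say which of the two corresponds to approach A or B.

```python
from typing import Dict, List

# Hypothetical opinion-bearing strengths per word (illustrative values only,
# not taken from the paper).
OPINION_WEIGHTS: Dict[str, float] = {
    "love": 0.9,
    "terrible": 0.8,
    "think": 0.4,
    "maybe": 0.2,
}


def block_score(words: List[str], mode: str = "sum") -> float:
    """Combine the word weights of one topic-block by sum or by maximum."""
    if mode not in ("sum", "max"):
        raise ValueError("mode must be 'sum' or 'max'")
    # Words without a known weight contribute 0.0.
    weights = [OPINION_WEIGHTS.get(w.lower(), 0.0) for w in words]
    if not weights:
        return 0.0
    return sum(weights) if mode == "sum" else max(weights)


def is_opinion_bearing(words: List[str], mode: str = "sum",
                       threshold: float = 0.5) -> bool:
    """Label a topic-block opinion-bearing when its score reaches the
    (assumed) threshold."""
    return block_score(words, mode) >= threshold


block = "I think maybe it rains".split()
print(is_opinion_bearing(block, mode="sum"))
print(is_opinion_bearing(block, mode="max"))
```

With these assumed weights, several weakly opinionated words can push the summed score over the threshold while the maximum stays below it, which is one way the two approaches can disagree on the same topic-block.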
Classification for accuracy and insight : A weighted sum approach
- Quinn, Anthony, Stranieri, Andrew, Yearwood, John
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at Sixth Australasian Data Mining Conference, AusDM 2007, Gold Coast, Queensland : 3rd-4th December 2007 p. 203-208
- Description: This research presents a classifier that aims to provide insight into a dataset in addition to achieving classification accuracies comparable to other algorithms. The classifier, called Automated Weighted Sum (AWSum), uses a weighted sum approach in which feature values are assigned weights that are summed and compared to a threshold in order to classify an example. Though naive, this approach is scalable, achieves accurate classifications on standard datasets and also provides a degree of insight. By insight we mean that the technique provides an appreciation of the influence that feature values have on class values, relative to each other. AWSum focuses on the feature value space, which allows the technique to identify feature values and combinations of feature values that are sensitive and important for a classification. This is particularly useful in fields such as medicine, where this sort of micro-focus and understanding is critical in classification.
- Description: 2003005504
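The weighted sum idea in the AWSum description above (weights assigned to feature values, summed, and compared to a threshold) can be sketched roughly as follows. This is an illustration under stated assumptions, not the AWSum algorithm itself: the description does not say how the weights are derived, so the `WEIGHTS` table, the feature names, the class labels and the zero threshold are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical weights for (feature, value) pairs. Positive values lean
# towards class "yes", negative values towards class "no". The numbers are
# illustrative only and are not taken from the paper.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("smoker", "yes"): 0.6,
    ("smoker", "no"): -0.4,
    ("age", "over_60"): 0.5,
    ("age", "under_60"): -0.3,
}


def classify(example: Dict[str, str],
             threshold: float = 0.0) -> Tuple[str, float]:
    """Sum the weights of the example's feature values and compare the
    total to a threshold. Unknown feature values contribute 0.0."""
    score = sum(WEIGHTS.get((feature, value), 0.0)
                for feature, value in example.items())
    label = "yes" if score > threshold else "no"
    return label, score


label, score = classify({"smoker": "yes", "age": "under_60"})
print(label, round(score, 2))
```

Because every feature value carries its own signed weight, the table can be read directly to see which values push an example towards which class, which is the kind of insight the description refers to.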