- Title
- Optimization based clustering and classification algorithms in analysis of microarray gene expression data sets
- Creator
- Mardaneh, Karim
- Date
- 2007
- Type
- Text; Thesis; PhD
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/61535
- Identifier
- vital:1046
- Abstract
- Doctor of Philosophy; Bioinformatics and computational biology are relatively new areas that involve the use of different techniques including computer science, informatics, biochemistry, applied math and etc., to solve biological problems. In recent years the development of new molecular genetics technologies, such as DNA microarrays led to the simultaneous measurement of expression levels of thousands and even tens of thousands of genes. Microarray gene expression technology has facilitated the study of genomic structure and investigation of biological systems. Numerical output of this technology is shown as microarray gene expression data sets. These data sets contain a very large number of genes and a relatively small number of samples and their precise analysis requires a robust and suitable computer software. Due to this, only a few existing algorithms are applicable to them, so more efficient methods for solving clustering, gene selection and classification problems of gene expression data sets are required and those methods need to be computationally applicable and less expensive. The aim of this thesis is to develop new algorithms for solving clustering, gene selection and data classification problems on gene expression data sets. Clustering in gene expression data sets is a challenging problem. The increasing use of DNA microarray-based tumour gene expression profiles for cancer diagnosis requires more efficient methods to solve clustering problems of these profiles. Different algorithms for clustering of genes have been proposed, however few algorithms can be applied to the clustering of samples. k-means algorithm, among very few clustering algorithms is applicable to microarray gene expression data sets, however these are not efficient for solving clustering problems when the number of genes is thousands and this algorithm is very sensitive to the choice of a starting point. Additionally, when the number of clusters is relatively large, this algorithm gives local minima which can differ significantly from the global solution. Over the last several years different approaches have been proposed to improve global ii Abstract Abstract search properties of k-means algorithm. One of them is the global k-means algorithm, however this algorithm is not efficient when data are sparse. In this thesis we developed a new version of the global k-means algorithm, the modified global k-means algorithm which is effective for solving clustering problems in gene expression data sets. In a microarray gene expression data set, in many cases only a small fraction of genes are informative whereas most of them are non-informative and make noise. Therefore the development of gene selection algorithms that allow us to remove as many non-informative genes as possible is very important. In this thesis we developed a new overlapping gene selection algorithm. This algorithm is based on calculating overlaps of different genes. It considerably reduces the number of genes and is efficient in finding a subset of informative genes. Over the last decade different approaches have been proposed to solve supervised data classification problems in gene expression data sets. In this thesis we developed a new approach which is based on the so-called max-min separability and is compared with the other approaches. The max-min separability algorithm is an equivalent of piecewise linear separability. An incremental algorithm is presented to compute piecewise linear functions separating two sets. This algorithm is applied along with a special gene selection algorithm. In this thesis, all new algorithms have been tested on 10 publicly available gene expression data sets and our numerical results demonstrate the efficiency of the new algorithms that were developed in the framework of this research
- Rights
- Copyright Karim Mardenah
- Rights
- Open Access
- Rights
- This metadata is freely available under a CCO license
- Subject
- Bioinformatics; Computational biology; Gene expression; Australian Digital Thesis
- Full Text
- Hits: 1298
- Visitors: 1391
- Downloads: 126
Thumbnail | File | Description | Size | Format | |||
---|---|---|---|---|---|---|---|
View Details Download | DS1 | Australian Digital Thesis | 3 MB | Adobe Acrobat PDF | View Details Download |