New gene selection algorithm using hyperboxes to improve performance of classifiers
- Authors: Bagirov, Adil , Mardaneh, Karim
- Date: 2020
- Type: Text , Journal article
- Relation: International Journal of Bioinformatics Research and Applications Vol. 16, no. 3 (2020), p. 269-289
- Full Text: false
- Reviewed:
- Description: DNA microarray technology makes it possible to measure the expression levels of thousands of genes in a single experiment, which in turn makes it possible to apply classification techniques to classify tumours. However, the large number of genes and relatively small number of tumours in gene expression datasets may diminish, in some cases significantly, the accuracy of many classifiers. Therefore, efficient gene selection algorithms are required to identify the most informative genes or groups of genes and so improve the performance of classifiers. In this paper, a new gene selection algorithm is developed using marginal hyperboxes of genes or groups of genes for each tumour type. Informative genes are defined using overlaps between hyperboxes. The results on six gene expression datasets demonstrate that the proposed algorithm is able to considerably reduce the number of genes and significantly improve the performance of classifiers. © 2020 Inderscience Enterprises Ltd.
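The abstract does not give the selection rule in detail. The following is a minimal illustrative sketch built on two assumptions:

- A gene's marginal hyperbox for a tumour type is the [min, max] interval of its expression values over that type's samples.
- A gene's overlap score is the largest fraction of samples, over any pair of tumour types, that falls inside the intersection of the two intervals.

The function names and the 0.2 threshold are hypothetical. The sketch handles single genes only, not groups of genes, and is not the authors' algorithm.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Interval = Tuple[float, float]

def marginal_boxes(values: List[float], labels: List[str]) -> Dict[str, Interval]:
    """Per-class [min, max] interval of one gene's expression values (assumed box definition)."""
    boxes: Dict[str, Interval] = {}
    for v, c in zip(values, labels):
        lo, hi = boxes.get(c, (v, v))
        boxes[c] = (min(lo, v), max(hi, v))
    return boxes

def gene_overlap(values: List[float], labels: List[str]) -> float:
    """Largest share of samples, over all class pairs, lying in both classes' intervals."""
    boxes = marginal_boxes(values, labels)
    worst = 0.0
    for c1, c2 in combinations(sorted(boxes), 2):
        lo = max(boxes[c1][0], boxes[c2][0])  # intersection of the two intervals;
        hi = min(boxes[c1][1], boxes[c2][1])  # it is empty when lo > hi
        pair = [v for v, c in zip(values, labels) if c in (c1, c2)]
        inside = sum(1 for v in pair if lo <= v <= hi)
        worst = max(worst, inside / len(pair))
    return worst

def select_genes(expr: Dict[str, List[float]], labels: List[str],
                 threshold: float = 0.2) -> List[str]:
    """Keep genes whose hypothetical overlap score does not exceed the threshold."""
    return [g for g, vals in expr.items() if gene_overlap(vals, labels) <= threshold]
```

Under this scoring, genes whose class intervals are disjoint score 0 and are kept. Genes whose intervals largely coincide are discarded.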
DRfit : A Java tool for the analysis of discrete data from multi-well plate assays
- Authors: Hofmann, Andreas , Preston, Sarah , Cross, Megan , Herath, Dilrukshi , Simon, Anne , Gasser, Robin
- Date: 2019
- Type: Text , Journal article
- Relation: BMC Bioinformatics Vol. 20, no. (2019), p. 1-6
- Full Text:
- Reviewed:
- Description: Background: Analysis of replicates in sets of discrete data, typically acquired in multi-well plate formats, is a recurring task in many contemporary areas of the Life Sciences. The availability of accessible, cross-platform data analysis tools for such fundamental tasks in varied projects and environments is an important prerequisite for ensuring a reliable and timely turnaround and for providing practical analytical tools for student training. Results: We have developed an easy-to-use, interactive software tool for the analysis of multiple data sets comprising replicates of discrete bivariate data points. For each dataset, the software identifies the replicate data points from a defined matrix layout and calculates their means and standard errors. The averaged values are then automatically fitted using either a linear or a logistic dose-response function. Conclusions: DRfit is a practical and convenient tool for the analysis of one or multiple sets of discrete data points acquired as replicates from multi-well plate assays. The design of the graphical user interface and the built-in analysis features make it a flexible and useful tool for a wide range of different assays.
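The replicate-averaging and curve-fitting steps described above can be sketched in plain Python. This is not DRfit's code, since DRfit is a Java tool. The plate layout (replicates in consecutive wells of a row), the four-parameter logistic form and all function names are assumptions made for illustration. Fitting the logistic parameters would also need a nonlinear least-squares routine, which is omitted here.

```python
import math
from statistics import mean, stdev

def group_replicates(row, n_rep):
    """Split one plate row into consecutive groups of n_rep wells (assumed layout)."""
    return [row[i:i + n_rep] for i in range(0, len(row), n_rep)]

def replicate_stats(replicates):
    """Mean and standard error of the mean (sample SD / sqrt(n)) of one replicate group."""
    m = mean(replicates)
    se = stdev(replicates) / math.sqrt(len(replicates)) if len(replicates) > 1 else 0.0
    return m, se

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for the averaged values."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def logistic_dose_response(x, bottom, top, ec50, hill):
    """Four-parameter logistic curve (assumed form); x is a positive dose."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)
```

For example, a row of six wells holding two triplicates yields two (mean, SE) pairs. These can then be passed to `linear_fit`, or to a fitter built on `logistic_dose_response`.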
A model of the circadian clock in the cyanobacterium Cyanothece sp. ATCC 51142
- Authors: Nguyen, Vinh , Chetty, Madhu , Coppel, Ross , Gaudana, Sandeep , Wangikar, Pramod
- Date: 2013
- Type: Text , Journal article
- Relation: BMC Bioinformatics Vol. 14, no. (Supplement 2) (2013), p. s14-1-s14-9
- Full Text:
- Reviewed:
- Description: Background The overconsumption of fossil fuels has led to growing concerns over climate change and global warming, and research into viable alternative biofuel sources has increased accordingly. Of the several different biofuel platforms, cyanobacteria possess great potential owing to their ability to accumulate biomass tens of times faster than traditional oilseed crops. The cyanobacterium Cyanothece sp. ATCC 51142 has recently attracted considerable interest as a model organism for such research. Cyanothece can efficiently perform both photosynthesis and nitrogen fixation within the same cell, and has recently been shown to produce biohydrogen, a byproduct of nitrogen fixation, at rates several-fold higher than those of previously described hydrogen-producing photosynthetic microbes. Since the key enzyme for nitrogen fixation is very sensitive to the oxygen produced by photosynthesis, Cyanothece employs a sophisticated temporal separation scheme in which nitrogen fixation occurs at night and photosynthesis during the day. At the core of this scheme is a robust clocking mechanism that has not yet been thoroughly studied. Understanding how this circadian clock interacts with and harmonizes global transcription of key cellular processes is one of the keys to realizing the inherent potential of this organism. Results In this paper, we employ several state-of-the-art bioinformatics techniques to study the core circadian clock in Cyanothece sp. ATCC 51142 and its interactions with other key cellular processes. We employ comparative genomics techniques to map the circadian clock genes and genetic interactions from another cyanobacterial species, Synechococcus elongatus PCC 7942, whose circadian clock has been much more thoroughly investigated.
Using time series gene expression data for Cyanothece, we employ gene regulatory network reconstruction techniques to learn this network de novo, and compare the reconstructed network against the interactions currently reported in the literature. Next, we build a computational model of the interactions between the core clock and other cellular processes, and show how this model can predict the behaviour of the system under changing environmental conditions. The constructed models significantly advance our understanding of the functional mechanisms of the Cyanothece circadian clock.
Chemical characterization of MEA degradation in PCC pilot plants operating in Australia
- Authors: Cruickshank, Alicia , Verheyen, Vincent , Adeloju, Samuel , Meuleman, Erik , Chaffee, Alan , Cottrell, Aaron , Feron, Paul
- Date: 2013
- Type: Text , Journal article
- Relation: Energy Procedia Vol. 37, no. (2013), p. 877-882
- Full Text:
- Reviewed:
- Description: An important step towards commercial-scale post-combustion CO2 capture from coal-fired power stations is understanding solvent degradation. Laboratory-scale trials have identified three main solvent degradation pathways for 30% MEA: oxidative degradation, carbamate polymerization and formation of heat stable salts. This paper probes the semi-volatile organic compounds produced from a single batch of 30% MEA that was used to capture CO2 at a black coal-fired power station (Tarong, Queensland, Australia) for approximately 700 hours, followed by 500 hours at a brown coal-fired power station (Loy Yang, Victoria, Australia). The compounds identified in this aged solvent are compared with the MEA degradation reactions described in the literature. Most of the semi-volatile compounds tentatively identified by GC/MS have previously been reported in laboratory-scale degradation trials. Our preliminary results show that low levels of degradation products were present in samples after the solvent's use in the pilot plant at Tarong (black coal) and subsequent 13 months of storage, but much higher concentrations were later found in the same solvent during its use in the pilot plant at Loy Yang Power (brown coal). Further work includes identifying the cause of poor GC/MS repeatability and investigating the relative rates of the reactions described in the literature. The impact of inorganic anions and dissolved metals on MEA degradation will also be explored.
Incorporating time-delays in S-System model for reverse engineering genetic networks
- Authors: Chowdhury, Ahsan , Chetty, Madhu , Nguyen, Vinh
- Date: 2013
- Type: Text , Journal article
- Relation: BMC Bioinformatics Vol. 14, no. (2013), p. 1-22
- Full Text:
- Reviewed:
- Description: Background In any gene regulatory network (GRN), the complex interactions occurring amongst transcription factors and target genes can be either instantaneous or time-delayed. However, many existing modeling approaches for inferring GRNs cannot represent both types of interaction simultaneously, and as a result they fail to detect important interactions of the type they do not model. The S-System model, a differential-equation-based approach that has been increasingly applied to modeling GRNs, also suffers from this limitation: all existing S-System-based modeling approaches have been designed to capture only instantaneous interactions and are unable to infer time-delayed ones. Results In this paper, we propose a novel Time-Delayed S-System (TDSS) model that uses a set of delay differential equations to represent the system dynamics. Incorporating time-delay parameters into the S-System model enables simultaneous modeling of both instantaneous and time-delayed interactions. Furthermore, the delay parameters are not limited to positive integer values (corresponding to time stamps in the data) but can also take fractional values. We also propose a new model evaluation criterion that exploits the sparse and scale-free nature of GRNs to narrow down the search space, which not only reduces the computation time significantly but also improves model accuracy. The evaluation criterion systematically adapts the max-min in-degrees and balances the effect of network accuracy and complexity during optimization. Conclusion Four well-known performance measures, applied to experimental studies on synthetic networks with various time-delayed regulations, clearly demonstrate that the proposed method captures both instantaneous and delayed interactions correctly and with high precision.
Experiments on two well-known real-life networks, namely the IRMA network and the SOS DNA repair network in Escherichia coli, show a significant improvement over other state-of-the-art approaches for GRN modeling.
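For reference, the conventional S-System writes the rate of change of each gene's expression level X_i as a difference of two power-law terms. Here alpha_i and beta_i are non-negative rate constants, and g_ij and h_ij are the kinetic orders of production and degradation. A time-delayed variant can be sketched by evaluating each regulator X_j at an earlier time t - tau_ij. This form is an assumption about the general idea, not the paper's exact parameterization.

```latex
% Conventional S-System for N genes
\frac{dX_i(t)}{dt} = \alpha_i \prod_{j=1}^{N} X_j(t)^{g_{ij}}
                   - \beta_i  \prod_{j=1}^{N} X_j(t)^{h_{ij}}

% Assumed time-delayed form, with a delay \tau_{ij} \ge 0 for each regulation j \to i
\frac{dX_i(t)}{dt} = \alpha_i \prod_{j=1}^{N} X_j(t - \tau_{ij})^{g_{ij}}
                   - \beta_i  \prod_{j=1}^{N} X_j(t - \tau_{ij})^{h_{ij}}
```

Setting every tau_ij = 0 recovers the instantaneous S-System. A fractional tau_ij falls between sampled time stamps, so evaluating it requires interpolating the expression data.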
Gene regulatory network modeling via global optimization of high-order dynamic Bayesian network
- Authors: Nguyen, Vinh , Chetty, Madhu , Coppel, Ross , Wangikar, Pramod
- Date: 2012
- Type: Text , Journal article
- Relation: BMC Bioinformatics Vol. 13, no. 131 (2012), p. 1-16
- Full Text:
- Reviewed:
- Description: Background The dynamic Bayesian network (DBN) is among the mainstream approaches for modeling various biological networks, including the gene regulatory network (GRN). Most current methods for learning DBNs employ either a local search such as hill-climbing or a stochastic global optimization metaheuristic such as a genetic algorithm or simulated annealing, which can only locate sub-optimal solutions. Further, current DBN applications have essentially been limited to small networks. Results To overcome these difficulties, we introduce a deterministic global optimization based DBN approach for reverse engineering genetic networks from time course gene expression data. For DBN models that consist only of inter-time-slice arcs, we show that there exists a polynomial time algorithm for learning the globally optimal network structure. The proposed approach, named GlobalMIT+, employs a recently proposed information theoretic scoring metric, the mutual information test (MIT). GlobalMIT+ is able to learn high-order time-delayed genetic interactions, which are common in most biological systems. Evaluation of the approach on both synthetic and real data sets, including a 733-gene cyanobacterial gene expression data set, shows significantly improved performance over other techniques. Conclusions Our studies demonstrate that deterministic global optimization approaches can infer large-scale genetic networks.
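MIT-style scores are built around the statistic 2N·I(X_i; Pa_i). Here I is the empirical mutual information between a gene and its candidate parents, and N is the number of samples. Under independence this statistic is asymptotically chi-square distributed (the G-test).

The sketch below computes only that core term, for one candidate parent shifted by a given delay; this is how a high-order (time-delayed) arc can be scored. It assumes the expression series have already been discretised. It omits the chi-square penalty and the global search of GlobalMIT+, and its function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def delayed_mi_statistic(child, parent, delay):
    """2N*I between child(t) and parent(t - delay); equal-length discretised series assumed."""
    ys = child[delay:]                 # child values from time `delay` onwards
    xs = parent[:len(parent) - delay]  # parent values shifted back by `delay`
    return 2 * len(ys) * mutual_information(xs, ys)
```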
A Markov-blanket-based model for gene regulatory network inference
- Authors: Ram, Ramesh , Chetty, Madhu
- Date: 2011
- Type: Text , Journal article
- Relation: Transactions on Computational Biology and Bioinformatics Vol. 8, no. 2 (2011), p.
- Full Text: false
- Reviewed:
- Description: An efficient two-step Markov blanket method for modeling and inferring complex regulatory networks from large-scale microarray data sets is presented. The inferred gene regulatory network (GRN) is based on time series gene expression data capturing the underlying gene interactions. To construct a highly accurate GRN, the proposed method performs: 1) discovery of each gene's Markov blanket (MB), 2) formulation of a flexible measure of the network's quality, 3) efficient searching with the aid of a guided genetic algorithm, and 4) pruning to obtain a minimal set of correct interactions. Investigations are carried out using both synthetic and yeast cell cycle gene expression data sets. The realistic synthetic data sets validate the robustness of the method under variations in topology, sample size, time delay, noise, vertex in-degree and the presence of hidden nodes. The proposed approach is shown to have excellent inferential capabilities and high accuracy even in the presence of noise. The gene network inferred from yeast cell cycle data is investigated for its biological relevance using well-known interactions, sequence analysis, motif patterns and GO data. Further, novel interactions are predicted for the unknown genes of the network, and their influence on other genes is discussed.
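The Markov blanket in step 1 has a standard definition: in a Bayesian network, a node's Markov blanket consists of its parents, its children and its children's other parents. The method above discovers blankets from expression data. The sketch below only illustrates the definition on a known directed acyclic graph, with hypothetical gene names.

```python
from typing import Dict, Set

def markov_blanket(node: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Parents, children and children's other parents of `node` in a known DAG."""
    blanket = set(parents.get(node, set()))
    children = {c for c, ps in parents.items() if node in ps}
    blanket |= children
    for child in children:
        blanket |= parents[child]  # co-parents (spouses) of each child
    blanket.discard(node)          # a node is not part of its own blanket
    return blanket
```

For example, with g3 regulated by g2 and g4 and itself regulating g5, the blanket of g3 is {g2, g4, g5}.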
GlobalMIT: learning globally optimal dynamic Bayesian network with the mutual information test criterion
- Authors: Nguyen, Vinh , Chetty, Madhu , Coppel, Ross , Wangikar, Pramod
- Date: 2011
- Type: Text , Journal article
- Relation: Bioinformatics Vol. 27, no. 19 (2011), p. 2765-2766
- Full Text: false
- Reviewed:
Twin removal in genetic algorithms for protein structure prediction using low-resolution model
- Authors: Hoque, Md Tamjidul , Chetty, Madhu , Lewis, Andrew , Sattar, Abdul
- Date: 2011
- Type: Text , Journal article
- Relation: IEEE/ACM Transactions on Computational Biology and Bioinformatics Vol. 8, no. 1 (2011), p. 234-245
- Full Text: false
- Reviewed:
- Description: This paper examines the impact of twins, and measures for removing them, in the population of a genetic algorithm (GA) applied to conformational search. It is shown conclusively that a twin removal strategy provides considerably enhanced GA performance when searching for solutions to complex ab initio protein structure prediction (PSP) problems in a low-resolution model. Without twin removal, GA crossover and mutation operations can become ineffectual as generations lose their ability to produce significant differences, which can cause the search to stall. The paper relaxes the definition of chromosomal twins in the removal strategy to encompass not only identical but also highly correlated chromosomes within the GA population, and empirical results consistently show significant improvements in solving PSP problems.
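The following is a minimal sketch of correlation-based twin removal. It assumes each chromosome is a numeric sequence, for example relative lattice moves coded as integers. Two chromosomes are treated as twins when they are identical or their Pearson correlation reaches a threshold. The encoding, the 0.8 threshold and the function names are illustrative assumptions, not values from the paper.

```python
import math
from typing import List, Sequence

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; constant sequences count as correlated only if identical."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 1.0 if list(a) == list(b) else 0.0
    return cov / (sa * sb)

def remove_twins(population: List[List[int]], threshold: float = 0.8) -> List[List[int]]:
    """Greedily keep chromosomes that are neither identical nor highly correlated
    with any chromosome already kept."""
    kept: List[List[int]] = []
    for chrom in population:
        if all(chrom != k and pearson(chrom, k) < threshold for k in kept):
            kept.append(chrom)
    return kept
```

In a full GA, the discarded slots could then be refilled, for instance with freshly generated individuals, so that the population size stays constant.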
Studies on the structural stability of rabbit prion probed by molecular dynamics simulations of its wild-type and mutants
- Authors: Zhang, Jiapu
- Date: 2010
- Type: Text , Journal article
- Relation: Journal of Theoretical Biology Vol. 264, no. (2010), p. 119-122
- Full Text: false
- Reviewed:
- Description: Prion diseases are invariably fatal and highly infectious neurodegenerative diseases that affect humans and animals. Rabbits are the only mammalian species reported to be resistant to infection by prion diseases isolated from other species (Vorberg et al., 2003). Fortunately, the NMR structure of rabbit prion protein (124-228) (PDB entry 2FJ3), the NMR structure of the rabbit prion protein mutant S173N (PDB entry 2JOH) and the NMR structure of the rabbit prion protein mutant I214V (PDB entry 2JOM) were released recently. This paper studies these NMR structures by molecular dynamics simulations. Simulation results confirm the structural stability of wild-type rabbit prion, and show that the salt bridge between D177 and R163 contributes greatly to the structural stability of rabbit prion. Crown Copyright Published by Elsevier.
Extended HP model for protein structure prediction
- Authors: Hoque, Md Tamjidul , Chetty, Madhu , Sattar, Abdul
- Date: 2009
- Type: Text , Journal article
- Relation: Computational Biology and Bioinformatics Vol. Jan-Feb 2011, no. (2009 ), p. 234-245
- Full Text: false
- Reviewed:
Predicting protein-protein interfaces as clusters of optimal docking area points
- Authors: Arafat, Yasir , Kamruzzaman, Joarder , Karmakar, Gour , Fernandez-Recio, Juan
- Date: 2009
- Type: Text , Journal article
- Relation: International Journal of Data Mining and Bioinformatics Vol. 3, no. 1 (2009), p. 55-67
- Full Text: false
- Reviewed:
- Description: The desolvation property is used here to predict protein-protein binding sites, exploiting the fact that lower-valued 'optimal docking area' (ODA) points (Fernandez-Recio et al., 2005) form clusters at the interface. The proposed method involves two steps: clustering the ODA points and representing the ODA points by average ODA values. On 51 nonredundant proteins, the success rate improved considerably. Considering only significant ODA points, the previous ODA method obtained a success rate of 65%, with an overall success rate of 39%; the proposed method improved the overall success rate to 61%. Further, comparable results were found for X-ray and NMR structures.
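The two steps (clustering ODA points, then representing each cluster by its average ODA value) can be sketched as follows. The sketch makes three assumptions:

- Each point carries 3D coordinates and an ODA value.
- Points within a distance cutoff belong to the same cluster (single linkage).
- The cluster with the lowest mean ODA is the predicted interface patch.

The 6.0 cutoff, the selection rule and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float, float]  # (x, y, z, oda)

def cluster_points(points: List[Point], cutoff: float) -> List[List[Point]]:
    """Single-linkage clustering via union-find: points within `cutoff` are linked."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i][:3], points[j][:3]) <= cutoff:
                parent[find(i)] = find(j)
    groups: Dict[int, List[Point]] = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())

def best_patch(points: List[Point], cutoff: float = 6.0) -> List[Point]:
    """Cluster with the lowest average ODA value (hypothetical selection rule)."""
    return min(cluster_points(points, cutoff),
               key=lambda c: sum(p[3] for p in c) / len(c))
```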