Combining segmental semi-Markov models with neural networks for protein secondary structure prediction
- Authors: Bidargaddi, Niranjan , Chetty, Madhu , Kamruzzaman, Joarder
- Date: 2009
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 72, no. (2009), p. 3943-3950
- Full Text: false
- Reviewed:
- Description: Predicting the secondary structure of proteins from the primary sequence alone has been approached from either a classification or a generative-model perspective. The most prominent classification methods have used neural networks, which map a local window of residues in the sequence to the structural state of the central residue in the window, thus capturing local interactions effectively. However, they fail to capture distant interactions among residues. Generative models based on Bayesian segmentation capture sequence-structure relationships using generalized hidden Markov models with explicit state durations. They capture non-local interactions through a joint sequence-structure probability distribution defined over structural segments. In this paper, we investigate a combined architecture, with Bayesian segmentation at the first stage and a neural network at the second stage, which captures both local and non-local correlations to increase single-sequence prediction accuracy. The combined architecture is further enhanced using neural network optimization and ensemble techniques.
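The local-window input used by the neural network stage can be illustrated with a short Python sketch. It one-hot encodes the residues in a window centred on a given position; the half-width of 6 (a 13-residue window), the 20-letter alphabet and the extra padding/unknown slot are illustrative assumptions, not details taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def window_features(seq: str, center: int, half_width: int = 6) -> list[float]:
    """One-hot encode the 2*half_width+1 residues centred on `center`.

    Each window position gets 21 slots: 20 for the standard residues and one
    extra slot used for positions past either chain end or for unknown letters.
    """
    width = len(AMINO_ACIDS) + 1
    features: list[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        vec = [0.0] * width
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            vec[AMINO_ACIDS.index(seq[pos])] = 1.0
        else:
            vec[-1] = 1.0  # padding / unknown residue
        features.extend(vec)
    return features
```

Stacking `window_features(seq, i)` for every position `i` gives one input row per residue for a window-based classifier; the segmentation-derived inputs of the first stage are not modelled in this sketch.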
Extended HP model for protein structure prediction
- Authors: Hoque, Md Tamjidul , Chetty, Madhu , Sattar, Abdul
- Date: 2009
- Type: Text , Journal article
- Relation: Computational Biology and Bioinformatics Vol. Jan-Feb 2011, no. (2009), p. 234-245
- Full Text: false
- Reviewed:
GlobalMIT: learning globally optimal dynamic Bayesian network with the mutual information test criterion
- Authors: Nguyen, Vinh , Chetty, Madhu , Coppel, Ross , Wangikar, Pramod
- Date: 2011
- Type: Text , Journal article
- Relation: Bioinformatics Vol. 27, no. 19 (2011), p.2765-2766
- Full Text: false
- Reviewed:
Gene regulatory network modeling via global optimization of high-order dynamic Bayesian network
- Authors: Nguyen, Vinh , Chetty, Madhu , Coppel, Ross , Wangikar, Pramod
- Date: 2012
- Type: Text , Journal article
- Relation: BMC Bioinformatics Vol. 13, no. 131 (2012), p. 1-16
- Full Text:
- Reviewed:
- Description: Background: Dynamic Bayesian network (DBN) is among the mainstream approaches for modeling various biological networks, including the gene regulatory network (GRN). Most current methods for learning DBNs employ either a local search such as hill-climbing, or a stochastic global optimization metaheuristic such as a genetic algorithm or simulated annealing, which can only locate sub-optimal solutions. Further, current DBN applications have essentially been limited to small-sized networks. Results: To overcome these difficulties, we introduce a deterministic global optimization based DBN approach for reverse engineering genetic networks from time-course gene expression data. For DBN models that consist only of inter-time-slice arcs, we show that there exists a polynomial-time algorithm for learning the globally optimal network structure. The proposed approach, named GlobalMIT+, employs the recently proposed information-theoretic scoring metric named mutual information test (MIT). GlobalMIT+ is able to learn high-order time-delayed genetic interactions, which are common to most biological systems. Evaluation of the approach using both synthetic and real data sets, including a cyanobacterial gene expression data set of 733 genes, shows significantly improved performance over other techniques. Conclusions: Our studies demonstrate that deterministic global optimization approaches can infer large-scale genetic networks.
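Because a DBN with only inter-time-slice arcs has every arc pointing forward in time, it cannot contain cycles, so each gene's parent set can be chosen independently; with a fixed cap k on the number of parents, exhaustive search costs O(n^k) score evaluations per gene. The Python sketch below shows that per-gene decomposition on discretized data. It uses a penalized empirical mutual information score as a plainly swapped-in stand-in, not the MIT score, and the names `best_parents`, `max_parents` and `penalty` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_info(xs, ys):
    """Empirical mutual information (nats) between two discrete sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log(c * n / (cx[x] * cy[y])) for (x, y), c in cxy.items())


def best_parents(data, target, max_parents=2, penalty=1.0):
    """Exhaustively score parent sets for one gene.

    data: list of time points, each a tuple of discrete gene states.
    Parents are read at slice t-1 and the target at slice t (inter-slice arcs only).
    Score (stand-in, not MIT): 2*N*I(target; parents) minus a penalty that grows
    with the number of observed parent configurations.
    """
    n_genes = len(data[0])
    child = [row[target] for row in data[1:]]
    n = len(child)
    best, best_score = (), 0.0  # the empty parent set scores 0
    for k in range(1, max_parents + 1):
        for parents in combinations(range(n_genes), k):
            past = [tuple(row[p] for p in parents) for row in data[:-1]]
            complexity = len(set(past)) * (len(set(child)) - 1)
            score = 2 * n * mutual_info(child, past) - penalty * complexity
            if score > best_score:
                best, best_score = parents, score
    return best, best_score
```

Running `best_parents` once per gene yields the whole network; the sketch keeps the first of equally scored parent sets, so smaller sets win ties.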
A model of the circadian clock in the cyanobacterium Cyanothece sp. ATCC 51142
- Authors: Nguyen, Vinh , Chetty, Madhu , Coppel, Ross , Gaudana, Sandeep , Wangikar, Pramod
- Date: 2013
- Type: Text , Journal article
- Relation: BMC Bioinformatics Vol. 14, no. (Supplement 2) (2013), p. s14-1-s14-9
- Full Text:
- Reviewed:
- Description: Background: The overconsumption of fossil fuels has led to growing concerns over climate change and global warming. Increasing research effort has been devoted to alternative viable biofuel sources. Of the several different biofuel platforms, cyanobacteria possess great potential because of their ability to accumulate biomass tens of times faster than traditional oilseed crops. The cyanobacterium Cyanothece sp. ATCC 51142 has recently attracted considerable research interest as a model organism for such research. Cyanothece can efficiently perform both photosynthesis and nitrogen fixation within the same cell, and has recently been shown to produce biohydrogen (a byproduct of nitrogen fixation) at rates several-fold higher than previously described hydrogen-producing photosynthetic microbes. Since the key enzyme for nitrogen fixation is very sensitive to the oxygen produced by photosynthesis, Cyanothece employs a sophisticated temporal separation scheme in which nitrogen fixation occurs at night and photosynthesis during the day. At the core of this temporal separation scheme is a robust clocking mechanism, which has so far not been thoroughly studied. Understanding how this circadian clock interacts with and harmonizes the global transcription of key cellular processes is one of the keys to realizing the inherent potential of this organism. Results: In this paper, we employ several state-of-the-art bioinformatics techniques to study the core circadian clock in Cyanothece sp. ATCC 51142 and its interactions with other key cellular processes. We employ comparative genomics techniques to map the circadian clock genes and genetic interactions from another cyanobacterial species, namely Synechococcus elongatus PCC 7942, whose circadian clock has been much more thoroughly investigated. Using time-series gene expression data for Cyanothece, we employ gene regulatory network reconstruction techniques to learn this network de novo, and compare the reconstructed network against the interactions currently reported in the literature. Next, we build a computational model of the interactions between the core clock and other cellular processes, and show how this model can predict the behaviour of the system under changing environmental conditions. The constructed models significantly advance our understanding of the functional mechanisms of the Cyanothece circadian clock.
Twin removal in genetic algorithms for protein structure prediction using low-resolution model
- Authors: Hoque, Md Tamjidul , Chetty, Madhu , Lewis, Andrew , Sattar, Abdul
- Date: 2011
- Type: Text , Journal article
- Relation: IEEE/ACM Transactions on Computational Biology and Bioinformatics Vol. 8, no. 1 (2011), p. 234-245
- Full Text: false
- Reviewed:
- Description: This paper presents the impact of twins and the measures for their removal from the population of a genetic algorithm (GA) when applied to effective conformational searching. It is conclusively shown that a twin removal strategy for a GA provides considerably enhanced performance when investigating solutions to complex ab initio protein structure prediction (PSP) problems in a low-resolution model. Without twin removal, GA crossover and mutation operations can become ineffectual as generations lose their ability to produce significant differences, which can cause the search to stall. The paper relaxes the definition of chromosomal twins in the removal strategy to encompass not only identical but also highly correlated chromosomes within the GA population, with empirical results consistently exhibiting significant improvements in solving PSP problems.
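A minimal Python sketch of the relaxed twin criterion: a chromosome is dropped if it is identical to, or has a Pearson correlation at or above a threshold with, a chromosome already kept. Treating chromosomes as integer move codes, using Pearson correlation on those codes, and the 0.95 threshold are assumptions for illustration; the paper's exact correlation measure and replacement policy are not reproduced here.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0
    return cov / (sa * sb)


def remove_twins(population, threshold=0.95):
    """Keep the first member of each twin group.

    A chromosome counts as a twin if it equals, or correlates at >= threshold
    with, a chromosome that has already been kept.
    """
    kept = []
    for chrom in population:
        if any(chrom == other or pearson(chrom, other) >= threshold for other in kept):
            continue
        kept.append(chrom)
    return kept
```

In a GA loop, the slots freed by `remove_twins` would typically be refilled, for example with new random chromosomes, before the next round of crossover and mutation; that refill step is left out.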
A Markov-blanket-based model for gene regulatory network inference
- Authors: Ram, Ramesh , Chetty, Madhu
- Date: 2011
- Type: Text , Journal article
- Relation: Transactions on Computational Biology and Bioinformatics Vol. 8, no. 2 (2011), p.
- Full Text: false
- Reviewed:
- Description: An efficient two-step Markov blanket method for modeling and inferring complex regulatory networks from large-scale microarray data sets is presented. The inferred gene regulatory network (GRN) is based on time-series gene expression data capturing the underlying gene interactions. To construct a highly accurate GRN, the proposed method performs: 1) discovery of a gene's Markov blanket (MB), 2) formulation of a flexible measure to determine the network's quality, 3) efficient searching with the aid of a guided genetic algorithm, and 4) pruning to obtain a minimal set of correct interactions. Investigations are carried out using both synthetic and yeast cell cycle gene expression data sets. The realistic synthetic data sets validate the robustness of the method under varying topology, sample size, time delay, noise, vertex in-degree, and the presence of hidden nodes. It is shown that the proposed approach has excellent inferential capability and high accuracy even in the presence of noise. The gene network inferred from yeast cell cycle data is investigated for its biological relevance using well-known interactions, sequence analysis, motif patterns, and GO data. Further, novel interactions are predicted for the unknown genes of the network, and their influence on other genes is also discussed.
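For reference, the Markov blanket of a node in a Bayesian network is its parents, its children, and the other parents of its children. The Python sketch below computes it from a known graph; it only illustrates the definition, whereas the paper's first step discovers the blanket from expression data. The dictionary-of-parent-sets representation is an assumption.

```python
def markov_blanket(node, parents):
    """Markov blanket of `node` in a DAG given as {node: set_of_parents}.

    Returns the node's parents, its children, and its children's other parents.
    """
    blanket = set(parents.get(node, set()))
    children = {child for child, ps in parents.items() if node in ps}
    blanket |= children
    for child in children:
        blanket |= parents[child]  # co-parents (spouses)
    blanket.discard(node)
    return blanket
```

Conditioned on its Markov blanket, a gene is independent of every other gene in the network, which is why the blanket narrows the candidate regulators considered in the later search.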
Incorporating time-delays in S-System model for reverse engineering genetic networks
- Authors: Chowdhury, Ahsan , Chetty, Madhu , Nguyen, Vinh
- Date: 2013
- Type: Text , Journal article
- Relation: BMC Bioinformatics Vol. 14, no. (2013), p. 1-22
- Full Text:
- Reviewed:
- Description: Background: In any gene regulatory network (GRN), the complex interactions occurring amongst transcription factors and target genes can be either instantaneous or time-delayed. However, many existing modeling approaches currently applied for inferring GRNs are unable to represent both types of interaction simultaneously. As a result, each such approach misses important interactions of the type it cannot represent. The S-System model, a differential-equation-based approach that has been increasingly applied for modeling GRNs, also suffers from this limitation. In fact, all existing S-System-based modeling approaches have been designed to capture only instantaneous interactions and are unable to infer time-delayed interactions. Results: In this paper, we propose a novel Time-Delayed S-System (TDSS) model which uses a set of delay differential equations to represent the system dynamics. The ability to incorporate time-delay parameters in the proposed S-System model enables simultaneous modeling of both instantaneous and time-delayed interactions. Furthermore, the delay parameters are not limited to positive integer values (corresponding to time stamps in the data) but can also take fractional values. Moreover, we also propose a new criterion for model evaluation that exploits the sparse and scale-free nature of GRNs to effectively narrow down the search space, which not only reduces the computation time significantly but also improves model accuracy. The evaluation criterion systematically adapts the max-min in-degrees and balances the effects of network accuracy and complexity during optimization. Conclusion: Four well-known performance measures applied in experimental studies on synthetic networks with various time-delayed regulations clearly demonstrate that the proposed method captures both instantaneous and delayed interactions correctly with high precision. Experiments carried out on two well-known real-life networks, namely IRMA and the SOS DNA repair network in Escherichia coli, show a significant improvement compared with other state-of-the-art approaches for GRN modeling.
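The Python sketch below integrates a time-delayed S-system with a fixed-step Euler scheme, assuming the form dX_i/dt = alpha_i * prod_j X_j(t - tau_ij)^g_ij - beta_i * prod_j X_j(t - tau_ij)^h_ij, with one delay tau_ij per regulator pair. That parameterization, rounding delays to multiples of dt, holding the pre-history at the initial state, and the function name are all assumptions made for illustration.

```python
def simulate_tdss(alpha, beta, g, h, tau, x0, dt=0.01, steps=1000):
    """Forward-Euler integration of a delayed S-system (hypothetical parameterisation).

        dX_i/dt = alpha_i * prod_j X_j(t - tau_ij)**g_ij - beta_i * prod_j X_j(t - tau_ij)**h_ij

    alpha, beta: rate constants per gene; g, h: kinetic-order matrices;
    tau: delays in time units (0 means instantaneous). Delays are rounded to whole
    steps of dt, and states before t = 0 are taken to equal x0. States must stay
    positive for real-valued powers. Returns states at t = 0, dt, ..., steps*dt.
    """
    n = len(x0)
    history = [list(x0)]
    for step in range(steps):
        current = history[-1]
        nxt = []
        for i in range(n):
            production = degradation = 1.0
            for j in range(n):
                lag = round(tau[i][j] / dt)
                xj = history[max(0, step - lag)][j]  # delayed value of X_j
                production *= xj ** g[i][j]
                degradation *= xj ** h[i][j]
            nxt.append(current[i] + dt * (alpha[i] * production - beta[i] * degradation))
        history.append(nxt)
    return history
```

With every entry of `tau` set to 0 the sketch reduces to an ordinary S-system; a non-integer delay such as 0.5 illustrates the fractional delays mentioned above, provided it is a multiple of `dt`.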
Frequency decomposition based gene clustering
- Authors: Rahman, Md Abdur , Chetty, Madhu , Bulach, Dieter , Wangikar, Pramod
- Date: 2015
- Type: Text , Conference paper
- Relation: 22nd International Conference on Neural Information Processing, ICONIP 2015; Istanbul, Turkey; 9th-12th November 2015 Vol. 9490, p. 170-181
- Full Text: false
- Reviewed:
- Description: Gene expression data have commonly been used to understand the underlying mechanisms of known biological processes. Although microarray gene expression profiles usually appear aperiodic, their periodic components can be readily obtained with proper signal processing techniques. Thus, if the expression profiles of interconnected (regulatory and regulated) genes are decomposed, at least one common frequency component will appear in these genes. Exploiting this novel concept, we propose a frequency decomposition approach for gene clustering to better understand the gene interconnection topology. This method, based on the Hilbert-Huang Transform (HHT), enables us to segregate every periodic component of the gene expression profiles. Next, a multilevel clustering is performed based on these frequency components. Unlike existing clustering algorithms, the proposed method incorporates meaningful knowledge of the gene interaction topology. This information about the underlying gene interactions is vital and can prove useful in many existing evolutionary optimisation algorithms for genetic network reconstruction. We validate the entire approach by applying it to a 15-gene synthetic network. © Springer International Publishing Switzerland 2015.
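The HHT-based decomposition is not reproduced here. As a plainly swapped-in stand-in, the Python sketch below uses a discrete Fourier transform to find each profile's dominant frequency and groups genes that share it. This single-level grouping only illustrates the idea that interconnected genes share a frequency component; it is not the paper's multilevel clustering, and all names are hypothetical.

```python
import cmath
import math
from collections import defaultdict


def dominant_frequency(signal):
    """Index k in 1..n//2 of the largest-magnitude DFT coefficient of the mean-removed signal."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]

    def magnitude(k):
        return abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centred)))

    return max(range(1, n // 2 + 1), key=magnitude)


def cluster_by_frequency(profiles):
    """Group genes ({name: time series}) by their dominant frequency index."""
    clusters = defaultdict(list)
    for gene, signal in profiles.items():
        clusters[dominant_frequency(signal)].append(gene)
    return dict(clusters)
```

Unlike HHT, a DFT assumes stationary sinusoidal components over the whole series, which is one reason an adaptive decomposition may suit short, nonstationary expression profiles better.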
Network decomposition based large-scale reverse engineering of gene regulatory network
- Authors: Chowdhury, Ahsan , Chetty, Madhu
- Date: 2015
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 160, no. (2015), p. 213-227
- Full Text: false
- Reviewed:
- Description: A Gene Regulatory Network (GRN) is the functional circuitry of a living organism that exhibits the regulatory relationships among genes of a cellular system at the gene level. Real-life biological networks contain a very large number of genes and exhibit both instantaneous and time-delayed regulations. While our recent technique [1] addresses the modeling of time-delays occurring in genetic interactions, the issue of large-scale GRN modeling remains. In this paper, we propose a novel methodology for large-scale modeling of GRNs by decomposing the GRN into two independent sub-networks utilizing its biological traits. Using the time-delayed S-system model [1], these two sub-networks are learnt separately and then combined to obtain the entire GRN. To speed up the inference mechanism, a cardinality-based fitness function, developed specifically for inferring large-scale GRNs, is proposed to allow incorporation of knowledge of the maximum in-degree. A novel local-search method is also proposed to further facilitate the incorporation of biological knowledge through gene clustering and gene ranking. Experimental studies demonstrate that the proposed approach is successful in learning large genetic networks, which is currently not achievable with existing S-system based modeling approaches.
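The abstract does not give the form of the cardinality-based fitness function. As a purely hypothetical Python sketch of how knowledge of a maximum in-degree can enter a fitness, the code below counts each gene's regulators from the S-system kinetic orders and adds a penalty for every regulator beyond the limit; `data_error`, `weight`, the zero tolerance and ignoring self-terms are all assumptions.

```python
def in_degrees(g, h, eps=1e-9):
    """Number of regulators per gene in an S-system.

    Gene j regulates gene i (j != i) when kinetic order g[i][j] or h[i][j]
    is non-zero; self-terms (j == i) are ignored.
    """
    degrees = []
    for i, (g_row, h_row) in enumerate(zip(g, h)):
        degrees.append(sum(
            1 for j, (gij, hij) in enumerate(zip(g_row, h_row))
            if j != i and (abs(gij) > eps or abs(hij) > eps)
        ))
    return degrees


def cardinality_fitness(data_error, g, h, max_in_degree, weight=1.0):
    """Hypothetical fitness to minimise: data-fit error plus `weight` for every
    regulator beyond `max_in_degree` on any gene."""
    excess = sum(max(0, d - max_in_degree) for d in in_degrees(g, h))
    return data_error + weight * excess
```

Penalizing only the excess leaves candidate networks within the in-degree limit ranked purely by data fit, so the prior knowledge prunes dense structures without biasing sparse ones.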
Exploiting temporal genetic correlations for enhancing regulatory network optimization
- Authors: Youseph, Ahammed , Chetty, Madhu , Karmakar, Gour
- Date: 2016
- Type: Text , Conference proceedings
- Relation: 23rd International Conference on Neural Information Processing, ICONIP 2016; Kyoto, Japan; 16th-21st October 2016; published in Neural Information Processing (Lecture Notes in Computer Science series) Vol. 9947 LNCS, p. 479-487
- Full Text: false
- Reviewed:
- Description: Inferring gene regulatory networks (GRNs) from microarray gene expression data is a highly challenging problem in computational and systems biology. To make the GRN reconstruction process more accurate and faster, in this paper we develop a technique to identify the gene having the maximum in-degree in the network using the temporal correlation of gene expression profiles. The in-degree of the identified gene is estimated by applying an evolutionary optimization algorithm to a decoupled S-system GRN model. The value of in-degree thus obtained is set as the maximum in-degree for inferring the regulations of the other genes. The simulations are carried out on in silico networks of small and medium sizes. The results show that both the prediction accuracy, in terms of well-known performance metrics, and the computational time of the optimization process are improved compared with traditional S-system model based inference. © Springer International Publishing AG 2016.
- Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
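The temporal-correlation step can be sketched in Python as follows: for each target gene, count the other genes whose expression at time t correlates strongly with the target at time t + lag, and pick the gene with the largest count as the candidate with maximum in-degree. The lag of 1, the 0.9 threshold and the use of absolute Pearson correlation are assumptions; the paper then estimates that gene's actual in-degree with evolutionary optimization, which is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0
    return cov / (sa * sb)


def lagged_corr(regulator, target, lag=1):
    """Correlation between the regulator at time t and the target at time t + lag."""
    return pearson(regulator[:-lag], target[lag:])


def gene_with_max_in_degree(profiles, threshold=0.9, lag=1):
    """profiles: {gene: expression time series}.

    For each target, counts the other genes whose lagged correlation with it
    reaches `threshold` in magnitude; returns (gene with largest count, count).
    """
    counts = {}
    for target, y in profiles.items():
        counts[target] = sum(
            1 for reg, x in profiles.items()
            if reg != target and abs(lagged_corr(x, y, lag)) >= threshold
        )
    best = max(counts, key=counts.get)
    return best, counts[best]
```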
An improved memetic approach for protein structure prediction incorporating maximal hydrophobic core estimation concept
- Authors: Nazmul, Rumana , Chetty, Madhu , Chowdhury, Ahsan
- Date: 2021
- Type: Text , Journal article
- Relation: Knowledge-Based Systems Vol. 219, no. (2021), p. 104395
- Full Text: false
- Reviewed:
- Description: Protein Structure Prediction (PSP) from the primary amino acid sequence, even using a simplified Hydrophobic-Polar (HP) lattice model, continues to be extremely challenging. Finding an optimal conformation, even for a small sequence, with any of the currently known evolutionary approaches is computationally expensive and time consuming. Although Memetic Algorithms (MAs) have shown success in finding the optimal solution for PSP, no significant work has been reported on incorporating domain- or problem-specific knowledge into the search process to improve their performance. In this paper, we present an approach to incorporate such knowledge into the initial population to enhance the effectiveness of MAs for PSP. The domain knowledge we propose to use is based on the concept of maximal ‘core’ formation, exploiting the fundamental property of H residues to lie at the core of the minimum-energy optimal protein structure. A generic technique is proposed for estimating the maximal Hydrophobic core (H-core) in a protein sequence for 2D Square, 3D Cubic and the more complex and realistic 3D FCC (Face Centered Cubic) lattice models. Subsequently, the knowledge of this estimated core is incorporated into an MA. Experiments conducted using HP benchmark sequences for 2D Square, 3D Cubic and 3D FCC lattice models show that the proposed MA with the new core-based population initialization technique outperforms existing methods in terms of both convergence speed and minimal energy. © 2018 Elsevier B.V.
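The objective that such an MA minimizes is the standard HP-model energy: minus the number of H-H pairs that are lattice neighbours but not consecutive in the chain. The Python sketch below evaluates it for the 2D square lattice only; the 3D cubic and FCC lattices and the H-core estimation itself are not covered, and the coordinate-list representation is an assumption.

```python
def hp_energy(sequence: str, coords: list[tuple[int, int]]) -> int:
    """Energy of a 2D square-lattice HP conformation.

    sequence: string over {'H', 'P'}; coords: one (x, y) lattice point per residue.
    Energy = -(number of H-H pairs that are lattice neighbours but not chain neighbours).
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent lattice sites")
    position = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Check only the right and upper neighbours so each pair is counted once.
        for nb in ((x + 1, y), (x, y + 1)):
            j = position.get(nb)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

A compact cluster of H residues maximizes these contacts, which is the intuition behind seeding the initial population with conformations built around an estimated H-core.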