Multimodal memetic framework for low-resolution protein structure prediction
- Authors: Nazmul, Rumana , Chetty, Madhu , Chowdhury, Ashan
- Date: 2020
- Type: Text , Journal article
- Relation: Swarm and Evolutionary Computation Vol. 52, no. (Feb 2020), p. 14
- Full Text: false
- Reviewed:
- Description: In this paper, we propose a systematic design of evolutionary optimization, namely Multimodal Memetic Framework (MMF), to effectively search the vast complex energy landscape. Our proposed memetic framework is implemented in hierarchical stages with the optimization of each stage performed in parallel in three different states: Exploratory, Exploitative and Central. Each state, with its own set of sub-populations, either explores or exploits by beneficial mixing of potential solutions to direct the search towards a global solution. Instead of implementing identical genetic operators, the proposed approach employs different selection and survival criteria in each state according to their designated task. The Exploratory state employs a knowledge-based initial population generation technique with appropriately tuned genetic operators to guide the search to the "nearest peak". The Exploitative state fine-tunes the individuals representing different regions by applying a building block based local search. Finally, by utilizing the imbibed knowledge from different peaks, the Central state carries out information-exchange among the highly fit solutions for exploring the undiscovered regions. The information exchange employs a novel non-random parental selection technique to distribute the reproduction opportunity intelligently among the individuals for making cross-over more effective. The method has been tested on a set of various benchmark protein sequences for 2D and 3D lattice models. The experimental results demonstrate the superiority of the proposed method over other state-of-the-art algorithms.
An adaptive strategy for assortative mating in genetic algorithm
- Authors: Nazmul, Rumana , Chetty, Madhu
- Date: 2013
- Type: Text , Conference paper
- Relation: 2013 IEEE Congress on Evolutionary Computation p. 2237-2244
- Full Text: false
- Reviewed:
- Description: In any traditional Genetic Algorithm (GA), recombination is a dominant search operator and capable of exploring the search space by sharing genetic information among the individuals in the population. However, a simple application of recombination alone is insufficient to guide convergence to an optimal solution. The selection of parents for recombination operation has a significant role in guiding the evolution towards the optimal solution and also for maintaining genetic diversity to avoid getting trapped in local minima. A non-random mating mimics the mechanism of reproduction in nature and is effective in maintaining diversity in population. This paper proposes a new strategy for selection of mating pairs based on a type of non-random mating called as assortative mating. The proposed mate selection scheme conserves the merits of both positive and negative assortative mating in a controlled manner by allowing mating between individuals having both similar and dissimilar phenotypes. For effective cross-over, it maintains genetic diversity in population by distributing the recombination among dissimilar individuals. Furthermore, it ensures the preservation and propagation of useful genetic information to the later stages of search by the selection of mates having similar phenotypes. Experimental results, using not only the five widely used benchmark functions but also twenty newly developed modified functions, are reported. The results show significant improvements in the convergence characteristics of the proposed mating strategy over existing nonrandom mating techniques.
Clustered memetic algorithm with local heuristics for ab initio protein structure prediction
- Authors: Islam, M. D. , Chetty, Madhu
- Date: 2013
- Type: Text , Journal article
- Relation: IEEE Transactions on Evolutionary Computation Vol. 17, no. 4 (2013), p. 558-576
- Full Text: false
- Reviewed:
- Description: Low-resolution protein models are often used within a hierarchical framework for structure prediction. However, even with these simplified but realistic protein models, the search for the optimal solution remains NP complete. The complexity is further compounded by the multimodal nature of the search space. In this paper, we propose a systematic design of an evolutionary search technique, namely the memetic algorithm (MA), to effectively search the vast search space by exploiting the domain-specific knowledge and taking cognizance of the multimodal nature of the search space. The proposed MA achieves this by incorporating various novel features: 1) a modified fitness function includes two additional terms to account for the hydrophobic and polar nature of the residues; 2) a systematic (rather than random) generation of population automatically prevents an occurrence of invalid conformations; 3) a generalized nonisomorphic encoding scheme implicitly eliminates generation of twins (similar conformations) in the population; 4) the identification of a meme (protein substructures) during optimization from different basins of attraction - a process that is equivalent to implicit applications of threading principles; 5) a clustering of the population corresponds to basins of attraction that allows evolution to overcome the complexity of multimodal search space, thereby avoiding search getting trapped in a local optimum; and 6) a 2-stage framework gathers domain knowledge (i.e., substructures or memes) from different basins of attraction for a combined execution in the second stage. Experiments conducted with different lattice models using known benchmark protein sequences and comparisons carried out with recently reported approaches in this journal show that the proposed algorithm has robustness, speed, accuracy, and superior performance. The approach is generic and can easily be extended for applications to other classes of problems.
Inferring large scale genetic networks with S-System model
- Authors: Chowdhury, Ahsan , Chetty, Madhu , Nguyen, Vinh
- Date: 2013
- Type: Text , Conference paper
- Relation: Genetic and Evolutionary Computation Conference p. 271-278
- Full Text: false
- Reviewed:
- Description: Gene regulatory network (GRN) reconstruction from high-throughput microarray data is an important problem in systems biology. The S-System model, a differential equation based approach, is among the mainstream approaches for modeling GRNs
mDBN: motif based learning of gene regulatory networks using dynamic Bayesian networks
- Authors: Morshed, Nizamul , Chetty, Madhu , Nguyen, Vinh , Caelli, Terry
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO'13, Association for Computing Machinery Inc. (ACM), 2013 p. 279-286
- Full Text: false
- Reviewed:
- Description: Solutions for deriving the most consistent Bayesian gene regulatory network model from given data sets using evolutionary algorithms typically only result in locally optimal solutions.
Rhythmic and sustained oscillations in metabolism and gene expression of Cyanothece sp. ATCC 51142 under constant light
- Authors: Gaudana, Sandeep , Krishnakumar, S. , Alagesan, Swathi , Digmurti, Madhuri , Viswanathan, Ganesh , Chetty, Madhu , Wangikar, Pramod
- Date: 2013
- Type: Text , Journal article
- Relation: Frontiers in Microbiology Vol. 4, no. Article 374 (2013), p. 1-11
- Full Text:
- Reviewed:
- Description: Cyanobacteria, a group of photosynthetic prokaryotes, oscillate between day and night time metabolisms with concomitant oscillations in gene expression in response to light/dark cycles (LD). The oscillations in gene expression have been shown to sustain in constant light (LL) with a free running period of 24 h in a model cyanobacterium Synechococcus elongatus PCC 7942. However, equivalent oscillations in metabolism are not reported under LL in this non-nitrogen fixing cyanobacterium. Here we focus on Cyanothece sp. ATCC 51142, a unicellular, nitrogen-fixing cyanobacterium known to temporally separate the processes of oxygenic photosynthesis and oxygen-sensitive nitrogen fixation. In a recent report, metabolism of Cyanothece 51142 has been shown to oscillate between photosynthetic and respiratory phases under LL with free running periods that are temperature dependent but significantly shorter than the circadian period. Further, the oscillations shift to circadian pattern at moderate cell densities that are concomitant with slower growth rates. Here we take this understanding forward and demonstrate that the ultradian rhythm under LL sustains at much higher cell densities when grown under turbulent regimes that simulate flashing light effect. Our results suggest that the ultradian rhythm in metabolism may be needed to support higher carbon and nitrogen requirements of rapidly growing cells under LL. With a comprehensive Real time PCR based gene expression analysis we account for key regulatory interactions and demonstrate the interplay between clock genes and the genes of key metabolic pathways. Further, we observe that several genes that peak at dusk in Synechococcus peak at dawn in Cyanothece and vice versa. The circadian rhythm of this organism appears to be more robust with peaking of genes in anticipation of the ensuing photosynthetic and respiratory metabolic phases.
Data discretization for dynamic Bayesian network based modeling of genetic networks
- Authors: Nguyen, Vinh , Chetty, Madhu , Coppel, Ross , Wangikar, Pramod
- Date: 2012
- Type: Text , Conference paper
- Relation: Neural Information Processing 19th International Conference p. 298-306
- Full Text: false
- Reviewed:
- Description: Dynamic Bayesian networks (DBN) are widely applied in Systems biology for modeling various biological networks, including gene regulatory networks and metabolic networks. The application of DBN models often requires data discretization. Although various discretization techniques exist, currently there is no consensus on which approach is most suitable. Popular discretization strategies within the bioinformatics community, such as interval and quantile discretization, are likely not optimal. In this paper, we propose a novel approach for data discretization for mutual information based learning of DBN. In this approach, the data are discretized so that the mutual information between parent and child nodes is maximized, subject to a suitable penalty put on the complexity of the discretization. A dynamic programming approach is used to find the optimal discretization threshold for each individual variable. Our approach iteratively learns both the network and the discretization scheme until a locally optimal solution is reached. Tests on real genetic networks confirm the effectiveness of the proposed method.
Local and global algorithms for learning dynamic Bayesian networks
- Authors: Nguyen, Vinh , Chetty, Madhu , Coppel, Ross , Wangikar, Pramod
- Date: 2012
- Type: Text , Conference paper
- Relation: The 12th IEEE International Conference on Data Mining (ICDM 2012) p. 685-694
- Full Text: false
- Reviewed:
- Description: Learning optimal Bayesian networks (BN) from data is NP-hard in general. Nevertheless, certain BN classes with additional topological constraints, such as the dynamic BN (DBN) models, widely applied in specific fields such as systems biology, can be efficiently learned in polynomial time. Such algorithms have been developed for the Bayesian-Dirichlet (BD), Minimum Description Length (MDL), and Mutual Information Test (MIT) scoring metrics. The BD-based algorithm admits a large polynomial bound, hence it is impractical for even modestly sized networks. The MDL-and MIT-based algorithms admit much smaller bounds, but require a very restrictive assumption that all variables have the same cardinality, thus significantly limiting their applicability. In this paper, we first propose an improvement to the MDL-and MIT-based algorithms, dropping the equicardinality constraint, thus significantly enhancing their generality. We also explore local Markov blanket based algorithms for constructing BN in the context of DBN, and show an interesting result: under the faithfulness assumption, the mutual information test based local Markov blanket algorithms yield the same network as learned by the global optimization MIT-based algorithm. Experimental validation on small and large scale genetic networks demonstrates the effectiveness of our proposed approaches.
An improved method to infer gene regulatory network using s-system
- Authors: Chowdhury, Ahsan , Chetty, Madhu
- Date: 2011
- Type: Text , Conference paper
- Relation: IEEE Congress on Evolutionary Computation (IEEE CEC) p. 1012-1019
- Full Text: false
- Reviewed:
- Description: Abstract—Gene Regulatory Network (GRN) plays an important role in the understanding of complex biological systems. In most cases, high throughput microarray gene expression data is used for finding these regulatory relationships among genes. In this paper, we present a novel approach, based on decoupled SSystem model, for reverse engineering GRNs. In the proposed method, the genetic algorithm used for scoring the networks contains several useful features for accurate network inference, namely a Prediction Initialization (PI) algorithm to initialize the individuals, a Flip Operation (FO) for better mating of values and a restricted execution of Hill Climbing Local Search over few individuals. It also includes a novel refinement technique which utilizes the fit solutions of the genetic algorithm for optimizing sensitivity and specificity of the inferred network. Comparative studies and robustness analysis using standard benchmark data set show the superiority of the proposed method.
Information theoretic dynamic Bayesian network approach for reconstructing genetic networks
- Authors: Morshed, Nizamul , Chetty, Madhu
- Date: 2011
- Type: Text , Conference paper
- Relation: Proceedings of the Eleventh IASTED International Conference on Artificial Intelligence and Applications p. 236-243
- Full Text: false
- Reviewed:
- Description: A holistic understanding of genetic interactions, in the post-genomic era, is vital for analysing complex biological systems. In this paper, we present an information theory based novel gene regulatory network inference method using the dynamic Bayesian network (DBN) framework. The proposed approach, with strong theoretical underpinnings, employs mutual information based conditional independence tests to assess the regulatory relationships among genes. The method is flexible, computationally fast and allows a-priori incorporation of biological domain knowledge. We apply it to the analysis of synthetic data as well as Saccharomyces cerevisiae (yeast cell cycle) gene expression data. Performance measures applied to simulation studies show the superior performance of the proposed method
Novel local improvement techniques in clustered memetic algorithm for protein structure prediction
- Authors: Islam, Md Kamrul , Chetty, Madhu , Murshed, Manzur
- Date: 2011
- Type: Text , Conference paper
- Relation: IEEE Congress on Evolutionary Computation (IEEE CEC) p. 1003-1011
- Full Text: false
- Reviewed:
- Description: Evolutionary algorithms (EAs) often fail to find the global optimum due to genetic drift. As the protein structure prediction problem is multimodal having several global optima, EAs empowered with combined application of local and global search e.g., memetic algorithms, can be more effective. This paper introduces two novel local improvement techniques for the clustered memetic algorithm to incorporate both problem specific and search-space specific knowledge to find one of the optimum structures of a hydrophobic-polar protein sequence on lattice models. Experimental results show the superiority of the proposed techniques against existing EAs on benchmark sequences.
Simultaneous learning of instantaneous and time-delayed genetic interactions using novel information theoretic scoring technique
- Authors: Morshed, Nizamul , Chetty, Madhu , Xuan Vinh, Nguyen
- Date: 2011
- Type: Text , Conference paper
- Relation: International Conference on Neural Information Processing p. 248-257
- Full Text: false
- Reviewed:
- Description: Understanding gene interactions is a fundamental question in systems biology. Currently, modeling of gene regulations assumes that genes interact either instantaneously or with time delay. In this paper, we introduce a framework based on the Bayesian Network (BN) formalism that can represent both instantaneous and time-delayed interactions between genes simultaneously. Also, a novel scoring metric having firm mathematical underpinnings is then proposed that, unlike other recent methods, can score both interactions concurrently and takes into account the biological fact that multiple regulators may regulate a gene jointly, rather than in an isolated pair-wise manner. Further, a gene regulatory network inference method employing evolutionary search that makes use of the framework and the scoring metric is also presented. Experiments carried out using synthetic data as well as the well known Saccharomyces cerevisiae gene expression data show the effectiveness of our approach.
Binary-organoid particle swarm optimisation for inferring genetic networks
- Authors: Chanthaphavong, Santi , Chetty, Madhu
- Date: 2010
- Type: Text , Conference paper
- Relation: Evolutionary Computation (CEC), 2010 IEEE Congress
- Full Text: false
- Reviewed:
- Description: A holistic understanding of genetic interactions is crucial in the analysis of complex biological systems. However, due to the dimensionality problem (less samples and large number of genes) of microarray data, obtaining an optimal gene regulatory network is not only difficult but also computationally expensive. In this paper, a Bayesian model for the genetic interactions using the Minimum Description Length as a scoring metric is proposed. For fast optimisation of the network structure, we propose a novel Swarm Intelligence algorithm called Binary-Organoid Particle Swarm (BORG-Swarm). In BORG-Swarm we introduce the concepts of probability threshold vector and particle drift to update particle positions. Experimental studies are carried out using real-life yeast cell cycle dataset. Results indicate that existing binary swarms fail to converge and suffer from long runtimes. In constrast, BORG-Swarm's fast convergence towards the global optimum becomes apparent from results of extensive simulations.
Clustered memetic algorithm for protein structure prediction
- Authors: Islam, M. D. , Chetty, Madhu
- Date: 2010
- Type: Text , Conference paper
- Relation: Evolutionary Computation (CEC), 2010 IEEE Congress
- Full Text:
- Reviewed:
Modelling gene regulatory networks using computational intelligence techniques
- Authors: Ram, Ramesh , Chetty, Madhu
- Date: 2010
- Type: Text , Book chapter
- Relation: Handbook of Research on Computational Methodologies in Gene Regulatory Networks p. 244-265
- Full Text: false
- Reviewed:
- Description: This chapter presents modelling gene regulatory networks (GRNs) using probabilistic causal model and the guided genetic algorithm. The problem of modelling is explained from both a biological and computational perspective. Further, a comprehensive methodology for developing a GRN model is presented where the application of computation intelligence (CI) techniques can be seen to be significantly important in each phase of modelling. An illustrative example of the causal model for GRN modelling is also included and applied to model the yeast cell cycle dataset. The results obtained are compared for providing biological relevance to the findings which thereby underpins the CI based modelling techniques.
Multiclass microarray gene expression classification based on fusion of correlation features
- Authors: Chetty, Girija , Chetty, Madhu
- Date: 2010
- Type: Text , Conference paper
- Relation: Information Fusion (FUSION), 2010 13th Conference
- Full Text: false
- Reviewed:
- Description: In this paper, we propose novel algorithmic models based on fusion of independent and correlated gene features for multiclass microarray gene expression classification. It is possible for genes to get co-expressed via different pathways. Moreover, a gene may or may not be co-active for all samples. In this paper, we approach this problem with a optimal feature selection technique using analysis based on statistical techniques to model the complex interactions between genes. The two different types of correlation modelling techniques based on the cross modal factor analysis (CFA) and canonical correlation analysis (CCA) were examined. The subsequent fusion of CCA/CFA features with principal component analysis (PCA) features at feature-level, and at score-level result in significant enhancement in classification accuracy for different data sets corresponding to multiclass microarray gene expression data.
Pattern recognition in bioinformatics : Girls lose out
- Authors: Ahmad, Shandar , Chetty, Madhu , Schmidt, Bertil
- Date: 2010
- Type: Text , Journal article
- Relation: Pattern Recognition Letter Vol. 31, no. 14 (2010), p. 2071-2072
- Full Text: false
- Reviewed:
- Description: Editorial- With the advent of high speed computers, in-silico studies on biological patterns in recent years have been significantly impacted by the pattern recognition techniques. In this special issue, ‘Pattern Recognition in Bioinformatics’, we present various sophisticated algorithms for a wide range of pattern recognition problems from the world of complex biological systems, whether these are specific sequence signatures – motifs that stand out in discovering its partner – or substructures in an interaction network that determines an organisms’ response to external stimuli. The 12 high-quality articles included in this special issue are essentially based on significant extensions of the selected papers presented at the Third International Conference on Pattern Recognition in Bioinformatics (PRIB 2008) held in Melbourne, Australia. All these selected papers for special issue have again undergone a thorough review by at least three reviewers who are experts in the field. The fresh review process was followed to ensure that the papers met the high standards of scientific and technical merit of the Pattern Recognition Letters journal. The issue is broadly divided into three sections of four papers each, namely (1) Section 1: Interaction Networks and Feature-based Predictions (2) Section 2: Microarray and Transcription Data Analysis (3) Section 3: Sequence Analysis and Motif Discovery
Constraint minimization for efficient modelling of gene regulatory network
- Authors: Ram, Ramesh , Chetty, Madhu , Bulach, Dieter
- Date: 2008
- Type: Text , Conference paper
- Relation: Third IAPR International Conference, PRIB 2008 Melbourne
- Full Text: false
- Reviewed:
- Description: Due to various complexities, as well as noise and high dimensionality, reconstructing a gene regulatory network (GRN) from a high-throughput microarray data becomes computationally intensive.In our earlier work on causal model approach for GRN reconstruction, we had shown the superiority of Markov blanket (MB) algorithm compared to the algorithm using the existing Y and V causal models. In this paper, we show the MB algorithm can be enhanced further by application of the proposed constraint logic minimization (CLM) technique. We describe a framework for minimizing the constraint logic involved (condition independent tests) by exploiting the Markov blanket learning methods developed for a Bayesian network (BN). The constraint relationships are represented in the form of logic using K-map and with the aid of CLM increase the algorithm efficiency and the accuracy. We show improved results by investigations on both the synthetic as well as the real life yeast cell cycle data sets.
DFS based partial pathways in GA for protein structure prediction
- Authors: Hoque, Md Tamjidul , Chetty, Madhu , Lewis, Andrew , Sattar, Abdul
- Date: 2008
- Type: Text , Conference paper
- Relation: Third IAPR International Conference, PRIB 2008
- Full Text:
- Reviewed:
- Description: Nondeterministic conformational search techniques, such as Genetic Algorithms (GAs) are promising for solving protein structure prediction (PSP) problem. The crossover operator of a GA can underpin the formation of potential conformations by exchanging and sharing potential sub-conformations, which is promising for solving PSP. However, the usual nature of an optimum PSP conformation being compact can produce many invalid conformations (by having non-self-avoiding-walk) using crossover. While a crossover-based converging conformation suffers from limited pathways, combining it with depth-first search (DFS) can partially reveal potential pathways. DFS generates random conformations increasingly quickly with increasing length of the protein sequences compared to random-move-only-based conformation generation. Random conformations are frequently applied for maintaining diversity as well as for initialization in many GA variations.
Generating synthetic gene regulatory networks
- Authors: Ram, Ramesh , Chetty, Madhu
- Date: 2008
- Type: Text , Conference paper
- Relation: Third IAPR International Conference, PRIB
- Full Text: false
- Reviewed:
- Description: Reconstructing GRN from microarray dataset is a very challenging problem as these datasets typically have large number of genes and less number of samples. Moreover, the reconstruction task becomes further complicated as there are no suitable synthetic datasets available for validation and evaluation of GRN reconstruction techniques. Synthetic datasets allow validating new techniques and approaches since the underlying mechanisms of the GRNs, generated from these datasets, are completely known. In this paper, we present an approach for synthetically generating gene networks using causal relationships. The synthetic networks can have varying topologies such as small world, random, scale free, or hierarchical topologies based on the well-defined GRN properties. These artificial but realistic GRN networks provide a simulation environment similar to a real-life laboratory microarray experiment. These networks also provide a mechanism for studying the robustness of reconstruction methods to individual and combination of parametric changes such as topology, noise (background and experimental noise) and time delays. Studies involving complicated interactions such as feedback loops, oscillations, bi-stability, dynamic behavior, vertex in-degree changes and number of samples can also be carried out by the proposed synthetic GRN networks.