Comparison of evolutionary algorithms in gene regulatory network model inference.
ABSTRACT: BACKGROUND: The evolution of high throughput technologies that measure gene expression levels has created a data base for inferring GRNs (a process also known as reverse engineering of GRNs). However, the nature of these data has made this process very difficult. At the moment, several methods of discovering qualitative causal relationships between genes with high accuracy from microarray data exist, but large scale quantitative analysis on real biological datasets cannot be performed, to date, as existing approaches are not suitable for real microarray data which are noisy and insufficient. RESULTS: This paper performs an analysis of several existing evolutionary algorithms for quantitative gene regulatory network modelling. The aim is to present the techniques used and offer a comprehensive comparison of approaches, under a common framework. Algorithms are applied to both synthetic and real gene expression data from DNA microarrays, and ability to reproduce biological behaviour, scalability and robustness to noise are assessed and compared. CONCLUSIONS: Presented is a comparison framework for assessment of evolutionary algorithms, used to infer gene regulatory networks. Promising methods are identified and a platform for development of appropriate model formalisms is established.
Project description:Reconstructing gene regulatory networks (GRNs) from gene expression data is a challenging problem. Existing GRN reconstruction algorithms can be broadly divided into model-free and model-based methods. Typically, model-free methods have high accuracy but are computation intensive whereas model-based methods are fast but less accurate. We propose Bayesian Gene Regulation Model Inference (BGRMI), a model-based method for inferring GRNs from time-course gene expression data. BGRMI uses a Bayesian framework to calculate the probability of different models of GRNs and a heuristic search strategy to scan the model space efficiently. Using benchmark datasets, we show that BGRMI has higher/comparable accuracy at a fraction of the computational cost of competing algorithms. Additionally, it can incorporate prior knowledge of potential gene regulation mechanisms and TF hetero-dimerization processes in the GRN reconstruction process. We incorporated existing ChIP-seq data and known protein interactions between TFs in BGRMI as sources of prior knowledge to reconstruct transcription regulatory networks of proliferating and differentiating breast cancer (BC) cells from time-course gene expression data. The reconstructed networks revealed key driver genes of proliferation and differentiation in BC cells. Some of these genes were not previously studied in the context of BC, but may have clinical relevance in BC treatment.
Project description:One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It doesn't make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
Project description:Nonlinear dependence is general in regulation mechanism of gene regulatory networks (GRNs). It is vital to properly measure or test nonlinear dependence from real data for reconstructing GRNs and understanding the complex regulatory mechanisms within the cellular system. A recently developed measurement called the distance correlation (DC) has been shown powerful and computationally effective in nonlinear dependence for many situations. In this work, we incorporate the DC into inferring GRNs from the gene expression data without any underling distribution assumptions. We propose three DC-based GRNs inference algorithms: CLR-DC, MRNET-DC and REL-DC, and then compare them with the mutual information (MI)-based algorithms by analyzing two simulated data: benchmark GRNs from the DREAM challenge and GRNs generated by SynTReN network generator, and an experimentally determined SOS DNA repair network in Escherichia coli. According to both the receiver operator characteristic (ROC) curve and the precision-recall (PR) curve, our proposed algorithms significantly outperform the MI-based algorithms in GRNs inference.
Project description:Reconstruction of gene regulatory networks (GRNs) is of utmost interest and has become a challenge computational problem in system biology. However, every existing inference algorithm from gene expression profiles has its own advantages and disadvantages. In particular, the effectiveness and efficiency of every previous algorithm is not high enough. In this work, we proposed a novel inference algorithm from gene expression data based on differential equation model. In this algorithm, two methods were included for inferring GRNs. Before reconstructing GRNs, singular value decomposition method was used to decompose gene expression data, determine the algorithm solution space, and get all candidate solutions of GRNs. In these generated family of candidate solutions, gravitation field algorithm was modified to infer GRNs, used to optimize the criteria of differential equation model, and search the best network structure result. The proposed algorithm is validated on both the simulated scale-free network and real benchmark gene regulatory network in networks database. Both the Bayesian method and the traditional differential equation model were also used to infer GRNs, and the results were used to compare with the proposed algorithm in our work. And genetic algorithm and simulated annealing were also used to evaluate gravitation field algorithm. The cross-validation results confirmed the effectiveness of our algorithm, which outperforms significantly other previous algorithms.
Project description:The reconstruction of gene regulatory networks (GRNs) from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE)-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM), experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.
Project description:Building accurate gene regulatory networks (GRNs) from high-throughput gene expression data is a long-standing challenge. However, with the emergence of new algorithms combined with the increase of transcriptomic data availability, it is now reachable. To help biologists to investigate gene regulatory relationships, we developed a web-based computational service to build, analyze and visualize GRNs that govern various biological processes. The web server is preloaded with all available Affymetrix GeneChip-based transcriptomic and annotation data from the three model legume species, i.e., Medicago truncatula, Lotus japonicus and Glycine max. Users can also upload their own transcriptomic and transcription factor datasets from any other species/organisms to analyze their in-house experiments. Users are able to select which experiments, genes and algorithms they will consider to perform their GRN analysis. To achieve this flexibility and improve prediction performance, we have implemented multiple mainstream GRN prediction algorithms including co-expression, Graphical Gaussian Models (GGMs), Context Likelihood of Relatedness (CLR), and parallelized versions of TIGRESS and GENIE3. Besides these existing algorithms, we also proposed a parallel Bayesian network learning algorithm, which can infer causal relationships (i.e., directionality of interaction) and scale up to several thousands of genes. Moreover, this web server also provides tools to allow integrative and comparative analysis between predicted GRNs obtained from different algorithms or experiments, as well as comparisons between legume species. The web site is available at http://legumegrn.noble.org.
Project description:Elucidating gene regulatory network (GRN) from large scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii) TopkNet integrating only high-performance algorithms provide significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is a key to reconstruct an unknown regulatory network. Similarity among gene-expression datasets can be useful to determine potential optimal algorithms for reconstruction of unknown regulatory networks, i.e., if expression-data associated with known regulatory network is similar to that with unknown regulatory network, optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for known dataset perform well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks.
Project description:Missing value imputation is important for microarray data analyses because microarray data with missing values would significantly degrade the performance of the downstream analyses. Although many microarray missing value imputation algorithms have been developed, an objective and comprehensive performance comparison framework is still lacking. To solve this problem, we previously proposed a framework which can perform a comprehensive performance comparison of different existing algorithms. Also the performance of a new algorithm can be evaluated by our performance comparison framework. However, constructing our framework is not an easy task for the interested researchers. To save researchers' time and efforts, here we present an easy-to-use web tool named MVIAeval (Missing Value Imputation Algorithm evaluator) which implements our performance comparison framework.MVIAeval provides a user-friendly interface allowing users to upload the R code of their new algorithm and select (i) the test datasets among 20 benchmark microarray (time series and non-time series) datasets, (ii) the compared algorithms among 12 existing algorithms, (iii) the performance indices from three existing ones, (iv) the comprehensive performance scores from two possible choices, and (v) the number of simulation runs. The comprehensive performance comparison results are then generated and shown as both figures and tables.MVIAeval is a useful tool for researchers to easily conduct a comprehensive and objective performance evaluation of their newly developed missing value imputation algorithm for microarray data or any data which can be represented as a matrix form (e.g. NGS data or proteomics data). Thus, MVIAeval will greatly expedite the progress in the research of missing value imputation algorithms.
Project description:BACKGROUND: Inference of gene-regulatory networks (GRNs) is important for understanding behaviour and potential treatment of biological systems. Knowledge about GRNs gained from transcriptome analysis can be increased by multiple experiments and/or multiple stimuli. Since GRNs are complex and dynamical, appropriate methods and algorithms are needed for constructing models describing these dynamics. Algorithms based on heuristic approaches reduce the effort in parameter identification and computation time. RESULTS: The NetGenerator V2.0 algorithm, a heuristic for network inference, is proposed and described. It automatically generates a system of differential equations modelling structure and dynamics of the network based on time-resolved gene expression data. In contrast to a previous version, the inference considers multi-stimuli multi-experiment data and contains different methods for integrating prior knowledge. The resulting significant changes in the algorithmic procedures are explained in detail. NetGenerator is applied to relevant benchmark examples evaluating the inference for data from experiments with different stimuli. Also, the underlying GRN of chondrogenic differentiation, a real-world multi-stimulus problem, is inferred and analysed. CONCLUSIONS: NetGenerator is able to determine the structure and parameters of GRNs and their dynamics. The new features of the algorithm extend the range of possible experimental set-ups, results and biological interpretations. Based upon benchmarks, the algorithm provides good results in terms of specificity, sensitivity, efficiency and model fit.
Project description:BACKGROUND:Gene regulatory networks (GRNs) can be inferred from both gene expression data and genetic perturbations. Under different conditions, the gene data of the same gene set may be different from each other, which results in different GRNs. Detecting structural difference between GRNs under different conditions is of great significance for understanding gene functions and biological mechanisms. RESULTS:In this paper, we propose a Bayesian Fused algorithm to jointly infer differential structures of GRNs under two different conditions. The algorithm is developed for GRNs modeled with structural equation models (SEMs), which makes it possible to incorporate genetic perturbations into models to improve the inference accuracy, so we name it BFDSEM. Different from the naive approaches that separately infer pair-wise GRNs and identify the difference from the inferred GRNs, we first re-parameterize the two SEMs to form an integrated model that takes full advantage of the two groups of gene data, and then solve the re-parameterized model by developing a novel Bayesian fused prior following the criterion that separate GRNs and differential GRN are both sparse. CONCLUSIONS:Computer simulations are run on synthetic data to compare BFDSEM to two state-of-the-art joint inference algorithms: FSSEM and ReDNet. The results demonstrate that the performance of BFDSEM is comparable to FSSEM, and is generally better than ReDNet. The BFDSEM algorithm is also applied to a real data set of lung cancer and adjacent normal tissues, the yielded normal GRN and differential GRN are consistent with the reported results in previous literatures. An open-source program implementing BFDSEM is freely available in Additional file 1.