Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data.
ABSTRACT: Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.
Project description:BACKGROUND: Reverse-engineering approaches such as Bayesian network inference, ordinary differential equations (ODEs) and information theory are widely applied to deriving causal relationships among different elements such as genes, proteins, metabolites, neurons, brain areas and so on, based upon multi-dimensional spatial and temporal data. There are several well-established reverse-engineering approaches to explore causal relationships in a dynamic network, such as ordinary differential equations (ODE), Bayesian networks, information theory and Granger Causality. RESULTS: Here we focused on Granger causality both in the time and frequency domain and in local and global networks, and applied our approach to experimental data (genes and proteins). For a small gene network, Granger causality outperformed all the other three approaches mentioned above. A global protein network of 812 proteins was reconstructed, using a novel approach. The obtained results fitted well with known experimental findings and predicted many experimentally testable results. In addition to interactions in the time domain, interactions in the frequency domain were also recovered. CONCLUSIONS: The results on the proteomic data and gene data confirm that Granger causality is a simple and accurate approach to recover the network structure. Our approach is general and can be easily applied to other types of temporal data.
Project description:Reverse-engineering of biological networks is a central problem in systems biology. The use of intervention data, such as gene knockouts or knockdowns, is typically used for teasing apart causal relationships among genes. Under time or resource constraints, one needs to carefully choose which intervention experiments to carry out. Previous approaches for selecting most informative interventions have largely been focused on discrete Bayesian networks. However, continuous Bayesian networks are of great practical interest, especially in the study of complex biological systems and their quantitative properties. In this work, we present an efficient, information-theoretic active learning algorithm for Gaussian Bayesian networks (GBNs), which serve as important models for gene regulatory networks. In addition to providing linear-algebraic insights unique to GBNs, leading to significant runtime improvements, we demonstrate the effectiveness of our method on data simulated with GBNs and the DREAM4 network inference challenge data sets. Our method generally leads to faster recovery of underlying network structure and faster convergence to final distribution of confidence scores over candidate graph structures using the full data, in comparison to random selection of intervention experiments.
Project description:Studying the impact of genetic variation on gene regulatory networks is essential to understand the biological mechanisms by which genetic variation causes variation in phenotypes. Bayesian networks provide an elegant statistical approach for multi-trait genetic mapping and modelling causal trait relationships. However, inferring Bayesian gene networks from high-dimensional genetics and genomics data is challenging, because the number of possible networks scales super-exponentially with the number of nodes, and the computational cost of conventional Bayesian network inference methods quickly becomes prohibitive. We propose an alternative method to infer high-quality Bayesian gene networks that easily scales to thousands of genes. Our method first reconstructs a node ordering by conducting pairwise causal inference tests between genes, which then allows to infer a Bayesian network via a series of independent variable selection problems, one for each gene. We demonstrate using simulated and real systems genetics data that this results in a Bayesian network with equal, and sometimes better, likelihood than the conventional methods, while having a significantly higher overlap with groundtruth networks and being orders of magnitude faster. Moreover our method allows for a unified false discovery rate control across genes and individual edges, and thus a rigorous and easily interpretable way for tuning the sparsity level of the inferred network. Bayesian network inference using pairwise node ordering is a highly efficient approach for reconstructing gene regulatory networks when prior information for the inclusion of edges exists or can be inferred from the available data.
Project description:The development of chemical reaction models aids understanding and prediction in areas ranging from biology to electrochemistry and combustion. A systematic approach to building reaction network models uses observational data not only to estimate unknown parameters but also to learn model structure. Bayesian inference provides a natural approach to this data-driven construction of models. Yet traditional Bayesian model inference methodologies that numerically evaluate the evidence for each model are often infeasible for nonlinear reaction network inference, as the number of plausible models can be combinatorially large. Alternative approaches based on model-space sampling can enable large-scale network inference, but their realization presents many challenges. In this paper, we present new computational methods that make large-scale nonlinear network inference tractable. First, we exploit the topology of networks describing potential interactions among chemical species to design improved 'between-model' proposals for reversible-jump Markov chain Monte Carlo. Second, we introduce a sensitivity-based determination of move types which, when combined with network-aware proposals, yields significant additional gains in sampling performance. These algorithms are demonstrated on inference problems drawn from systems biology, with nonlinear differential equation models of species interactions.
Project description:To compare different deep learning architectures for predicting the risk of readmission within 30 days of discharge from the intensive care unit (ICU). The interpretability of attention-based models is leveraged to describe patients-at-risk. Several deep learning architectures making use of attention mechanisms, recurrent layers, neural ordinary differential equations (ODEs), and medical concept embeddings with time-aware attention were trained using publicly available electronic medical record data (MIMIC-III) associated with 45,298 ICU stays for 33,150 patients. Bayesian inference was used to compute the posterior over weights of an attention-based model. Odds ratios associated with an increased risk of readmission were computed for static variables. Diagnoses, procedures, medications, and vital signs were ranked according to the associated risk of readmission. A recurrent neural network, with time dynamics of code embeddings computed by neural ODEs, achieved the highest average precision of 0.331 (AUROC: 0.739, F<sub>1</sub>-Score: 0.372). Predictive accuracy was comparable across neural network architectures. Groups of patients at risk included those suffering from infectious complications, with chronic or progressive conditions, and for whom standard medical care was not suitable. Attention-based networks may be preferable to recurrent networks if an interpretable model is required, at only marginal cost in predictive accuracy.
Project description:Different computational approaches have been examined and compared for inferring network relationships from time-series genomic data on human disease mechanisms under the recent Dialogue on Reverse Engineering Assessment and Methods (DREAM) challenge. Many of these approaches infer all possible relationships among all candidate genes, often resulting in extremely crowded candidate network relationships with many more False Positives than True Positives. To overcome this limitation, we introduce a novel approach, Module Anchored Network Inference (MANI), that constructs networks by analyzing sequentially small adjacent building blocks (modules). Using MANI, we inferred a 7-gene adipogenesis network based on time-series gene expression data during adipocyte differentiation. MANI was also applied to infer two 10-gene networks based on time-course perturbation datasets from DREAM3 and DREAM4 challenges. MANI well inferred and distinguished serial, parallel, and time-dependent gene interactions and network cascades in these applications showing a superior performance to other in silico network inference techniques for discovering and reconstructing gene network relationships.
Project description:Networks are widely used as structural summaries of biochemical systems. Statistical estimation of networks is usually based on linear or discrete models. However, the dynamics of biochemical systems are generally non-linear, suggesting that suitable non-linear formulations may offer gains with respect to causal network inference and aid in associated prediction problems.We present a general framework for network inference and dynamical prediction using time course data that is rooted in non-linear biochemical kinetics. This is achieved by considering a dynamical system based on a chemical reaction graph with associated kinetic parameters. Both the graph and kinetic parameters are treated as unknown; inference is carried out within a Bayesian framework. This allows prediction of dynamical behavior even when the underlying reaction graph itself is unknown or uncertain. Results, based on (i) data simulated from a mechanistic model of mitogen-activated protein kinase signaling and (ii) phosphoproteomic data from cancer cell lines, demonstrate that non-linear formulations can yield gains in causal network inference and permit dynamical prediction and uncertainty quantification in the challenging setting where the reaction graph is unknown.MATLAB R2014a software is available to download from warwick.ac.uk/chrisoates.Supplementary data are available at Bioinformatics online.
Project description:The inference of gene regulatory network (GRN) from gene expression data is an unsolved problem of great importance. This inference has been stated, though not proven, to be underdetermined implying that there could be many equivalent (indistinguishable) solutions. Motivated by this fundamental limitation, we have developed new framework and algorithm, called TRaCE, for the ensemble inference of GRNs. The ensemble corresponds to the inherent uncertainty associated with discriminating direct and indirect gene regulations from steady-state data of gene knock-out (KO) experiments. We applied TRaCE to analyze the inferability of random GRNs and the GRNs of E. coli and yeast from single- and double-gene KO experiments. The results showed that, with the exception of networks with very few edges, GRNs are typically not inferable even when the data are ideal (unbiased and noise-free). Finally, we compared the performance of TRaCE with top performing methods of DREAM4 in silico network inference challenge.
Project description:BACKGROUND:Reverse engineering of gene regulatory networks from time series gene-expression data is a challenging problem, not only because of the vast sets of candidate interactions but also due to the stochastic nature of gene expression. We limit our analysis to nonlinear differential equation based inference methods. In order to avoid the computational cost of large-scale simulations, a two-step Gaussian process interpolation based gradient matching approach has been proposed to solve differential equations approximately. RESULTS:We apply a gradient matching inference approach to a large number of candidate models, including parametric differential equations or their corresponding non-parametric representations, we evaluate the network inference performance under various settings for different inference objectives. We use model averaging, based on the Bayesian Information Criterion (BIC), to combine the different inferences. The performance of different inference approaches is evaluated using area under the precision-recall curves. CONCLUSIONS:We found that parametric methods can provide comparable, and often improved inference compared to non-parametric methods; the latter, however, require no kinetic information and are computationally more efficient.
Project description:Synthetic biology aims to the design or redesign of biological systems. In particular, one possible goal could be the rewiring of the transcription regulation network by exchanging the endogenous promoters. To achieve this objective, we have adapted current methods to the inference of a model based on ordinary differential equations that is able to predict the network response after a major change in its topology. Our procedure utilizes microarray data for training. We have experimentally validated our inferred global regulatory model in Escherichia coli by predicting transcriptomic profiles under new perturbations. We have also tested our methodology in silico by providing accurate predictions of the underlying networks from expression data generated with artificial genomes. In addition, we have shown the predictive power of our methodology by obtaining the gene profile in experimental redesigns of the E. coli genome, where rewiring the transcriptional network by means of knockouts of master regulators or by upregulating transcription factors controlled by different promoters. Our approach is compatible with most network inference methods, allowing to explore computationally future genome-wide redesign experiments in synthetic biology.