Metabolic network discovery through reverse engineering of metabolome data.
ABSTRACT: Reverse engineering of high-throughput omics data to infer underlying biological networks is one of the challenges in systems biology. However, applications in the field of metabolomics are rather limited. We have focused on a systematic analysis of metabolic network inference from in silico metabolome data based on statistical similarity measures. Three different data types based on biological/environmental variability around steady state were analyzed to compare the relative information content of the data types for inferring the network. Comparing the inference power of different similarity scores indicated the clear superiority of conditioning or pruning based scores as they have the ability to eliminate indirect interactions. We also show that a mathematical measure based on the Fisher information matrix gives clues on the information quality of different data types to better represent the underlying metabolic network topology. Results on several datasets of increasing complexity consistently show that metabolic variations observed at steady state, the simplest experimental analysis, are already informative to reveal the connectivity of the underlying metabolic network with a low false-positive rate when proper similarity-score approaches are employed. For experimental situations this implies that a single organism under slightly varying conditions may already generate more than enough information to rightly infer networks. Detailed examination of the strengths of interactions of the underlying metabolic networks demonstrates that the edges that cannot be captured by similarity scores mainly belong to metabolites connected with weak interaction strength. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11306-009-0156-4) contains supplementary material, which is available to authorized users.
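The advantage of conditioning-based scores over plain correlation can be illustrated with a minimal sketch. It assumes a toy linear chain A -> B -> C with Gaussian noise (not the in silico datasets of the study) and uses the first-order partial correlation as one example of a conditioning score; the function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, conditioned on z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate_chain(n=5000, seed=0):
    """Toy data (assumption): a linear chain A -> B -> C with Gaussian noise."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [ai + rng.gauss(0, 1) for ai in a]
    c = [bi + rng.gauss(0, 1) for bi in b]
    return a, b, c


a, b, c = simulate_chain()
r_ac = pearson(a, c)                  # clearly non-zero although A and C are not directly linked
r_ac_given_b = partial_corr(a, c, b)  # close to zero once B is conditioned on
```

In this toy setting a threshold on the plain correlation would report a false A-C edge, whereas the conditioned score removes it, which is the kind of indirect interaction the abstract refers to.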
Project description:Since metabolome data are derived from the underlying metabolic network, reverse engineering of such data to recover the network topology is of wide interest. The Lyapunov equation constrains the link between data and network by coupling the covariance of the data with the strength of interactions (the Jacobian matrix). This equation, when expressed as a linear set of equations at steady state, constitutes a basis for inferring the network structure given the covariance matrix of the data. The sparse structure of metabolic networks points to reactions which are active based on minimal enzyme production, hinting at sparsity as a cellular objective. Therefore, for a given covariance matrix, we solved the Lyapunov equation for the Jacobian matrix using two simultaneous objective functions, minimization of the Euclidean norm of the residuals and maximization of sparsity (the number of zeros in the Jacobian matrix), to infer directed small-scale networks from three kingdoms of life (bacteria, fungi, mammals). The inference performance of the approach was found to be promising, with a false positive rate of zero and a true positive rate of almost one. The effect of missing data on the results was additionally analyzed, revealing superiority over similarity-based approaches, which infer undirected networks. Our findings suggest that the covariance of metabolome data implies an underlying network with the sparsest pattern. The theoretical analysis forms a framework for further investigation of sparsity-based inference of metabolic networks from real metabolome data.
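For orientation, the Lyapunov equation and its linear form can be written out as follows. This is a sketch in a commonly used notation, not necessarily the authors' own: the fluctuation matrix D, the commutation matrix K_n, and the factor 2 are assumptions of this illustration (the factor depends on how D is defined). C is the covariance matrix of the n metabolites and J is the Jacobian.

```latex
\begin{aligned}
J C + C J^{\top} &= -2D \\
\left[(C \otimes I_n) + (I_n \otimes C)\,K_n\right]\operatorname{vec}(J) &= -2\,\operatorname{vec}(D),
\qquad K_n \operatorname{vec}(J) = \operatorname{vec}(J^{\top})
\end{aligned}
```

Because both sides of the first line are symmetric, only n(n+1)/2 of the n^2 scalar equations are independent, so the n^2 entries of J are not determined by C alone. This is where an additional objective such as sparsity of J can select among the solutions that fit the covariance.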
Project description:A main goal in the analysis of a complex system is to infer its underlying network structure from time-series observations of its behaviour. The inference process is often done by using bi-variate similarity measures, such as the cross-correlation (CC) or mutual information (MI); however, the main factors favouring or hindering its success remain unclear. Here, we use synthetic neuron models in order to reveal the main topological properties that frustrate or facilitate inferring the underlying network from CC measurements. Specifically, we use pulse-coupled Izhikevich neurons connected as in the Caenorhabditis elegans neural networks as well as in networks with similar randomness and small-worldness. We analyse the effectiveness and robustness of the inference process under different observations and collective dynamics, contrasting the results obtained from using membrane potentials and inter-spike interval time-series. We find that overall, small-worldness favours network inference and degree heterogeneity hinders it. In particular, success rates in C. elegans networks, which combine small-world properties with degree heterogeneity, are closer to success rates in Erdős-Rényi network models than to those in Watts-Strogatz network models. These results are relevant to better understanding the relationship between topological properties and function in different neural networks.
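A bi-variate cross-correlation score of the kind mentioned above can be sketched as follows. The example assumes generic toy signals in which one series drives another with a fixed delay; it does not reproduce the Izhikevich neuron simulations, and the function names and parameter values are hypothetical.

```python
import math
import random


def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    xs, ys = x[: len(x) - lag], y[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def best_lag(x, y, max_lag):
    """Lag in [0, max_lag] with the largest absolute cross-correlation."""
    return max(range(max_lag + 1), key=lambda k: abs(cross_correlation(x, y, k)))


# Toy data (assumption): y follows x with a delay of two time steps plus noise.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [rng.gauss(0, 0.3) + (0.8 * x[t - 2] if t >= 2 else 0.0) for t in range(2000)]

lag = best_lag(x, y, max_lag=5)
```

In a network setting, such a score would be computed for every pair of nodes and an edge declared when the peak value exceeds a chosen threshold; the threshold choice is left open here.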
Project description:<h4>Background</h4>The skeleton of complex systems can be represented as networks where vertices represent entities, and edges represent the relations between these entities. Often it is impossible, or expensive, to determine the network structure by experimental validation of the binary interactions between every vertex pair. It is usually more practical to infer the network from surrogate observations. Network inference is the process by which an underlying network of relations between entities is determined from indirect evidence. While many algorithms have been developed to infer networks from quantitative data, less attention has been paid to methods which infer networks from repeated co-occurrence of entities in related sets. This type of data is ubiquitous in the field of systems biology and in other areas of complex systems research. Hence, such methods would be of great utility and value.<h4>Results</h4>Here we present a general method for network inference from repeated observations of sets of related entities. Given experimental observations of such sets, we infer the underlying network connecting these entities by generating an ensemble of networks consistent with the data. The frequency of occurrence of a given link throughout this ensemble is interpreted as the probability that the link is present in the underlying real network conditioned on the data. Exponential random graphs are used to generate and sample the ensemble of consistent networks, and we take an algorithmic approach to numerically execute the inference method. The effectiveness of the method is demonstrated on synthetic data before employing this inference approach to problems in systems biology and systems pharmacology, as well as to construct a co-authorship collaboration network. 
We predict direct protein-protein interactions from high-throughput mass-spectrometry proteomics, integrate data from ChIP-seq and loss-of-function/gain-of-function followed by expression data to infer a network of associations between pluripotency regulators, extract a network that connects 53 cancer drugs to each other and to 34 severe adverse events by mining the FDA's Adverse Event Reporting System (AERS), and construct a co-authorship network that connects Mount Sinai School of Medicine investigators. The predicted networks and online software to create networks from entity-set libraries are provided online at http://www.maayanlab.net/S2N.<h4>Conclusions</h4>The network inference method presented here can be applied to resolve different types of networks in current systems biology and systems pharmacology as well as in other fields of research.
Project description:Accurate inference of regulatory networks from experimental data facilitates the rapid characterization and understanding of biological systems. High-throughput technologies can provide a wealth of time-series data to better interrogate the complex regulatory dynamics inherent to organisms, but many network inference strategies do not effectively use temporal information. We address this limitation by introducing Sliding Window Inference for Network Generation (SWING), a generalized framework that incorporates multivariate Granger causality to infer network structure from time-series data. SWING moves beyond existing Granger methods by generating windowed models that simultaneously evaluate multiple upstream regulators at several potential time delays. We demonstrate that SWING elucidates network structure with greater accuracy in both in silico and experimentally validated in vitro systems. We estimate the apparent time delays present in each system and demonstrate that SWING infers time-delayed, gene-gene interactions that are distinct from baseline methods. By providing a temporal framework to infer the underlying directed network topology, SWING generates testable hypotheses for gene-gene influences.
Project description:<h4>Motivation</h4>Functional protein-protein interaction (PPI) networks elucidate molecular pathways underlying complex phenotypes, including those of human diseases. Extrapolation of domain-domain interactions (DDIs) from known PPIs is a major domain-based method for inferring functional PPI networks. However, the protein domain is a functional unit of the protein. Therefore, we should be able to effectively infer functional interactions between proteins based on the co-occurrence of domains.<h4>Results</h4>Here, we present a method for inferring accurate functional PPIs based on the similarity of domain composition between proteins, measured by a weighted mutual information (MI) that assigns different weights to the domains based on their genome-wide frequencies. Weighted MI outperforms other domain-based network inference methods and is highly predictive for pathways as well as phenotypes. A genome-scale human functional network determined by our method reveals numerous communities that are significantly associated with known pathways and diseases. Domain-based functional networks may, therefore, have potential applications in mapping domain-to-pathway or domain-to-phenotype associations.<h4>Availability and implementation</h4>Source code for calculating weighted mutual information based on the domain profile matrix is available from www.netbiolab.org/w/WMI. Contact: Insuklee@yonsei.ac.kr<h4>Supplementary information</h4>Supplementary data are available at Bioinformatics online.
Project description:Biological systems contain a large number of molecules that have diverse interactions. A fruitful path to understanding these systems is to represent them with interaction networks, and then describe flow processes in the network with a dynamic model. Boolean modeling, the simplest discrete dynamic modeling framework for biological networks, has proven its value in recapitulating experimental results and making predictions. A first step and major roadblock to the widespread use of Boolean networks in biology is the laborious network inference and construction process. Here we present a streamlined network inference method that combines the discovery of a parsimonious network structure and the identification of Boolean functions that determine the dynamics of the system. This inference method is based on a causal logic analysis method that associates a logic type (sufficient or necessary) to node-pair relationships (whether promoting or inhibitory). We use the causal logic framework to assimilate indirect information obtained from perturbation experiments and infer relationships that have not yet been documented experimentally. We apply this inference method to a well-studied process of hormone signaling in plants, the signaling underlying abscisic acid (ABA)-induced stomatal closure. Applying the causal logic inference method significantly reduces the manual work typically required for network and Boolean model construction. The inferred model agrees with the manually curated model. We also test this method by re-inferring a network representing epithelial to mesenchymal transition based on a subset of the information that was initially used to construct the model. We find that the inference method performs well for various likely scenarios of inference input information. 
We conclude that our method is an effective approach toward inference of biological networks and can become an efficient step in the iterative process between experiments and computations.
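To make the Boolean modeling framework concrete, here is a minimal sketch of a synchronously updated Boolean network. The four-node pathway, its node names, and its rules are hypothetical and are not taken from the ABA or epithelial to mesenchymal transition models. The reading of "sufficient" and "necessary" in the comments is an assumption about how such relationships translate into Boolean functions.

```python
def step(state, rules):
    """Synchronous update: every node is recomputed from the previous state."""
    return {node: rule(state) for node, rule in rules.items()}


def run_to_fixed_point(state, rules, max_steps=20):
    """Iterate until the state stops changing (or max_steps is reached)."""
    for _ in range(max_steps):
        nxt = step(state, rules)
        if nxt == state:
            return state
        state = nxt
    return state


# Hypothetical toy pathway: signal S, inhibitor I, mediator M, output O.
rules = {
    "S": lambda s: s["S"],                 # input node keeps its value
    "I": lambda s: s["I"],                 # input node keeps its value
    "M": lambda s: s["S"],                 # S is both sufficient and necessary for M
    "O": lambda s: s["M"] and not s["I"],  # M is necessary for O; I ON is sufficient to switch O off
}

with_signal = run_to_fixed_point(
    {"S": True, "I": False, "M": False, "O": False}, rules
)
with_inhibitor = run_to_fixed_point(
    {"S": True, "I": True, "M": False, "O": False}, rules
)
```

With the signal on and the inhibitor off, the output node switches on after two update steps; with the inhibitor on, it stays off even though the mediator is active.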
Project description:Reverse engineering metabolome data to infer metabolic interactions is a challenging research topic. Here we introduce JacLy, a Jacobian-based method to infer metabolic interactions of small networks (<20 metabolites) from the covariance of steady-state metabolome data. The approach was applied to two different in silico small-scale metabolome datasets. The power of JacLy lies in its use of steady-state metabolome data to predict the Jacobian matrix of the system, which is a source of information on the structure and dynamic characteristics of the system. Besides its advantage of inferring directed interactions, its superiority over correlation-based network inference was especially clear in terms of the required number of replicates and the effect of using a priori knowledge in the inference. Additionally, we showed that the standard deviation of the replicate data is a suitable approximation for the magnitudes of the metabolite fluctuations inherent in the system.
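The link between the Jacobian and the steady-state covariance can be illustrated with a short sketch. It is not the JacLy implementation: it solves the forward problem (Jacobian to covariance) for a hypothetical two-metabolite system, assuming the Lyapunov equation in the form J C + C J^T = -2 D with an assumed fluctuation matrix D. All function names and numerical values are chosen for illustration only.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lyapunov_covariance(J, D):
    """Solve J C + C J^T = -2 D for the covariance matrix C."""
    n = len(J)
    A = [[0.0] * (n * n) for _ in range(n * n)]
    b = [0.0] * (n * n)
    for i in range(n):
        for k in range(n):
            row = i * n + k
            for m in range(n):
                A[row][m * n + k] += J[i][m]  # (J C)[i][k] = sum_m J[i][m] C[m][k]
                A[row][i * n + m] += J[k][m]  # (C J^T)[i][k] = sum_m C[i][m] J[k][m]
            b[row] = -2.0 * D[i][k]
    c = solve_linear(A, b)
    return [c[i * n:(i + 1) * n] for i in range(n)]


def residual(J, C, D):
    """Largest absolute entry of J C + C J^T + 2 D (zero for an exact solution)."""
    n = len(J)
    return max(
        abs(sum(J[i][m] * C[m][k] + C[i][m] * J[k][m] for m in range(n)) + 2 * D[i][k])
        for i in range(n) for k in range(n)
    )


# Hypothetical stable system: metabolite 0 affects metabolite 1, but not vice versa.
J = [[-1.0, 0.0], [0.5, -2.0]]
D = [[1.0, 0.0], [0.0, 1.0]]
C = lyapunov_covariance(J, D)
```

In this example the Jacobian is asymmetric (the interaction is directed) while the resulting covariance matrix is symmetric, which illustrates why recovering the Jacobian carries directional information that a correlation-based score cannot provide.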
Project description:Phylodynamic models are widely used in infectious disease epidemiology to infer the dynamics and structure of pathogen populations. However, these models generally assume that individual hosts contact one another at random, ignoring the fact that many pathogens spread through highly structured contact networks. We present a new framework for phylodynamics on local contact networks based on pairwise epidemiological models that track the status of pairs of nodes in the network rather than just individuals. Shifting our focus from individuals to pairs leads naturally to coalescent models that describe how lineages move through networks and the rate at which lineages coalesce. These pairwise coalescent models not only consider how network structure directly shapes pathogen phylogenies, but also how the relationship between phylogenies and contact networks changes depending on epidemic dynamics and the fraction of infected hosts sampled. By considering pathogen phylogenies in a probabilistic framework, these coalescent models can also be used to estimate the statistical properties of contact networks directly from phylogenies using likelihood-based inference. We use this framework to explore how much information phylogenies retain about the underlying structure of contact networks and to infer the structure of a sexual contact network underlying a large HIV-1 sub-epidemic in Switzerland.
Project description:<h4>Background</h4>The inference of a genetic network is a problem in which mutual interactions among genes are deduced using time-series of gene expression patterns. While a number of models have been proposed to describe genetic regulatory networks, this study focuses on a set of differential equations since it has the ability to model dynamic behavior of gene expression. When we use a set of differential equations to describe genetic networks, the inference problem can be defined as a function approximation problem. On the basis of this problem definition, we propose in this study a new method to infer reduced NGnet models of genetic networks.<h4>Results</h4>Through numerical experiments on artificial genetic network inference problems, we demonstrated that our method has the ability to infer genetic networks correctly and it was faster than the other inference methods. We then applied the proposed method to actual expression data of the bacterial SOS DNA repair system, and succeeded in finding several reasonable regulations. When our method inferred the genetic network from the actual data, it required about 4.7 min on a single-CPU personal computer.<h4>Conclusion</h4>The proposed method has an ability to obtain reasonable networks with a short computational time. As a high performance computer is not always available at every laboratory, the short computational time of our method is a preferable feature. There does not seem to be a perfect model for the inference of genetic networks yet. Therefore, in order to extract reliable information from the observed gene expression data, we should infer genetic networks using multiple inference methods based on different models. Our approach could be used as one of the promising inference methods.
Project description:BACKGROUND: Network inference is an important tool to reveal the underlying interactions of biological systems. In the liver, a complex system of transcription factors is active to distribute signals and induce the cellular response following extracellular stimuli. Plenty of information is available about single transcription factors important for the different functions of the liver, but little is known about their causal relations to each other. RESULTS: Given a DNA microarray time series dataset of collagen monolayer-cultured murine hepatocytes, we identified 22 differentially expressed genes for which the corresponding protein is known to exhibit transcription factor activity. We developed the Extended TILAR (ExTILAR) network inference algorithm based on the modeling concept of the previously published TILAR algorithm. Using ExTILAR, we inferred a transcription factor network based on gene expression data which puts these important genes into a functional context. This way, we identified a previously unknown relationship between Tgif1 and Atf3 which we validated experimentally. Besides its known role in metabolic processes, this extends the knowledge about Tgif1 in hepatocytes towards a possible influence on processes such as proliferation and the cell cycle. Moreover, two positive (i.e. double negative) regulatory loops were predicted that could give rise to bistable behavior. We further evaluated the performance of ExTILAR by systematic inference of an in silico network. CONCLUSIONS: We present the ExTILAR algorithm, which combines the advantages of the regression-based inference algorithm TILAR, such as the ability to process large networks at low computational cost, with the advantages of dynamic network models based on ordinary differential equations (i.e. in silico knock-down simulations). 
Like TILAR, ExTILAR makes use of various prior-knowledge types such as transcription factor binding site information and gene interaction knowledge to infer biologically meaningful gene regulatory networks. Therefore, ExTILAR is especially useful when a large number of genes is modeled using a small number of experimental data points.