Project description:Making effective use of multiple data sources is a major challenge in modern bioinformatics. Genome-wide data such as measures of transcription factor binding, gene expression, and sequence conservation, which are used to identify binding regions and genes that are important to major biological processes such as development and disease, can be difficult to use together due to the different biological meanings and statistical distributions of the heterogeneous data types, but each can provide valuable information for understanding the processes under study. Here we present methods for integrating multiple data sources to gain a more complete picture of gene regulation and expression. Our goal is to identify genes and cis-regulatory regions which play specific biological roles. We describe a graphical mixture model approach for data integration, examine the effect of using different model topologies, and discuss methods for evaluating the effectiveness of the models. Model fitting is computationally efficient and produces results which have clear biological and statistical interpretations. The Hedgehog and Dorsal signaling pathways in Drosophila, which are critical in embryonic development, are used as examples.
Project description:Background: Several methods have been developed for analyzing genome-scale models of metabolism and transcriptional regulation. Many of these methods, such as Flux Balance Analysis, use constrained optimization to predict relationships between metabolic flux and the genes that encode and regulate enzyme activity. Recently, mixed integer programming has been used to encode these gene-protein-reaction (GPR) relationships into a single optimization problem, but these techniques are often of limited generality and lack a tool for automating the conversion of rules to a coupled regulatory/metabolic model. Results: We present TIGER, a Toolbox for Integrating Genome-scale Metabolism, Expression, and Regulation. TIGER converts a series of generalized, Boolean or multilevel rules into a set of mixed integer inequalities. The package also includes implementations of existing algorithms to integrate high-throughput expression data with genome-scale models of metabolism and transcriptional regulation. We demonstrate how TIGER automates the coupling of a genome-scale metabolic model with GPR logic and models of transcriptional regulation, thereby serving as a platform for algorithm development and large-scale metabolic analysis. Additionally, we demonstrate how TIGER's algorithms can be used to identify inconsistencies and improve existing models of transcriptional regulation with examples from the reconstructed transcriptional regulatory network of Saccharomyces cerevisiae. Conclusion: The TIGER package provides a consistent platform for algorithm development and extending existing genome-scale metabolic models with regulatory networks and high-throughput data.
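As a rough illustration of how a Boolean rule can be written as mixed integer inequalities, the sketch below uses the standard textbook linearization of AND and OR over binary variables. This is an assumption made for illustration; the abstract does not state which encoding TIGER uses, and the function names here are hypothetical.

```python
from itertools import product


def and_constraints_hold(x, y, z):
    """Inequalities that force z == (x AND y) when x, y, z are binary."""
    return z <= x and z <= y and z >= x + y - 1


def or_constraints_hold(x, y, z):
    """Inequalities that force z == (x OR y) when x, y, z are binary."""
    return z >= x and z >= y and z <= x + y


def feasible_z(constraints, x, y):
    """Return every binary z that satisfies the inequalities for fixed x, y."""
    return [z for z in (0, 1) if constraints(x, y, z)]


# For each input pair, exactly one z is feasible and it equals the Boolean result.
for x, y in product((0, 1), repeat=2):
    assert feasible_z(and_constraints_hold, x, y) == [x & y]
    assert feasible_z(or_constraints_hold, x, y) == [x | y]
```

In a GPR setting, x and y could stand for the presence of two genes and z for the availability of the reaction they encode: AND would model an enzyme complex and OR a pair of isoenzymes.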
Project description:Background: The study of cellular metabolism in the context of high-throughput -omics data has allowed us to decipher novel mechanisms of importance in biotechnology and health. To continue with this progress, it is essential to efficiently integrate experimental data into metabolic modeling. Results: We present here an in-silico framework to infer relevant metabolic pathways for a particular phenotype under study based on its gene/protein expression data. This framework is based on the Carbon Flux Path (CFP) approach, a mixed-integer linear program that expands classical path finding techniques by considering additional biophysical constraints. In particular, the objective function of the CFP approach is amended to account for gene/protein expression data and influence obtained paths. This approach is termed integrative Carbon Flux Path (iCFP). We show that gene/protein expression data also influences the stoichiometric balancing of CFPs, which provides a more accurate picture of active metabolic pathways. This is illustrated in both a theoretical and real scenario. Finally, we apply this approach to find novel pathways relevant in the regulation of acetate overflow metabolism in Escherichia coli. As a result, several targets which could be relevant for better understanding of the phenomenon leading to impaired acetate overflow are proposed. Conclusions: A novel mathematical framework that determines functional pathways based on gene/protein expression data is presented and validated. We show that our approach is able to provide new insights into complex biological scenarios such as acetate overflow in Escherichia coli.
Project description:Background: High throughput molecular-interaction studies using immunoprecipitations (IP) or affinity purifications are powerful and widely used in biology research. One of many important applications of this method is to identify the set of RNAs that interact with a particular RNA-binding protein (RBP). Here, the unique statistical challenge presented is to delineate a specific set of RNAs that are enriched in one sample relative to another, typically a specific IP compared to a non-specific control to model background. The choice of normalization procedure critically impacts the number of RNAs that will be identified as interacting with an RBP at a given significance threshold - yet existing normalization methods make assumptions that are often fundamentally inaccurate when applied to IP enrichment data. Methods: In this paper, we present a new normalization methodology that is specifically designed for identifying enriched RNA or DNA sequences in an IP. The normalization (called adaptive or AD normalization) uses a basic model of the IP experiment and is not a variant of mean, quantile, or other methodology previously proposed. The approach is evaluated statistically and tested with simulated and empirical data. Results and conclusions: The adaptive (AD) normalization method results in a greatly increased range in the number of enriched RNAs identified, fewer false positives, and overall better concordance with independent biological evidence, for the RBPs we analyzed, compared to median normalization. The approach is also applicable to the study of pairwise RNA, DNA and protein interactions such as the analysis of transcription factors via chromatin immunoprecipitation (ChIP) or any other experiments where samples from two conditions, one of which contains an enriched subset of the other, are studied.
Project description:Coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) poses a mortal threat to human health. The elucidation of the relationship between peripheral immune cells and the development of inflammation is essential for revealing the pathogenic mechanism of COVID-19 and developing related antiviral drugs. The immune cell metabolism-targeting therapies exhibit a desirable anti-inflammatory effect in some treatment cases. In this study, based on differentially expressed gene (DEG) analysis, a genome-scale metabolic model (GSMM) was reconstructed by integrating transcriptome data to characterize the adaptive metabolic changes in peripheral blood mononuclear cells (PBMCs) in severe COVID-19 patients. Differential flux analysis revealed that metabolic changes such as enhanced aerobic glycolysis, impaired oxidative phosphorylation, fluctuating biogenesis of lipids, vitamins (folate and retinol), and nucleotides played important roles in the inflammation adaptation of PBMCs. Moreover, the main metabolic enzymes such as the solute carrier (SLC) family 2 member 3 (SLC2A3) and fatty acid synthase (FASN), responsible for the reactions with large differential fluxes, were identified as potential therapeutic targets. Our results revealed the inflammation regulation potentials of partial metabolic reactions with differential fluxes and their metabolites. This study provides a reference for developing potential PBMC metabolism-targeting therapy strategies against COVID-19.
Project description:Understanding the adaptive responses of individual bacterial strains is crucial for microbiome engineering approaches that introduce new functionalities into complex microbiomes, such as xenobiotic compound metabolism for soil bioremediation. Adaptation requires metabolic reprogramming of the cell, which can be captured by multi-omics, but this data remains formidably challenging to interpret and predict. Here we present a new approach that combines genome-scale metabolic modeling with transcriptomics and exometabolomics, both of which are common tools for studying dynamic population behavior. As a realistic demonstration, we developed a genome-scale model of Pseudomonas veronii 1YdBTEX2, a candidate bioaugmentation agent for accelerated metabolism of mono-aromatic compounds in soil microbiomes, while simultaneously collecting experimental data of P. veronii metabolism during growth phase transitions. Predictions of the P. veronii growth rates and specific metabolic processes from the integrated model closely matched experimental observations. We conclude that integrative and network-based analysis can help build predictive models that accurately capture bacterial adaptation responses. Further development and testing of such models may considerably improve the successful establishment of bacterial inoculants in more complex systems.
Project description:The Escherichia coli genome-scale metabolic model (GEM) is an exemplar systems biology model for the simulation of cellular metabolism. Experimental validation of model predictions is essential to pinpoint uncertainty and ensure continued development of accurate models. Here, we quantified the accuracy of four subsequent E. coli GEMs using published mutant fitness data across thousands of genes and 25 different carbon sources. This evaluation demonstrated the utility of the area under a precision-recall curve relative to alternative accuracy metrics. An analysis of errors in the latest (iML1515) model identified several vitamins/cofactors that are likely available to mutants despite being absent from the experimental growth medium and highlighted isoenzyme gene-protein-reaction mapping as a key source of inaccurate predictions. A machine learning approach further identified metabolic fluxes through hydrogen ion exchange and specific central metabolism branch points as important determinants of model accuracy. This work outlines improved practices for the assessment of GEM accuracy with high-throughput mutant fitness data and highlights promising areas for future model refinement in E. coli and beyond.
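For readers unfamiliar with the metric, the following is a minimal sketch of one common way to compute the area under a precision-recall curve, the step-wise estimate usually called average precision. The abstract does not say which estimator the study used, so this choice, the function name, and the toy labels and scores are assumptions made for illustration only.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for the class treated as positive, 0 otherwise.
    scores: model scores, where a higher score means "more likely positive".
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            area += hits / rank  # precision at the point where recall increases
    return area / total_pos


# Hypothetical toy data: positives are ranked first and third of four.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)  # (1/1 + 2/3) / 2
```

Unlike overall accuracy, this quantity depends only on how the positive class is ranked, which makes it informative when one class is much rarer than the other.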
Project description:We evaluated the presence/absence of proteins encoded by 14,077 genes in adipocytes obtained from different tissue samples using immunohistochemistry. By combining this with previously published adipocyte-specific proteome data, we identified proteins associated with 7340 genes in human adipocytes. This information was used to reconstruct a comprehensive and functional genome-scale metabolic model of adipocyte metabolism. The resulting metabolic model, iAdipocytes1809, enables mechanistic insights into adipocyte metabolism on a genome-wide level, and can serve as a scaffold for integration of omics data to understand the genotype-phenotype relationship in obese subjects. By integrating human transcriptome and fluxome data, we found an increase in the metabolic activity around androsterone, ganglioside GM2 and degradation products of heparan sulfate and keratan sulfate, and a decrease in mitochondrial metabolic activities in obese subjects compared with lean subjects. Our study hereby shows a path to identify new therapeutic targets for treating obesity through combination of high throughput patient data and metabolic modeling.
Project description:Caenorhabditis elegans is a powerful model to study metabolism and how it relates to nutrition, gene expression, and life history traits. However, while numerous experimental techniques that enable perturbation of its diet and gene function are available, a high-quality metabolic network model has been lacking. Here, we reconstruct an initial version of the C. elegans metabolic network. This network model contains 1,273 genes, 623 enzymes, and 1,985 metabolic reactions and is referred to as iCEL1273. Using flux balance analysis, we show that iCEL1273 is capable of representing the conversion of bacterial biomass into C. elegans biomass during growth and enables the predictions of gene essentiality and other phenotypes. In addition, we demonstrate that gene expression data can be integrated with the model by comparing metabolic rewiring in dauer animals versus growing larvae. iCEL1273 is available at a dedicated website (wormflux.umassmed.edu) and will enable the unraveling of the mechanisms by which different macro- and micronutrients contribute to the animal's physiology.
Project description:Genome-scale metabolic models are widely used to enhance our understanding of the metabolic features of organisms and of host-pathogen interactions, and to identify therapeutics for diseases. Here we present iTMU798, the genome-scale metabolic model of the mouse whipworm Trichuris muris. The model demonstrates the metabolic features of T. muris and allows the prediction of metabolic steps essential for its survival. Specifically, it predicts that the Thioredoxin Reductase (TrxR) enzyme is essential, a prediction we validate in vitro with the drug auranofin. Furthermore, our observation that the T. muris genome lacks gsr-1 encoding Glutathione Reductase (GR) but has GR activity that can be inhibited by auranofin indicates a mechanism for the reduction of glutathione by the TrxR enzyme in T. muris. In addition, iTMU798 predicts seven essential amino acids that cannot be synthesised by T. muris, a prediction we validate for the amino acid tryptophan. Overall, iTMU798 is a powerful tool to study the metabolism of not only T. muris but also other Trichuris spp., to understand host-parasite interactions, and to support the rational design of new intervention strategies.