Dataset Information

Multi-omic modelling of inflammatory bowel disease with regularized canonical correlation analysis.

ABSTRACT:

Background

Personalized medicine requires finding relationships between variables that influence a patient's phenotype and predicting an outcome. Sparse generalized canonical correlation analysis identifies relationships between different groups of variables. This method requires establishing a model of the expected interaction between those variables. Describing these interactions is challenging when the relationship is unknown or when there is no pre-established hypothesis. Thus, our aim was to develop a method to find the relationships between microbiome and host transcriptome data and the relevant clinical variables in a complex disease, such as Crohn's disease.
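
For orientation, the "model of the expected interaction" can be read as a design matrix in the optimization problem that sparse generalized canonical correlation analysis solves. The statement below is a sketch of the commonly cited formulation, not text taken from this record. All symbols are assumptions introduced here: $X_j$ is the $j$-th block of variables, $a_j$ its weight vector, $c_{jk}$ the design-matrix entry linking blocks $j$ and $k$, $g$ a scheme function such as the identity, absolute value or square, and $s_j$ the sparsity bound for block $j$.

```latex
\max_{a_1,\dots,a_J} \; \sum_{j \neq k} c_{jk}\, g\!\left(\operatorname{cov}(X_j a_j,\, X_k a_k)\right)
\quad \text{subject to} \quad \|a_j\|_2 = 1, \;\; \|a_j\|_1 \le s_j, \quad j = 1,\dots,J
```

Under this reading, choosing a model amounts to choosing which $c_{jk}$ are non-zero and how strongly each pair of blocks is weighted.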

Results

We present here a method to identify interactions based on canonical correlation analysis. Using a dataset of Crohn's disease patients with longitudinal sampling, we show that the model is the most important factor in identifying relationships between blocks. First, the analysis was tested on two previously published datasets, a glioma dataset and a Crohn's disease and ulcerative colitis dataset, which we use to describe how to select the optimum parameters. Using these parameters, we analyzed our Crohn's disease dataset. We selected the model with the highest inner average variance explained to identify relationships between transcriptome, gut microbiome and clinically relevant variables. Adding the clinically relevant variables improved the average variance explained by the model compared to multiple co-inertia analysis.
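
The selection criterion mentioned above can be illustrated with a minimal sketch. It assumes that the inner average variance explained (inner AVE) is the design-weighted mean of squared correlations between the components of connected blocks, which is how we understand it to be defined for this family of methods. The function names, block names and all numbers are hypothetical and are not taken from the study. In a real analysis the components would be re-estimated for every candidate design; here they are held fixed only to keep the example short.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def inner_ave(components, design):
    """Design-weighted mean of squared correlations between block components.

    components: dict mapping block name -> per-sample component scores.
    design: dict mapping (block, block) pairs -> connection weight;
            pairs that are absent are treated as unconnected (weight 0).
    """
    num = den = 0.0
    for a, b in combinations(list(components), 2):
        w = design.get((a, b), design.get((b, a), 0.0))
        num += w * pearson(components[a], components[b]) ** 2
        den += w
    return num / den


# Hypothetical component scores for five samples (illustrative only).
components = {
    "transcriptome": [0.1, 0.4, 0.35, 0.8, 0.9],
    "microbiome": [0.2, 0.3, 0.5, 0.7, 1.0],
    "clinical": [1.0, 0.2, 0.6, 0.1, 0.5],
}

# Two hypothetical candidate models, expressed as design weights.
models = {
    "all_connected": {
        ("transcriptome", "microbiome"): 1.0,
        ("transcriptome", "clinical"): 1.0,
        ("microbiome", "clinical"): 1.0,
    },
    "omics_only": {("transcriptome", "microbiome"): 1.0},
}

# Score every candidate and keep the one with the highest inner AVE.
scores = {name: inner_ave(components, d) for name, d in models.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because the components are fixed in this toy setting, a design that keeps only the most strongly correlated pair will always score highest; the comparison is only meaningful when each design is fitted separately.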

Conclusions

The methodology described herein provides a general framework for identifying interactions between sets of omic data and clinically relevant variables. Following this method, we found genes and microorganisms that were related to each other independently of the model, while others were specific to the model used. Thus, model selection proved crucial to finding the existing relationships in multi-omics datasets.

SUBMITTER: Revilla L

PROVIDER: S-EPMC7870068 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Multi-omic modelling of inflammatory bowel disease with regularized canonical correlation analysis.

Revilla L, Mayorgas A, Corraliza AM, Masamunt MC, Metwaly A, Haller D, Tristán E, Carrasco A, Esteve M, Panés J, Ricart E, Lozano JJ, Salas A

PLoS One, 2021-02-08, issue 2

PMID: 33556098

Similar Datasets

Project description: Identifying genetic risk factors for Alzheimer's disease (AD) is an important research topic. To date, different endophenotypes, such as imaging-derived endophenotypes and proteomic expression-derived endophenotypes, have shown great value in uncovering risk genes compared to case-control studies. Biologically, a co-varying pattern of different omics-derived endophenotypes could result from the shared genetic basis. However, existing methods mainly focus on the effect of endophenotypes alone; the effect of cross-endophenotype (CEP) associations remains largely unexploited. In this study, we used both endophenotypes and their CEP associations of multi-omic data to identify genetic risk factors, and proposed two integrated multi-task sparse canonical correlation analysis (inMTSCCA) methods, i.e., pairwise endophenotype correlation-guided MTSCCA (pcMTSCCA) and high-order endophenotype correlation-guided MTSCCA (hocMTSCCA). pcMTSCCA employed pairwise correlations between magnetic resonance imaging (MRI)-derived, plasma-derived, and cerebrospinal fluid (CSF)-derived endophenotypes as an additional penalty. hocMTSCCA used high-order correlations among these multi-omic data for regularization. To figure out genetic risk factors at individual and group levels, as well as altered endophenotypic markers, we introduced sparsity-inducing penalties for both models. We compared pcMTSCCA and hocMTSCCA with three related methods on both simulation and real (consisting of neuroimaging data, proteomic analytes, and genetic data) datasets. The results showed that our methods obtained better or comparable canonical correlation coefficients (CCCs) and better feature subsets than benchmarks. Most importantly, the identified genetic loci and heterogeneous endophenotypic markers showed high relevance. Therefore, jointly using multi-omic endophenotypes and their CEP associations is promising to reveal genetic risk factors.
The source code and manual of inMTSCCA are available at https://ngdc.cncb.ac.cn/biocode/tools/BT007330.

Project description: Inflammatory bowel disease (IBD) is characterized by complex etiology and a disrupted colonic ecosystem. We provide a framework for the analysis of multi-omic data, which we apply to study the gut ecosystem in IBD. Specifically, we train and validate models using data on the metagenome, metatranscriptome, virome, and metabolome from the Human Microbiome Project 2 IBD multi-omic database, with 1,785 repeated samples from 130 individuals (103 cases and 27 controls). After splitting the participants into training and testing groups, we used mixed-effects least absolute shrinkage and selection operator regression to select features for each omic. These features, with demographic covariates, were used to generate separate single-omic prediction scores. All four single-omic scores were then combined into a final regression to assess the relative importance of the individual omics and the predictive benefits when considered together. We identified several species, pathways, and metabolites known to be associated with IBD risk, and we explored the connections between data sets. Individually, metabolomic and viromic scores were more predictive than metagenomics or metatranscriptomics, and when all four scores were combined, we predicted disease diagnosis with a Nagelkerke's R2 of 0.46 and an area under the curve of 0.80 (95% confidence interval: 0.63, 0.98). Our work supports that some single-omic models for complex traits are more predictive than others, that incorporating multiple omic data sets may improve prediction, and that each omic data type provides a combination of unique and redundant information. This modeling framework can be extended to other complex traits and multi-omic data sets.
Importance: Complex traits are characterized by many biological and environmental factors, such that multi-omic data sets are well-positioned to help us understand their underlying etiologies. We applied a prediction framework across multiple omics (metagenomics, metatranscriptomics, metabolomics, and viromics) from the gut ecosystem to predict inflammatory bowel disease (IBD) diagnosis. The predicted scores from our models highlighted key features and allowed us to compare the relative utility of each omic data set in single-omic versus multi-omic models. Our results emphasized the importance of metabolomics and viromics over metagenomics and metatranscriptomics for predicting IBD status. The greater predictive capability of metabolomics and viromics is likely because these omics serve as markers of lifestyle factors such as diet. This study provides a modeling framework for multi-omic data, and our results show the utility of combining multiple omic data types to disentangle complex disease etiologies and biological signatures.
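
For readers unfamiliar with the fit statistic quoted in this description, Nagelkerke's R2 is usually written as below. This is the standard textbook definition, not a formula taken from the study. The symbols are assumptions introduced here: $L_0$ is the likelihood of the intercept-only model, $L_1$ the likelihood of the fitted model, and $n$ the number of observations.

```latex
R^2_{\mathrm{N}} \;=\; \frac{1 - \left( L_0 / L_1 \right)^{2/n}}{1 - L_0^{\,2/n}}
```

The denominator rescales the Cox-Snell statistic in the numerator so that a perfectly fitting model reaches 1.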

Project description:
Background: Mitochondrial dysfunction has been linked to the development of inflammatory bowel disease (IBD), but the genetic pathophysiology was not fully elucidated. We employed Mendelian randomization and colocalization analyses to investigate the associations between mitochondrial-related genes and IBD via integrating multi-omics.
Methods: Summary-level data of mitochondrial gene methylation, expression and protein abundance levels were obtained from corresponding methylation, expression and protein quantitative trait loci studies, respectively. We obtained genetic associations with IBD and its two subtypes from the Inflammatory Bowel Disease Genetics Consortium (discovery), the UK Biobank (replication), and the FinnGen study (replication). We performed summary-data-based Mendelian randomization analysis to assess the associations of mitochondrial gene-related molecular features with IBD. Colocalization analysis was further conducted to assess whether the identified signal pairs shared a causal genetic variant.
Findings: After integrating the multi-omics data between mQTL-eQTL and eQTL-pQTL, we identified two mitochondrial genes, i.e., PARK7 and ACADM, with tier 1 evidence for their associations with IBD and ulcerative colitis (UC). PDK1 and FIS1 genes were associated with UC risk with tier 2 and tier 3 evidence, respectively. The methylation of cg05467918 in ACADM was associated with lower expression of ACADM, which fits with the positive effect of cg05467918 methylation on UC risk. Consistently, the inverse associations between gene methylation and gene expression were also observed in PARK7 (cg10385390) and PDK1 (cg17679246), which were corroborated with the protective role in UC. At circulating protein level, genetically predicted higher levels of PARK7 (OR 0.36, 95% CI 0.25-0.52) and HINT1 (OR 0.47, 95% CI 0.30-0.74) were inversely associated with IBD risk; genetically predicted higher level of HINT1 was associated with a decreased risk of Crohn's disease (CD) (OR 0.26, 95% CI 0.14-0.49) and a higher level of ACADM (OR 0.67, 95% CI 0.55-0.83), PDK1 (OR 0.63, 95% CI 0.49-0.81), FIS1 (OR 0.63, 95% CI 0.47-0.83) was associated with a decreased risk of UC.
Interpretation: We found that the mitochondrial PARK7 gene was putatively associated with IBD risk, and mitochondrial FIS1, PDK1, and ACADM genes were associated with UC risk with evidence from multi-omics levels. This study identified mitochondrial genes in relation to IBD, which may enhance the understanding of the pathogenic mechanisms of IBD development.
Funding: XL is supported by the Natural Science Fund for Distinguished Young Scholars of Zhejiang Province (LR22H260001) and Healthy Zhejiang One Million People Cohort (K-20230085).

Project description:Inflammatory bowel diseases (IBDs), including ulcerative colitis and Crohn's disease, affect several million individuals worldwide. These diseases are heterogeneous at the clinical, immunological and genetic levels and result from complex host and environmental interactions. Investigating drug efficacy for IBD can improve our understanding of why treatment response can vary between patients. We propose an explainable machine learning (ML) approach that combines bioinformatics and domain insight, to integrate multi-modal data and predict inter-patient variation in drug response. Using explanation of our models, we interpret the ML models' predictions to infer unique combinations of important features associated with pharmacological responses obtained during preclinical testing of drug candidates in ex vivo patient-derived fresh tissues. Our inferred multi-modal features that are predictive of drug efficacy include multi-omic data (genomic and transcriptomic), demographic, medicinal and pharmacological data. Our aim is to understand variation in patient responses before a drug candidate moves forward to clinical trials. As a pharmacological measure of drug efficacy, we measured the reduction in the release of the inflammatory cytokine TNFα from the fresh IBD tissues in the presence/absence of test drugs. We initially explored the effects of a mitogen-activated protein kinase (MAPK) inhibitor; however, we later showed our approach can be applied to other targets, test drugs or mechanisms of interest. Our best model predicted TNFα levels from demographic, medicinal and genomic features with an error of only 4.98% on unseen patients. We incorporated transcriptomic data to validate insights from genomic features. 
Our results showed variations in drug effectiveness (measured by ex vivo assays) between patients that differed in gender, age or condition and linked new genetic polymorphisms to patient response variation to the anti-inflammatory treatment BIRB796 (Doramapimod). Our approach models IBD drug response while also identifying its most predictive features as part of a transparent ML precision medicine strategy.

Project description:
Aim: To study the association between inflammatory bowel disease (IBD) and genetic variations in eosinophil protein X (EPX) and eosinophil cationic protein (ECP).
Methods: DNA was extracted from ethylene diamine tetraacetic acid blood of 587 patients with Crohn's disease (CD), 592 with ulcerative colitis (UC) and 300 healthy subjects. The EPX405 (G > C, rs2013109), ECP434 (G > C, rs2073342) and ECP562 (G > C, rs2233860) gene polymorphisms were analysed by the 5'-nuclease allelic discrimination assay. For determination of intracellular content of EPX and ECP in granulocytes, 39 blood samples were collected and extracted with a buffer containing cetyltrimethylammonium bromide. The intracellular content of EPX was analysed using an enzyme-linked immunosorbent assay. The intracellular content of ECP was analysed with the UniCAP® system as described by the manufacturer. The statistical tests used were the χ² test, Fisher's exact test, ANOVA, the Student-Newman-Keuls test, and Kaplan-Meier survival curves with the log-rank test for trend; P values < 0.05 were considered statistically significant.
Results: The genotype frequency for males with UC and with an age of disease onset of ≥ 45 years (n = 57) was for ECP434 and ECP562, GG = 37%, GC = 60%, CC = 4% and GG = 51%, GC = 49%, CC = 0% respectively. This was significantly different from the healthy subjects' genotype frequencies of ECP434 (GG = 57%, GC = 38%, CC = 5%; P = 0.010) and ECP562 (GG = 68%, GC = 29%, CC = 3%; P = 0.009). The genotype frequencies for females with CD and an age of disease onset of ≥ 45 years (n = 62) were, for the ECP434 and ECP562 genotypes, GG = 37%, GC = 52%, CC = 11% and GG = 48%, GC = 47% and CC = 5% respectively. This was also statistically different from healthy controls for both ECP434 (P = 0.010) and ECP562 (P = 0.013). The intracellular protein concentration of EPX and ECP was calculated in μg/10⁶ eosinophils and then correlated to the EPX405 genotypes. The protein content of EPX was highest in the patients with the CC genotype of EPX405 (GG = 4.65, GC = 5.93, and CC = 6.57) and for ECP in the patients with the GG genotype of EPX405 (GG = 2.70, GC = 2.47 and CC = 1.90). ANOVA demonstrated a difference in intracellular protein content for EPX (P = 0.009) and ECP (P = 0.022). The age of disease onset was linked to haplotypes of the EPX405, ECP434 and ECP562 genotypes. Kaplan-Meier curves showed a difference between haplotype distributions for the females with CD (P = 0.003). The highest age of disease onset was seen in females with the EPX405CC, ECP434GC, ECP562CC haplotype (34 years) and the lowest in females with the EPX405GC, ECP434GC, ECP562GG haplotype (21 years). For males with UC there was also a difference between the highest and lowest age of the disease onset (EPX405CC, ECP434CC, ECP562CC, mean 24 years vs EPX405GC, ECP434GC, ECP562GG, mean 34 years, P = 0.0009). The relative risk for UC patients with ECP434 or ECP562-GC/CC genotypes to develop dysplasia/cancer was 2.5 (95% CI: 1.2-5.4, P = 0.01) and 2.5 (95% CI: 1.1-5.4, P = 0.02) respectively, compared to patients carrying the GG-genotypes.
Conclusion: Polymorphisms of EPX and ECP are associated with IBD in an age- and gender-dependent manner, suggesting an essential role of eosinophils in the pathophysiology of IBD.

Project description:
Background: Inflammation is a core element of many different, systemic and chronic diseases that usually involve an important autoimmune component. The clinical phase of inflammatory diseases is often the culmination of a long series of pathologic events that started years before. The systemic characteristics and related mechanisms could be investigated through the multi-omic comparative analysis of many inflammatory diseases. Therefore, it is important to use molecular data to study the genesis of the diseases. Here we propose a new methodology to study the relationships between inflammatory diseases and signalling molecules whose dysregulation at molecular levels could lead to systemic pathological events observed in inflammatory diseases.
Results: We first perform an exploratory analysis of gene expression data of a number of diseases that involve a strong inflammatory component. The comparison of gene expression between disease and healthy samples reveals the importance of members of gene families coding for signalling factors. Next, we focus on signalling gene families of interest and a subset of inflammation-related diseases with multi-omic features including both gene expression and DNA methylation. We introduce a phylogenetic-based multi-omic method to study the relationships between multi-omic features of inflammation-related diseases by integrating gene expression and DNA methylation through sequence-based phylogeny of the signalling gene families. The models of adaptations between gene expression and DNA methylation can be inferred from the pre-estimated evolutionary relationship of a gene family. Members of the gene family whose expression or methylation levels significantly deviate from the model are considered potential disease-associated genes.
Conclusions: Applying the methodology to four gene families (the chemokine receptor family, the TNF receptor family, the TGF-β gene family, the IL-17 gene family) in nine inflammation-related diseases, we identify disease-associated genes which exhibit significant dysregulation in gene expression or DNA methylation in the inflammation-related diseases, which provides clues for functional associations between the diseases.
