Project description: Background: Cardiometabolic diseases are highly comorbid, but their relationship with female-specific or overwhelmingly female-predominant health conditions (breast cancer, endometriosis, pregnancy complications) is understudied. This study aimed to estimate the cross-trait genetic overlap and the influence of the genetic burden of cardiometabolic traits on health conditions unique to women. Methods and Results: Using electronic health record data from 71 008 ancestrally diverse women, we examined relationships between 23 obstetrical/gynecological conditions and 4 cardiometabolic phenotypes (body mass index, coronary artery disease, type 2 diabetes, and hypertension) by performing 4 analyses: (1) cross-trait genetic correlation analyses to compare genetic architecture, (2) polygenic risk score-based association tests to characterize shared genetic effects on disease risk, (3) Mendelian randomization for significant associations to assess cross-trait causal relationships, and (4) chronology analyses to visualize the timeline of events unique to groups of women with high and low genetic burden for cardiometabolic traits and to highlight disease prevalence in risk groups by age. We observed 27 significant associations between cardiometabolic polygenic scores and obstetrical/gynecological conditions (including body mass index and endometrial cancer, body mass index and polycystic ovarian syndrome, type 2 diabetes and gestational diabetes, and type 2 diabetes and polycystic ovarian syndrome). Mendelian randomization analysis provided additional evidence of independent causal effects. We also identified an inverse association between coronary artery disease and breast cancer. High cardiometabolic polygenic scores were associated with early development of polycystic ovarian syndrome and gestational hypertension. Conclusions: We conclude that polygenic susceptibility to cardiometabolic traits is associated with elevated risk of certain female-specific health conditions.
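The polygenic scores behind analyses (2) and (4) can be illustrated with a minimal sketch. It assumes the common additive definition of a polygenic score (a weighted sum of effect-allele dosages) and a simple quantile split into low- and high-burden groups. The function names, the 20% cutoff, and the example values are hypothetical and are not taken from the study.

```python
def polygenic_score(dosages, weights):
    """Additive polygenic score: sum over variants of dosage * weight.

    dosages: effect-allele counts per variant (0, 1, or 2; may be fractional if imputed)
    weights: per-variant effect sizes from an external GWAS (hypothetical here)
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def split_by_burden(scores, fraction=0.2):
    """Return (low, high): indices of the bottom and top `fraction` of scores.

    The 20% default is an assumption for illustration only.
    """
    n = len(scores)
    k = max(1, int(n * fraction))
    order = sorted(range(n), key=lambda i: scores[i])
    return order[:k], order[-k:]


if __name__ == "__main__":
    # Three hypothetical variants for one individual.
    print(polygenic_score([0, 1, 2], [0.1, 0.2, 0.3]))
    # Five hypothetical individuals split into low/high genetic burden.
    print(split_by_burden([0.5, 0.1, 0.9, 0.3, 0.7]))
```

In practice the scores would be standardized and adjusted for ancestry before any grouping; this sketch omits those steps.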
Project description: Background: There has been intense effort over the past couple of decades to identify loci underlying quantitative traits as a key step in the process of elucidating the etiology of complex diseases. Recently there has been some effort to coalesce non-biased high-throughput data, e.g. high-density genotyping and genome-wide RNA expression, to drive understanding of the molecular basis of disease. However, a stumbling block has been the difficult question of how to leverage this information to identify molecular mechanisms that explain quantitative trait loci (QTL). We have developed a formal statistical hypothesis test, resulting in a p-value, to quantify uncertainty in a causal inference pertaining to a measured factor, e.g. a molecular species, which potentially mediates a known causal association between a locus and a quantitative trait. Results: We treat the causal inference as a 'chain' of mathematical conditions that must be satisfied to conclude that the potential mediator is causal for the trait, where the inference is only as good as the weakest link in the chain. P-values are computed for the component conditions, which include tests of linkage and conditional independence. The Intersection-Union Test, in which a series of statistical tests are combined to form an omnibus test, is then employed to generate the overall test result. Using computer-simulated mouse crosses, we show that type I error is low under a variety of conditions that include hidden variables and reactive pathways. We show that power under a simple causal model is comparable to other model selection techniques as well as Bayesian network reconstruction methods. 
We further show empirically that this method compares favorably to Bayesian network reconstruction methods for reconstructing transcriptional regulatory networks in yeast, recovering 7 out of 8 experimentally validated regulators. Conclusion: Here we propose a novel statistical framework in which existing notions of causal mediation are formalized into a hypothesis test, thus providing a standard quantitative measure of uncertainty in the form of a p-value. The method is theoretically and computationally accessible and, with the provided software, may prove a useful tool in disentangling molecular relationships.
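The "weakest link" logic of the Intersection-Union Test can be sketched in a few lines. In the standard formulation of that test, the omnibus null is rejected only if every component null is rejected, so the omnibus p-value is the maximum of the component p-values. The function name and the example p-values below are hypothetical; this is not the authors' software.

```python
def intersection_union_pvalue(component_pvalues):
    """Omnibus p-value for an Intersection-Union Test.

    Every link in the causal chain must hold, so the overall result
    is only as strong as the weakest component: the largest p-value.
    """
    pvals = list(component_pvalues)
    if not pvals:
        raise ValueError("at least one component p-value is required")
    for p in pvals:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"invalid p-value: {p}")
    return max(pvals)


if __name__ == "__main__":
    # Hypothetical component tests (e.g. linkage and conditional independence).
    print(intersection_union_pvalue([0.001, 0.03, 0.0004]))
```

Taking the maximum requires no multiplicity correction across the component tests, which is one reason the approach stays computationally simple.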
Project description: Background: This study aimed to systematically test whether previously reported risk factors for chronic kidney disease (CKD) are causally related to CKD in European and East Asian ancestries using Mendelian randomization. Methods: A total of 45 risk factors with genetic data in European ancestry and 17 risk factors in East Asian participants were identified as exposures from PubMed. We defined CKD by clinical diagnosis or by an estimated glomerular filtration rate of <60 ml/min/1.73 m². Ultimately, 51 672 CKD cases and 958 102 controls of European ancestry from CKDGen, UK Biobank and HUNT, and 13 093 CKD cases and 238 118 controls of East Asian ancestry from Biobank Japan, China Kadoorie Biobank and Japan-Kidney-Biobank/ToMMo were included. Results: Eight risk factors showed reliable evidence of causal effects on CKD in Europeans, including genetically predicted body mass index (BMI), hypertension, systolic blood pressure, high-density lipoprotein cholesterol, apolipoprotein A-I, lipoprotein(a), type 2 diabetes (T2D) and nephrolithiasis. In East Asians, BMI, T2D and nephrolithiasis showed evidence of causality on CKD. In two independent replication analyses, we observed that increased hypertension risk showed reliable evidence of a causal effect on increasing CKD risk in Europeans but, in contrast, showed a null effect in East Asians. Although liability to T2D showed consistent effects on CKD, the effects of glycaemic phenotypes on CKD were weak. Non-linear Mendelian randomization indicated a threshold relationship between genetically predicted BMI and CKD, with increased risk at BMI of >25 kg/m². Conclusions: Eight cardiometabolic risk factors showed causal effects on CKD in Europeans and three of them showed causality in East Asians, providing insights into the design of future interventions to reduce the burden of CKD.
Project description: Body mass index (BMI) is a complex disease risk factor known to be influenced by genes acting via both metabolic pathways and appetite regulation. In this study, we aimed to gain insight into the phenotypic consequences of BMI-associated genetic variants, which may be mediated by their expression in different tissues. First, we harnessed meta-analyzed gene expression datasets derived from subcutaneous adipose (n = 1257) and brain (n = 1194) tissue to identify 86 and 140 loci, respectively, which provided evidence of genetic colocalization with BMI. These two sets of tissue-partitioned loci had differential effects with respect to waist-to-hip ratio, suggesting that the way they influence fat distribution might vary despite their having very similar average magnitudes of effect on BMI itself (adipose = 0.0148 and brain = 0.0149 standard deviation change in BMI per effect allele). For instance, BMI-associated variants colocalized with TBX15 expression in adipose tissue (posterior probability [PPA] = 0.97), but not when we used TBX15 expression data derived from brain tissue (PPA = 0.04). This gene putatively influences BMI via its role in skeletal development. Conversely, there were loci where BMI-associated variants provided evidence of colocalization with gene expression in brain tissue (e.g., NEGR1, PPA = 0.93), but not when we used data derived from adipose tissue, suggesting that these genes might be more likely to influence BMI via energy balance. Leveraging these tissue-partitioned variant sets through a multivariable Mendelian randomization framework provided strong evidence that the brain-tissue-derived variants are predominantly responsible for driving the genetically predicted effects of BMI on cardiovascular-disease endpoints (e.g., coronary artery disease: odds ratio = 1.05, 95% confidence interval = 1.04-1.07, p = 4.67 × 10⁻¹⁴). 
In contrast, our analyses suggested that the adipose tissue variants might predominantly be responsible for the underlying relationship between BMI and measures of cardiac function, such as left ventricular stroke volume (beta = 0.21, 95% confidence interval = 0.09-0.32, p = 6.43 × 10⁻⁴).
Project description: Cardiometabolic diseases, such as type 2 diabetes and cardiovascular disease, have a high public health burden. Understanding the genetically determined regulation of proteins that are dysregulated in disease can help to dissect the complex biology underpinning them. Here, we perform a protein quantitative trait locus (pQTL) analysis of 248 serum proteins relevant to cardiometabolic processes in 2893 individuals. Meta-analyzing whole-genome sequencing (WGS) data from two Greek cohorts, MANOLIS (n = 1356; 22.5× WGS) and Pomak (n = 1537; 18.4× WGS), we detect 301 independently associated pQTL variants for 170 proteins, including 12 rare variants (minor allele frequency < 1%). We additionally find 15 pQTL variants that are rare in non-Finnish European populations but have drifted up in frequency in the discovery cohorts studied here. We identify proteins causally associated with cardiometabolic traits, including Mep1b for high-density lipoprotein (HDL) levels, and describe a knock-out (KO) Mep1b mouse model. Our findings furnish insights into the genetic architecture of the serum proteome, identify new protein-disease relationships and demonstrate the importance of isolated populations in pQTL analysis.
Project description:The discovery of molecular relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model, under certain assumptions, estimates an average of a causal quantity reflecting the direct influence of one variable on another. We leverage this insight to propose a precise definition of a gene regulatory relationship and implement a new tool, CIMLA (Counterfactual Inference by Machine Learning and Attribution Models), to identify differences in gene regulatory networks between biological conditions, a problem that has received great attention in recent years. Using extensive benchmarking on simulated data, we show that CIMLA is more robust to confounding variables and is more accurate than leading methods. Last, we use CIMLA to analyze a previously published single-cell RNA sequencing dataset from subjects with and without Alzheimer's disease (AD), discovering several potential regulators of AD.
Project description: Amblyopia is a common visual disorder that causes significant vision problems globally. Most non-ocular risk factors for amblyopia are closely related to the intrauterine environment and are strongly influenced by parent-origin effects. Parent-origin perinatal factors may have a direct causal influence on amblyopia development; therefore, we investigated the causal association between perinatal factors and amblyopia risk using one-sample Mendelian randomization (MR) with data from the UK Biobank (UKBB) cohort. Four distinct MR methods were employed to analyze the association between three perinatal factors (birth weight [BW], maternal smoking, and breastfeeding) and amblyopia risk, based on the summary statistics of genome-wide association studies in the European population. The inverse variance weighting method showed an inverse causal association between BW and amblyopia risk (odds ratio, 0.48 [95% CI, 0.29-0.80]; p = 0.004). Maternal smoking and breastfeeding were not causally associated with amblyopia risk. Our findings provide possible evidence of a significant genetic causal association between low BW and increased amblyopia risk. This evidence may highlight the potential of BW as a predictive factor for visual maldevelopment and the need for careful management of amblyopia risk in patients with low BW.
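The inverse variance weighting estimate reported above can be illustrated with a minimal sketch. It assumes the standard fixed-effect formula on summary statistics, in which each variant's ratio estimate (outcome effect divided by exposure effect) is weighted by the inverse of its approximate variance. The function names and the per-variant numbers are hypothetical; the study's actual instruments and software are not reproduced here.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted MR estimate.

    beta = sum(bx * by / se^2) / sum(bx^2 / se^2)
    se   = sqrt(1 / sum(bx^2 / se^2))
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate into an odds ratio with a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Two hypothetical variants whose ratio estimates are both 0.5.
    beta, se = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.02])
    print(beta, se)
    print(odds_ratio_with_ci(beta, se))
```

An odds ratio below 1, as reported for birth weight, corresponds to a negative log-odds estimate from `ivw_estimate`.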
Project description: Background: Correlations between polymorphic markers and observed phenotypes provide the basis for mapping traits in quantitative genetics. When the phenotype is gene expression, loci involved in regulatory control can theoretically be implicated. Recent efforts to construct gene regulatory networks from genotype and gene expression data have shown that biologically relevant networks can be achieved from an integrative approach. In this paper, we consider the problem of identifying individual pairs of genes in a direct or indirect, causal, trans-acting relationship. Results: Inspired by epistatic models of multi-locus quantitative trait (QTL) mapping, we propose a unified model of expression and genotype to identify quantitative trait genes (QTG) by extending the conventional linear model to include both genotype and expression of regulator genes and their interactions. The model provides mapping of specific genes in contrast to standard linkage approaches that implicate large QTL intervals typically containing tens of genes. In simulations, we found that the method can often detect weak trans-acting regulators amid the background noise of thousands of traits and is robust to transcription models containing multiple regulator genes. We reanalyze several pleiotropic loci derived from a large set of yeast matings and identify a likely alternative regulator not previously published. However, we also found that many regulators cannot be so easily mapped due to the presence of cis-acting QTLs on the regulators, which induce close linkage among small neighborhoods of genes. QTG-mapped regulator-target pairs linked to ARN1 were combined to form a regulatory module, which we observed to be highly enriched in iron-homeostasis-related genes and which contained several causally directed links that had not been identified in other automatic reconstructions of that regulatory module. 
Finally, we also confirm the surprising, previously published results that regulators controlling gene expression are not enriched for transcription factors, but we do show that our more precise mapping model reveals functional enrichment for several other biological processes related to the regulation of the cell. Conclusion: By incorporating interacting expression and genotype, our QTG mapping method can identify specific regulator genes in contrast to standard QTL interval mapping. We have shown that the method can recover biologically significant regulator-target pairs and that the approach leads to a general framework for inducing a regulatory module network topology of directed and undirected edges that can be used to identify leads in pathway analysis.
Project description: Motivation: Recent advances in DNA sequencing technologies have allowed the detailed characterization of genomes in large cohorts of tumors, highlighting their extreme heterogeneity, with no two tumors sharing the same complement of somatic mutations. Such heterogeneity hinders our ability to identify somatic mutations important for the disease, including mutations that determine clinically relevant phenotypes (e.g., cancer subtypes). Several tools have been developed to identify somatic mutations related to cancer phenotypes. However, such tools identify correlations between somatic mutations and cancer phenotypes, with no guarantee of highlighting causal relations. Results: We describe ALLSTAR, a novel tool to infer reliable causal relations between somatic mutations and cancer phenotypes. ALLSTAR identifies reliable causal rules highlighting combinations of somatic mutations with the highest impact in terms of average effect on the phenotype. While we prove that the underlying computational problem is NP-hard, we develop a branch-and-bound approach that employs protein-protein interaction networks and novel bounds for pruning the search space, while properly correcting for multiple hypothesis testing. Our extensive experimental evaluation on synthetic data shows that our tool is able to identify reliable causal relations in large cancer cohorts. Moreover, the reliable causal rules identified by our tool in cancer data show that our approach identifies several somatic mutations known to be relevant for cancer phenotypes as well as novel biologically meaningful relations. Availability and implementation: Code, data, and scripts to reproduce the experiments are available at https://github.com/VandinLab/ALLSTAR. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description: BACKGROUND: The global burden of disease has shifted from communicable diseases in children to chronic diseases in adults. This epidemiologic shift varies greatly by region, but in Europe, chronic conditions account for 86% of all deaths, 77% of the disease burden, and up to 80% of health care expenditures. A number of risk factors have been implicated in chronic diseases, such as exposure to infectious agents. Some of these associations are well established, while others remain uncertain. METHODS AND FINDINGS: We assessed the body of evidence regarding the infectious aetiology of chronic diseases in the peer-reviewed literature over the last decade. Causality was assessed with three different criteria: first, the total number of associations documented in the literature between each infectious agent and chronic condition; second, the epidemiologic study design (quality of the study); third, evidence for the number of Hill's criteria and Koch's postulates that linked the pathogen with the chronic condition. We identified 3136 publications, of which 148 were included in the analysis. There were a total of 75 different infectious agents and 122 chronic conditions. The evidence was strong for five pathogens, based on study type, strength and number of associations; they accounted for 60% of the associations documented in the literature. They were human immunodeficiency virus, hepatitis C virus, Helicobacter pylori, hepatitis B virus, and Chlamydia pneumoniae, and were collectively implicated in the aetiology of 37 different chronic conditions. Other pathogens examined were associated with very few chronic conditions (≤ 3), and when the three criteria of evidence were applied, the strength of the evidence for causality was weak. CONCLUSIONS: Prevention and treatment of these five pathogens lend themselves as effective public health intervention entry points. 
By concentrating research efforts on these promising areas, the human, economic, and societal burden arising from chronic conditions can be reduced.