PheGWAS: a new dimension to visualize GWAS across multiple phenotypes.
ABSTRACT: MOTIVATION:PheGWAS was developed to enhance exploration of phenome-wide pleiotropy at the genome-wide level through the efficient generation of a dynamic visualization combining Manhattan plots from GWAS with PheWAS to create a 3D 'landscape'. Pleiotropy in sub-surface GWAS significance strata can be explored in a sectional view plotted within user-defined levels. Further complexity reduction is achieved by confining the view to a single chromosomal section. Comprehensive genomic and phenomic coordinates can be displayed. RESULTS:PheGWAS is demonstrated using summary data from Global Lipids Genetics Consortium GWAS across multiple lipid traits. For single and multiple traits PheGWAS highlighted all 88 and 69 loci, respectively. Further, the genes and SNPs reported by the Global Lipids Genetics Consortium were identified using additional functions implemented within PheGWAS. PheGWAS not only identifies independent signals but also provides insight into local genetic correlation (verified using HESS) and helps identify potential regions that share causal variants across phenotypes (verified using colocalization tests). AVAILABILITY AND IMPLEMENTATION:The PheGWAS software and code are freely available at https://github.com/georgeg0/PheGWAS. SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
Project description:BACKGROUND:Genetics is well suited to interpreting pathogenesis and revealing gene functions. The past decade has witnessed unprecedented progress in genetics, particularly in genome-wide identification of disorder variants through Genome-Wide Association Studies (GWAS) and Phenome-Wide Association Studies (PheWAS). However, it is still a great challenge to use GWAS/PheWAS-derived data to elucidate pathogenesis. METHODS:In this study, we used HotNet2, a heat diffusion-based systems genetics algorithm, to calculate the networks for disease genes obtained from GWAS and PheWAS, in an attempt to gain deeper insights into disease pathogenesis at a molecular level. RESULTS:Through HotNet2 calculation, significant networks for 202 (for GWAS) and 167 (for PheWAS) types of diseases were identified and evaluated, respectively. The GWAS-derived disease networks exhibit a stronger biomedical relevance than their PheWAS counterparts. Therefore, the GWAS-derived networks were used for pathogenesis interpretation by integrating the accumulated biomedical information. As a result, the pathogenesis for 64 diseases was elucidated in terms of mutation-caused abnormal transcriptional regulation, and 47 diseases were preliminarily interpreted in terms of mutation-caused varied protein-protein interactions. In addition, 3,802 genes (including 46 function-unknown genes) were assigned new functions by disease network information, some of which were validated through mouse gene-knockout experiments. CONCLUSIONS:The systems genetics algorithm HotNet2 can efficiently establish genotype-phenotype links at the level of biological networks. Compared with original GWAS/PheWAS results, HotNet2-calculated disease-gene associations have stronger biomedical significance, and hence provide better interpretations for the pathogenesis of genome-wide variants and offer new insights into gene functions as well. These results are also helpful in drug development.
Project description:Genome-wide association studies (GWAS) have identified multiple genetic loci for C-reactive protein (CRP) and lipids, of which some overlap. We aimed to identify genetic pleiotropy among CRP and lipids in order to better understand the shared biology of chronic inflammation and lipid metabolism. In a bivariate GWAS, we combined summary statistics of published GWAS on CRP (n = 66,185) and lipids, including LDL-cholesterol, HDL-cholesterol, triglycerides, and total cholesterol (n = 100,184), using an empirical weighted linear-combined test statistic. We sought replication for novel CRP associations in an independent sample of 17,743 genotyped individuals, and performed in silico replication of novel lipid variants in 93,982 individuals. Fifty potentially pleiotropic SNPs were identified among CRP and lipids: 21 for LDL-cholesterol and CRP, 20 for HDL-cholesterol and CRP, 21 for triglycerides and CRP, and 20 for total cholesterol and CRP. We identified and significantly replicated three novel SNPs for CRP in or near CTSB/FDFT1 (rs10435719, P_replication: 2.6 × 10^-5), STAG1/PCCB (rs7621025, P_replication: 1.4 × 10^-3) and FTO (rs1558902, P_replication: 2.7 × 10^-5). Seven pleiotropic lipid loci were replicated in the independent set of MetaboChip samples of the Global Lipids Genetics Consortium. Annotating the effect of replicated CRP SNPs on the expression of nearby genes, we observed an effect of rs10435719 on gene expression of FDFT1, and an effect of rs7621025 on PCCB. Our large-scale combined GWAS analysis identified numerous pleiotropic loci for CRP and lipids, providing further insight into the genetic interrelation between lipids and inflammation. In addition, we provide evidence for FDFT1, PCCB and FTO to be associated with CRP levels.
Project description:BACKGROUND:Phenome-Wide Association Studies (PheWAS) can be used to investigate the association between single nucleotide polymorphisms (SNPs) and a wide spectrum of phenotypes. This is a complementary approach to Genome-Wide Association Studies (GWAS), which calculate the association between hundreds of thousands of SNPs and one or a limited range of phenotypes. The extensive exploration of the association between phenotypic structure and genotypic variation through PheWAS produces a set of complex and comprehensive results. Integral to fully inspecting, analysing, and interpreting PheWAS results is visualization of the data. RESULTS:We have developed the software PheWAS-View for visually integrating PheWAS results, including information about the SNPs, relevant genes, phenotypes, and the interrelationships between phenotypes. As a result, both the fine-grained detail and the larger trends that exist within PheWAS results can be elucidated. CONCLUSIONS:PheWAS can be used to discover novel relationships between SNPs, phenotypes, and networks of interrelated phenotypes; identify pleiotropy; provide novel mechanistic insights; and foster hypothesis generation. These results can be both explored and presented with PheWAS-View. PheWAS-View is freely available for non-commercial research institutions; for full details see http://ritchielab.psu.edu/ritchielab/software.
Project description:Integrating association evidence across multiple traits can improve the power of gene discovery and reveal pleiotropy. Most multi-trait analysis methods focus on individual common variants in genome-wide association studies. Here, we introduce multi-trait analysis of rare-variant associations (MTAR), a framework for joint analysis of association summary statistics between multiple rare variants and different traits. MTAR achieves substantial power gain by leveraging the genome-wide genetic correlation measure to inform the degree of gene-level effect heterogeneity across traits. We apply MTAR to rare-variant summary statistics for three lipid traits in the Global Lipids Genetics Consortium. 99 genome-wide significant genes were identified in the single-trait-based tests, and MTAR increases this to 139. Among the 11 novel lipid-associated genes discovered by MTAR, 7 are replicated in an independent UK Biobank GWAS analysis. Our study demonstrates that MTAR is substantially more powerful than single-trait-based tests and highlights the value of MTAR for novel gene discovery.
Project description:Electronic health records (EHR) provide a comprehensive resource for discovery, allowing unprecedented exploration of the impact of genetic architecture on health and disease. The data of EHRs also allow for exploration of the complex interactions between health measures across health and disease. The discoveries arising from EHR-based research provide important information for the identification of genetic variation for clinical decision-making. Due to the breadth of information collected within the EHR, a challenge for discovery using EHR-based data is the development of high-throughput tools that expose important areas of further research, from genetic variants to phenotypes. Phenome-Wide Association Studies (PheWAS) provide a way to explore the association between genetic variants and comprehensive phenotypic measurements, generating new hypotheses and also exposing the complex relationships between genetic architecture and outcomes, including pleiotropy. EHR-based PheWAS have mainly evaluated associations with case/control status from International Classification of Diseases, Ninth Revision (ICD-9) codes. While these studies have highlighted discovery through PheWAS, the rich resource of clinical lab measures collected within the EHR can be better utilized for high-throughput PheWAS analyses and discovery. To better use these resources and enrich PheWAS association results, we have developed a sound methodology for extracting a wide range of clinical lab measures from EHR data. We have extracted a first set of 21 clinical lab measures from the de-identified EHR of participants of the Geisinger MyCode™ biorepository, and calculated the median of these lab measures for 12,039 subjects. Next, we evaluated the association between these 21 clinical lab median values and 635,525 genetic variants, performing a genome-wide association study (GWAS) for each of the 21 clinical lab measures.
We then calculated the association between SNPs from these GWAS passing our Bonferroni-defined p-value cutoff and 165 ICD-9 codes. Through the GWAS we found a series of results replicating known associations, and also some potentially novel associations with less studied clinical lab measures. We found the majority of the PheWAS ICD-9 diagnoses to be highly related to the clinical lab measures associated with the same SNPs. Moving forward, we will be evaluating further phenotypes and expanding the methodology for successful extraction of clinical lab measurements for research and PheWAS use. These developments are important for expanding the PheWAS approach for improved EHR-based discovery.
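The Bonferroni cutoff mentioned above is not stated numerically in the abstract. As a rough illustration only, the sketch below divides a family-wise error rate by the number of tests. The alpha of 0.05 and the choice between correcting per GWAS or across all 21 GWAS are assumptions, not values reported by the authors, and `bonferroni_cutoff` is a hypothetical helper name.

```python
def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


N_VARIANTS = 635_525   # genetic variants tested (from the abstract)
N_LAB_MEASURES = 21    # clinical lab measures (from the abstract)
ALPHA = 0.05           # assumed family-wise error rate, not given in the abstract

# Assumption 1: correct within each GWAS separately.
per_gwas = bonferroni_cutoff(ALPHA, N_VARIANTS)

# Assumption 2: correct across all variants and all lab measures jointly.
all_gwas = bonferroni_cutoff(ALPHA, N_VARIANTS * N_LAB_MEASURES)

print(f"per-GWAS cutoff: {per_gwas:.3e}")
print(f"joint cutoff:    {all_gwas:.3e}")
```

The joint cutoff is 21 times stricter than the per-GWAS cutoff, so which convention a study uses changes how many SNPs are carried forward into the PheWAS step.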
Project description:The concept of pleiotropy was proposed a century ago, though up to now there have been insufficient efforts to design robust statistics and software aimed at visualizing and evaluating pleiotropy at a regional level. The Pleiotropic Region Identification Method (PRIMe) was developed to evaluate potentially pleiotropic loci based upon data from multiple genome-wide association studies (GWAS). We first provide a software tool to systematically identify and characterize genomic regions where low association P-values are observed with multiple traits. We use the term Pleiotropy Index to denote the number of traits with low association P-values at a particular genomic region. For GWAS assumed to be uncorrelated, we adopted the binomial distribution to approximate the statistical significance of the Pleiotropy Index. For GWAS conducted on traits with known correlation coefficients, simulations are performed to derive the statistical distribution of the Pleiotropy Index under the null hypothesis of no genotype-phenotype association. For six hematologic and three blood pressure traits where full GWAS results were available from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, we estimated the trait correlations and applied the simulation approach to examine genomic regions with statistical evidence of pleiotropy. We then applied the approximation approach to explore GWAS summarized in the National Human Genome Research Institute (NHGRI) GWAS Catalog. By simulation, we identified pleiotropic regions including SH2B3 and BRAP (12q24.12) for hematologic and blood pressure traits. By approximation, we confirmed the genome-wide significant pleiotropy of these two regions based on the GWAS Catalog data, together with an exploration of other regions that highlights the FTO, GCKR and ABO regions. The Perl and R scripts are available at http://www.framinghamheartstudy.org/research/gwas_pleiotropictool.html.
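The binomial approximation described above can be sketched as an upper-tail probability. This is a minimal illustration, assuming each of n uncorrelated GWAS independently shows a "low" P-value at a region with the same probability under the null. The function name and the example inputs are hypothetical and are not taken from the PRIMe scripts.

```python
from math import comb


def pleiotropy_index_pvalue(index: int, n_traits: int, p_low: float) -> float:
    """Upper-tail probability P(X >= index) for X ~ Binomial(n_traits, p_low).

    index    : observed Pleiotropy Index (traits with a low P-value in the region)
    n_traits : number of GWAS considered, assumed uncorrelated
    p_low    : assumed null probability that one GWAS shows a low P-value there
    """
    if not 0 <= index <= n_traits:
        raise ValueError("index must lie between 0 and n_traits")
    if not 0.0 <= p_low <= 1.0:
        raise ValueError("p_low must be a probability")
    return sum(
        comb(n_traits, i) * p_low**i * (1.0 - p_low) ** (n_traits - i)
        for i in range(index, n_traits + 1)
    )


# Illustrative values only: 3 of 10 hypothetical traits show a low P-value,
# with an assumed 1% null chance per trait.
print(pleiotropy_index_pvalue(3, 10, 0.01))
```

For correlated traits this independence assumption fails, which is why the abstract describes a simulation-based null distribution for that case instead.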
Project description:Beginning in the early 2000s, the accumulation of biospecimens linked to electronic health records (EHRs) made possible genome-phenome studies (i.e., comparative analyses of genetic variants and phenotypes) using only data collected as a by-product of typical health care. In addition to disease and trait genetics, EHRs proved a valuable resource for analyzing pharmacogenetic traits and developing reverse genetics approaches such as phenome-wide association studies (PheWASs). PheWASs are designed to survey which of many phenotypes may be associated with a given genetic variant. PheWAS methods have been validated through replication of hundreds of known genotype-phenotype associations, and their use has differentiated between true pleiotropy and clinical comorbidity, added context to genetic discoveries, and helped define disease subtypes, and may also help repurpose medications. PheWAS methods have also proven to be useful with research-collected data. Future efforts that integrate broad, robust collection of phenotype data (e.g., EHR data) with purpose-collected research data in combination with a greater understanding of EHR data will create a rich resource for increasingly more efficient and detailed genome-phenome analysis to usher in new discoveries in precision medicine.
Project description:Mendelian randomization (MR) is an established approach to evaluate the effect of an exposure on an outcome. The gene-by-environment (GxE) study design can be used to determine whether the genetic instrument affects the outcome through pathways other than via the exposure of interest (horizontal pleiotropy). MR phenome-wide association studies (MR-pheWAS) search for the effects of an exposure, and can be conducted in UK Biobank using the PHESANT package. In this proof-of-principle study, we introduce the novel GxE MR-pheWAS approach, which combines MR-pheWAS with the use of GxE interactions. This method aims to identify the presence of effects of an exposure while simultaneously investigating horizontal pleiotropy. We systematically test for the presence of causal effects of smoking heaviness, stratifying on smoking status (ever versus never), as an exemplar. If a genetic variant is associated with smoking heaviness (but not smoking initiation), and this variant affects an outcome (at least partially) via tobacco intake, we would expect the effect of the variant on the outcome to differ in ever versus never smokers. We used PHESANT to test for the presence of effects of smoking heaviness, instrumented by genetic variant rs16969968, among never and ever smokers respectively, in UK Biobank. We ranked results by the strength of interaction between ever and never smokers. We replicated previously established effects of smoking heaviness, including detrimental effects on lung function. Novel results included a detrimental effect of heavier smoking on facial aging. We have demonstrated how GxE MR-pheWAS can be used to identify potential effects of an exposure, while simultaneously assessing whether results may be biased by horizontal pleiotropy.
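The expectation that the variant's effect differs between ever and never smokers can be illustrated with a simple comparison of stratum-specific estimates. The sketch below uses a z-test on the difference of two independent effect estimates. This is one common way to quantify such a difference and is an assumption here, not necessarily the interaction test implemented in PHESANT or used by the authors; the function name and input values are hypothetical.

```python
from math import erfc, sqrt


def stratum_difference_test(beta_ever: float, se_ever: float,
                            beta_never: float, se_never: float) -> tuple[float, float]:
    """Z statistic and two-sided P-value for beta_ever - beta_never.

    Assumes the two estimates come from independent strata and are
    approximately normally distributed.
    """
    if se_ever <= 0 or se_never <= 0:
        raise ValueError("standard errors must be positive")
    z = (beta_ever - beta_never) / sqrt(se_ever**2 + se_never**2)
    # Two-sided normal tail probability: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2.0))
    return z, p_value


# Hypothetical per-allele effects of the instrument on some outcome.
z, p = stratum_difference_test(beta_ever=0.30, se_ever=0.10,
                               beta_never=0.00, se_never=0.10)
print(f"z = {z:.3f}, two-sided P = {p:.4f}")
```

Under the logic of the abstract, an effect present in ever smokers but absent in never smokers is consistent with an effect acting via tobacco intake, whereas a similar effect in both strata would point toward horizontal pleiotropy.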
Project description:Over the last decade, significant technological breakthroughs have revolutionized human genomic research in the form of genome-wide association studies (GWASs). GWASs have identified thousands of statistically significant genetic variants associated with hundreds of human conditions including many with immunological aetiologies (e.g. multiple sclerosis, ankylosing spondylitis and rheumatoid arthritis). Unfortunately, most GWASs fail to identify clinically significant associations. Identifying biologically significant variants by GWAS also presents a challenge. The GWAS is a phenotype-to-genotype approach. As a complementary/alternative approach to the GWAS, investigators have begun to exploit extensive electronic medical record systems to conduct a genotype-to-phenotype approach when studying human disease - specifically, the phenome-wide association study (PheWAS). Although the PheWAS approach is in its infancy, this method has already demonstrated its capacity to rediscover important genetic associations related to immunological diseases/conditions. Furthermore, PheWAS has the advantage of identifying genetic variants with pleiotropic properties. This is particularly relevant for HLA variants. For example, PheWAS results have demonstrated that the HLA-DRB1 variant associated with multiple sclerosis may also be associated with erythematous conditions including rosacea. Likewise, PheWAS has demonstrated that the HLA-B genotype is not only associated with spondylopathies, uveitis, and variability in platelet count, but may also play an important role in other conditions, such as mastoiditis. This review will discuss and compare general PheWAS methodologies, describe both the challenges and advantages of the PheWAS, and provide insight into the potential directions in which PheWAS may lead.
Project description:MOTIVATION:The emergence of genetic data coupled to longitudinal electronic medical records (EMRs) offers the possibility of phenome-wide association scans (PheWAS) for disease-gene associations. We propose a novel method to scan phenomic data for genetic associations using International Classification of Diseases (ICD9) billing codes, which are available in most EMR systems. We have developed a code translation table to automatically define 776 different disease populations and their controls using prevalent ICD9 codes derived from EMR data. As a proof of concept of this algorithm, we genotyped the first 6005 European-Americans accrued into BioVU, Vanderbilt's DNA biobank, at five single nucleotide polymorphisms (SNPs) with previously reported disease associations: atrial fibrillation, Crohn's disease, carotid artery stenosis, coronary artery disease, multiple sclerosis, systemic lupus erythematosus and rheumatoid arthritis. The PheWAS software generated case and control populations across all ICD9 code groups for each of these five SNPs, and disease-SNP associations were analyzed. The primary outcome of this study was replication of seven previously known SNP-disease associations for these SNPs. RESULTS:Four of seven known SNP-disease associations were replicated using the PheWAS algorithm, with P-values between 2.8 × 10^-6 and 0.011. The PheWAS algorithm also identified 19 previously unknown statistical associations between these SNPs and diseases at P < 0.01. This study indicates that PheWAS analysis is a feasible method to investigate SNP-disease associations. Further evaluation is needed to determine the validity of these associations and the appropriate statistical thresholds for clinical significance. AVAILABILITY:The PheWAS software and code translation table are freely available at http://knowledgemap.mc.vanderbilt.edu/research.
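The case/control step described above can be sketched in a few lines. This is a simplified illustration, not the published PheWAS software: it treats any patient with a code in a group as a case and everyone else as a control, and it uses an uncorrected Pearson chi-square test on carrier status, whereas the abstract does not state which test was used. All function names, patient identifiers and code groupings are hypothetical.

```python
from math import erfc, sqrt


def case_control_table(patient_codes: dict[str, set[str]],
                       carriers: set[str],
                       case_codes: set[str]) -> list[list[int]]:
    """Build [[case_carrier, case_noncarrier], [control_carrier, control_noncarrier]]."""
    table = [[0, 0], [0, 0]]
    for patient, codes in patient_codes.items():
        row = 0 if codes & case_codes else 1      # case if any code is in the group
        col = 0 if patient in carriers else 1     # carrier of the SNP allele or not
        table[row][col] += 1
    return table


def chi_square_1df(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square statistic (no continuity correction) and its P-value."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2.0))


# Toy data: four hypothetical patients with illustrative ICD9 codes.
patients = {"p1": {"427.31"}, "p2": {"427.31"}, "p3": {"250.00"}, "p4": {"401.9"}}
table = case_control_table(patients, carriers={"p1", "p2"}, case_codes={"427.31"})
chi2, p = chi_square_1df(table)
print(table, chi2, p)
```

In practice this loop would be repeated for each code group and each SNP, and small cell counts would call for an exact test rather than the chi-square approximation.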