The challenges, advantages and future of phenome-wide association studies.
ABSTRACT: Over the last decade, significant technological breakthroughs have revolutionized human genomic research in the form of genome-wide association studies (GWASs). GWASs have identified thousands of statistically significant genetic variants associated with hundreds of human conditions including many with immunological aetiologies (e.g. multiple sclerosis, ankylosing spondylitis and rheumatoid arthritis). Unfortunately, most GWASs fail to identify clinically significant associations. Identifying biologically significant variants by GWAS also presents a challenge. The GWAS is a phenotype-to-genotype approach. As a complementary/alternative approach to the GWAS, investigators have begun to exploit extensive electronic medical record systems to conduct a genotype-to-phenotype approach when studying human disease - specifically, the phenome-wide association study (PheWAS). Although the PheWAS approach is in its infancy, this method has already demonstrated its capacity to rediscover important genetic associations related to immunological diseases/conditions. Furthermore, PheWAS has the advantage of identifying genetic variants with pleiotropic properties. This is particularly relevant for HLA variants. For example, PheWAS results have demonstrated that the HLA-DRB1 variant associated with multiple sclerosis may also be associated with erythematous conditions including rosacea. Likewise, PheWAS has demonstrated that the HLA-B genotype is not only associated with spondylopathies, uveitis, and variability in platelet count, but may also play an important role in other conditions, such as mastoiditis. This review will discuss and compare general PheWAS methodologies, describe both the challenges and advantages of the PheWAS, and provide insight into the potential directions in which PheWAS may lead.
Project description:The genome-wide association study (GWAS) is a powerful approach for studying the genetic complexities of human disease. Unfortunately, GWASs often fail to identify clinically significant associations and describing function can be a challenge. GWAS is a phenotype-to-genotype approach. It is now possible to conduct a converse genotype-to-phenotype approach using extensive electronic medical records to define a phenome. This approach associates a single genetic variant with many phenotypes across the phenome and is called a phenome-wide association study (PheWAS). The majority of PheWASs conducted have focused on variants identified previously by GWASs. This approach has been efficient for rediscovering gene-disease associations while also identifying pleiotropic effects for some single-nucleotide polymorphisms (SNPs). However, the use of SNPs identified by GWAS in a PheWAS is limited by the inherent properties of the GWAS SNPs, including weak effect sizes and difficulty when translating discoveries to function. To address these challenges, we conducted a PheWAS on 105 presumed functional stop-gain and stop-loss variants genotyped on 4235 Marshfield Clinic patients. Associations were validated on an additional 10,640 Marshfield Clinic patients. PheWAS results indicate that a nonsense variant in ARMS2 (rs2736911) is associated with age-related macular degeneration (AMD). These results demonstrate that focusing on functional variants may be an effective approach when conducting a PheWAS.
Project description:HLA-DRB1 codes for a major histocompatibility complex class II cell surface receptor. Genetic variants in and around this gene have been linked to numerous autoimmune diseases. Most notably, an association between the HLA-DRB1*1501 haplotype and multiple sclerosis (MS) has been defined. Utilizing electronic health records and 4235 individuals within Marshfield Clinic's Personalized Medicine Research Project, a reverse genetic screen, termed a phenome-wide association study (PheWAS), tested the association of rs3135388 genotype (tagging HLA-DRB1*1501) with 4841 phenotypes. As expected, HLA-DRB1*1501 was associated with MS (International Classification of Disease version 9-CM (ICD9) 340, P=0.023), whereas the strongest association was with alcohol-induced cirrhosis of the liver (ICD9 571.2, P=0.00011). HLA-DRB1*1501 also demonstrated association with erythematous conditions (ICD9 695, P=0.0054) and benign neoplasms of the respiratory and intrathoracic organs (ICD9 212, P=0.042), replicating previous findings. This study not only builds on the feasibility and utility of the PheWAS approach and represents the first external validation of a PheWAS, but may also demonstrate the complex etiologies associated with the HLA-DRB1*1501 locus.
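The genotype-to-phenotype scan described above, in which one variant is tested against every phenotype code, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes a carrier versus non-carrier coding of the genotype and a 1-df Pearson chi-square test on a 2x2 table, neither of which is stated in the abstract. The function names, patient identifiers and phenotype labels (`PHENO_A`, `PHENO_B`) are hypothetical, and the data are made up.

```python
import math

def chi2_p_2x2(a, b, c, d):
    """P-value of a 1-df Pearson chi-square test on the table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 1.0  # degenerate table: an empty row or column, nothing to test
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))

def phewas_scan(carrier, phenotypes):
    """Test one variant against every phenotype and rank by p-value.

    carrier:    dict mapping patient id -> True if the patient carries the allele
    phenotypes: dict mapping phenotype code -> set of patient ids with that code
    """
    results = []
    for code, cases in phenotypes.items():
        a = b = c = d = 0
        for patient, is_carrier in carrier.items():
            is_case = patient in cases
            if is_carrier and is_case:
                a += 1      # carrier with the phenotype
            elif is_carrier:
                b += 1      # carrier without the phenotype
            elif is_case:
                c += 1      # non-carrier with the phenotype
            else:
                d += 1      # non-carrier without the phenotype
        results.append((code, chi2_p_2x2(a, b, c, d)))
    return sorted(results, key=lambda r: r[1])

# Toy data (hypothetical): 8 patients, 2 phenotype codes.
toy_carrier = {"p1": True, "p2": True, "p3": True, "p4": True,
               "p5": False, "p6": False, "p7": False, "p8": False}
toy_phenotypes = {"PHENO_A": {"p1", "p2", "p3", "p4"},
                  "PHENO_B": {"p1", "p5"}}
ranking = phewas_scan(toy_carrier, toy_phenotypes)
for code, p in ranking:
    print(code, p)
```

A real analysis would more likely use logistic regression with covariates such as age and sex, and an exact test for sparse tables; the chi-square version is used here only because it needs nothing beyond the standard library.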
Project description:The aggregation of Electronic Health Records (EHR) and personalized genetics leads to powerful discoveries relevant to population health. Here we perform genome-wide association studies (GWAS) and accompanying phenome-wide association studies (PheWAS) to validate phenotype-genotype associations of BMI, and to a greater extent, severe Class 2 obesity, using comprehensive diagnostic and clinical data from the EHR database of our cohort. Three GWASs of 500,000 variants on the Illumina platform of 6,645 Healthy Nevada participants identified several published and novel variants that affect BMI and obesity. Each GWAS was followed with two independent PheWASs to examine associations between extensive phenotypes (incidence of diagnoses, condition, or disease), significant SNPs, BMI, and incidence of extreme obesity. The first GWAS examines associations with BMI in a cohort with no type 2 diabetics, focusing exclusively on BMI. The second GWAS examines associations with BMI in a cohort that includes type 2 diabetics. In the second GWAS, type 2 diabetes is a comorbidity, and thus becomes a covariate in the statistical model. The intersection of significant variants of these two studies is surprising. The third GWAS is a case vs. control study, with cases defined as extremely obese (Class 2 or 3 obesity), and controls defined as participants with BMI between 18.5 and 25. This last GWAS identifies strong associations with extreme obesity, including established variants in the FTO and NEGR1 genes, as well as loci not yet linked to obesity. The PheWASs validate published associations between BMI and extreme obesity and incidence of specific diagnoses and conditions, yet also highlight novel links. This study emphasizes the importance of our extensive longitudinal EHR database to validate known associations and identify putative novel links with BMI and obesity.
Project description:Genome-wide association studies (GWASs) have been widely applied to identify genetic factors that affect complex diseases or traits. Presently, the GWAS Catalog includes >2800 human studies. Of these, only a minority have investigated the susceptibility to infectious diseases or the response to therapies for the treatment or prevention of infections. Despite their limited application in the field, GWASs have provided valuable insights by pinpointing associations to both innate and adaptive immune response loci, as well as novel unexpected risk factors for infection susceptibility. Herein, we discuss some issues and caveats of GWASs for infectious diseases, we review the most recent findings ensuing from these studies, and we provide a brief summary of selected GWASs for infections in non-human mammals. We conclude that, although the general trend in the field of complex traits is to shift from GWAS to next-generation sequencing, important knowledge on infectious disease-related traits can be still gained by GWASs, especially for those conditions that have never been investigated using this approach. We suggest that future studies will benefit from the leveraging of information from the host's and pathogen's genomes, as well as from the exploration of models that incorporate heterogeneity across populations and phenotypes. Interactions within HLA genes or among HLA variants and polymorphisms located outside the major histocompatibility complex may also play an important role in shaping the susceptibility and response to invading pathogens.
Project description:BACKGROUND:Genetics is best dedicated to interpreting pathogenesis and revealing gene functions. The past decade has witnessed unprecedented progress in genetics, particularly in genome-wide identification of disorder variants through Genome-Wide Association Studies (GWAS) and Phenome-Wide Association Studies (PheWAS). However, it is still a great challenge to use GWAS/PheWAS-derived data to elucidate pathogenesis. METHODS:In this study, we used HotNet2, a heat diffusion-based systems genetics algorithm, to calculate the networks for disease genes obtained from GWAS and PheWAS, in an attempt to gain deeper insight into disease pathogenesis at a molecular level. RESULTS:Through HotNet2 calculation, significant networks for 202 (for GWAS) and 167 (for PheWAS) types of diseases were identified and evaluated, respectively. The GWAS-derived disease networks exhibit a stronger biomedical relevance than their PheWAS counterparts. Therefore, the GWAS-derived networks were used for pathogenesis interpretation by integrating the accumulated biomedical information. As a result, the pathogenesis for 64 diseases was elucidated in terms of mutation-caused abnormal transcriptional regulation, and 47 diseases were preliminarily interpreted in terms of mutation-caused varied protein-protein interactions. In addition, 3,802 genes (including 46 function-unknown genes) were assigned new functions by disease network information, some of which were validated through mouse gene knockout experiments. CONCLUSIONS:The systems genetics algorithm HotNet2 can efficiently establish genotype-phenotype links at the level of biological networks. Compared with original GWAS/PheWAS results, HotNet2-calculated disease-gene associations have stronger biomedical significance, and hence provide better interpretations for the pathogenesis of genome-wide variants and offer new insights into gene functions as well. These results are also helpful in drug development.
Project description:BACKGROUND:Genome-phenome studies have identified thousands of variants that are statistically associated with disease or traits; however, their functional roles are largely unclear. A comprehensive investigation of regulatory mechanisms and the gene regulatory networks between phenome-wide association study (PheWAS) and genome-wide association study (GWAS) is needed to identify novel regulatory variants contributing to risk for human diseases. METHODS:In this study, we developed an integrative functional genomics framework that maps 215,107 significant single nucleotide polymorphism (SNP) traits generated from the PheWAS Catalog and 28,870 genome-wide significant SNP traits collected from the GWAS Catalog into a global human genome regulatory map via incorporating various functional annotation data, including transcription factor (TF)-based motifs, promoters, enhancers, and expression quantitative trait loci (eQTLs) generated from four major functional genomics databases: FANTOM5, ENCODE, NIH Roadmap, and Genotype-Tissue Expression (GTEx). In addition, we performed a tissue-specific regulatory circuit analysis through the integration of the identified regulatory variants and tissue-specific gene expression profiles in 7051 samples across 32 tissues from GTEx. RESULTS:We found that the disease-associated loci in both the PheWAS and GWAS Catalogs were significantly enriched with functional SNPs. The integration of functional annotations significantly improved the power of detecting novel associations in PheWAS, through which we found a number of functional associations with strong regulatory evidence in the PheWAS Catalog. Finally, we constructed tissue-specific regulatory circuits for several complex traits: mental diseases, autoimmune diseases, and cancer, via exploring tissue-specific TF-promoter/enhancer-target gene interaction networks. We uncovered several promising tissue-specific regulatory TFs or genes for Alzheimer's disease (e.g. ZIC1 and STX1B) and asthma (e.g. CSF3 and IL1RL1). CONCLUSIONS:This study offers powerful tools for exploring the functional consequences of variants generated from genome-phenome association studies in terms of their mechanisms on affecting multiple complex diseases and traits.
Project description:Membranous nephropathy (MN) is a common cause of nephrotic syndrome in adults. Previous genome-wide association studies (GWAS) of 300 000 genotyped variants identified MN-associated loci at HLA-DQA1 and PLA2R1. We used a combined approach of genotype imputation, GWAS, human leucocyte antigen (HLA) imputation and extension to other aetiologies of chronic kidney disease (CKD) to investigate genetic MN risk variants more comprehensively. GWAS using 9 million high-quality imputed genotypes and classical HLA alleles were conducted for 323 MN European-ancestry cases and 345 controls. Additionally, 4960 patients with different CKD aetiologies in the German Chronic Kidney Disease (GCKD) study were genotyped for risk variants at HLA-DQA1 and PLA2R1. In GWAS, lead variants in known loci [rs9272729, HLA-DQA1, odds ratio (OR) = 7.3 per risk allele, P = 5.9 × 10^-27 and rs17830558, PLA2R1, OR = 2.2, P = 1.9 × 10^-8] were significantly associated with MN. No novel signals emerged in GWAS of X-chromosomal variants or in sex-specific analyses. Classical HLA alleles (DRB1*0301-DQA1*0501-DQB1*0201 haplotype) were associated with MN but provided little additional information beyond rs9272729. Associations were replicated in 137 GCKD patients with MN (HLA-DQA1: P = 6.4 × 10^-24; PLA2R1: P = 5.0 × 10^-4). MN risk increased steeply for patients with high-risk genotype combinations (OR > 79). While genetic variation in PLA2R1 exclusively associated with MN across 19 CKD aetiologies, the HLA-DQA1 risk allele was also associated with lupus nephritis (P = 2.8 × 10^-6), type 1 diabetic nephropathy (P = 6.9 × 10^-5) and focal segmental glomerulosclerosis (P = 5.1 × 10^-5), but not with immunoglobulin A nephropathy. PLA2R1 and HLA-DQA1 are the predominant risk loci for MN detected by GWAS. While HLA-DQA1 risk variants show an association with other CKD aetiologies, PLA2R1 variants are specific to MN.
Project description:MOTIVATION:Genome-wide association studies (GWASs) are effective for describing genetic complexities of common diseases. Phenome-wide association studies (PheWASs) offer an alternative and complementary approach to GWAS using data embedded in the electronic health record (EHR) to define the phenome. International Classification of Disease version 9 (ICD9) codes are used frequently to define the phenome, but using ICD9 codes alone misses other clinically relevant information from the EHR that can be used for PheWAS analyses and discovery. RESULTS:As an alternative to ICD9 coding, a text-based phenome was defined by 23,384 clinically relevant terms extracted from Marshfield Clinic's EHR. Five single nucleotide polymorphisms (SNPs) with known phenotypic associations were genotyped in 4235 individuals and associated across the text-based phenome. All five SNPs genotyped were associated with expected terms (P<0.02), most at or near the top of their respective PheWAS ranking. Raw association results indicate that text data performed equivalently to ICD9 coding and demonstrate the utility of information beyond ICD9 coding for application in PheWAS.
Project description:Using a phenome-wide association study (PheWAS) approach, we comprehensively tested genetic variants for association with phenotypes available for 70,061 study participants in the Population Architecture using Genomics and Epidemiology (PAGE) network. Our aim was to better characterize the genetic architecture of complex traits and identify novel pleiotropic relationships. This PheWAS drew on five population-based studies representing four major racial/ethnic groups (European Americans (EA), African Americans (AA), Hispanics/Mexican-Americans, and Asian/Pacific Islanders) in PAGE, each site with measurements for multiple traits, associated laboratory measures, and intermediate biomarkers. A total of 83 single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) were genotyped across two or more PAGE study sites. Comprehensive tests of association, stratified by race/ethnicity, were performed, encompassing 4,706 phenotypes mapped to 105 phenotype-classes, and association results were compared across study sites. A total of 111 PheWAS results had significant associations for two or more PAGE study sites with consistent direction of effect with a significance threshold of p<0.01 for the same racial/ethnic group, SNP, and phenotype-class. Among results identified for SNPs previously associated with phenotypes such as lipid traits, type 2 diabetes, and body mass index, 52 replicated previously published genotype-phenotype associations, 26 represented phenotypes closely related to previously known genotype-phenotype associations, and 33 represented potentially novel genotype-phenotype associations with pleiotropic effects. The majority of the potentially novel results were for single PheWAS phenotype-classes, for example, for CDKN2A/B rs1333049 (previously associated with type 2 diabetes in EA) a PheWAS association was identified for hemoglobin levels in AA. 
Of note, however, GALNT2 rs2144300 (previously associated with high-density lipoprotein cholesterol levels in EA) had multiple potentially novel PheWAS associations, with hypertension related phenotypes in AA and with serum calcium levels and coronary artery disease phenotypes in EA. PheWAS identifies associations for hypothesis generation and exploration of the genetic architecture of complex traits.
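The cross-site criterion stated above (p<0.01 in two or more study sites, with a consistent direction of effect, for the same racial/ethnic group, SNP and phenotype-class) can be expressed as a small filter. The sketch below is one possible reading of that rule, not the PAGE analysis code; the record layout, the function name `replicated_results`, and the site, SNP and phenotype-class labels are hypothetical, and the numbers are made up.

```python
from collections import defaultdict

def replicated_results(tests, alpha=0.01, min_sites=2):
    """Keep (group, snp, phenotype_class) keys significant in >= min_sites sites
    with the same direction of effect.

    tests: iterable of dicts with keys
           site, group, snp, phenotype_class, beta, p
    Returns a list of (key, direction, sites), where direction is +1 or -1.
    """
    hits_by_key = defaultdict(list)
    for t in tests:
        if t["p"] < alpha:  # only nominally significant tests are considered
            key = (t["group"], t["snp"], t["phenotype_class"])
            hits_by_key[key].append(t)

    kept = []
    for key, hits in hits_by_key.items():
        for direction in (1, -1):
            # Distinct sites whose effect estimate has this sign
            sites = {h["site"] for h in hits if h["beta"] * direction > 0}
            if len(sites) >= min_sites:
                kept.append((key, direction, sorted(sites)))
    return kept

# Toy records (hypothetical values).
toy_tests = [
    {"site": "A", "group": "EA", "snp": "rs_demo", "phenotype_class": "lipids",  "beta": 0.20,  "p": 0.004},
    {"site": "B", "group": "EA", "snp": "rs_demo", "phenotype_class": "lipids",  "beta": 0.15,  "p": 0.009},
    {"site": "A", "group": "AA", "snp": "rs_demo", "phenotype_class": "lipids",  "beta": 0.30,  "p": 0.002},
    {"site": "B", "group": "AA", "snp": "rs_demo", "phenotype_class": "lipids",  "beta": -0.10, "p": 0.005},
    {"site": "A", "group": "EA", "snp": "rs_demo", "phenotype_class": "glucose", "beta": 0.10,  "p": 0.200},
]
replicated = replicated_results(toy_tests)
print(replicated)
```

In the toy data only the EA lipids result is kept: the AA lipids tests are both significant but point in opposite directions, and the glucose test does not pass the threshold.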
Project description:In the post-GWAS (Genome-Wide Association Scan) era, the interpretation of GWAS results is crucial to screen for highly relevant phenotype-genotype association pairs. Based on the single genotype-phenotype association test and a pathway enrichment analysis, we propose a Metabolite-pathway-based Phenome-Wide Association Scan (M-PheWAS) to analyze the key metabolite-SNP pairs in rice and determine the regulatory relationship by assessing similarities in the changes of enzymes and downstream products in a pathway. Two SNPs, sf0315305925 and sf0315308337, were selected using this approach, and their molecular function and regulatory relationship with Enzyme EC:18.104.22.168 and with flavonoids, a significant downstream regulatory metabolite product, were demonstrated. Moreover, a total of 105 crucial SNPs were screened using M-PheWAS, which may be important for metabolite associations.