Electronic Medical Record Context Signatures Improve Diagnostic Classification Using Medical Image Computing.
ABSTRACT: Composite models that combine medical imaging with electronic medical records (EMR) improve predictive power when compared to traditional models that use imaging alone. The digitization of EMR provides potential access to a wealth of medical information, but presents new challenges in algorithm design and inference. Previous studies, such as Phenome Wide Association Study (PheWAS), have shown that EMR data can be used to investigate the relationship between genotypes and clinical conditions. Here, we introduce Phenome-Disease Association Study to extend the statistical capabilities of the PheWAS software through a custom Python package, which creates diagnostic EMR signatures to capture system-wide co-morbidities for a disease population within a given time interval. We investigate the effect of integrating these EMR signatures with radiological data to improve diagnostic classification in disease domains known to have confounding factors because of variable and complex clinical presentation. Specifically, we focus on two studies: First, a study of four major optic nerve related conditions; and second, a study of diabetes. Addition of EMR signature vectors to radiologically derived structural metrics improves the area under the curve (AUC) for diagnostic classification using elastic net regression, for diseases of the optic nerve. For glaucoma, the AUC improves from 0.71 to 0.83, for intrinsic optic nerve disease it increases from 0.72 to 0.91, for optic nerve edema it increases from 0.95 to 0.96, and for thyroid eye disease from 0.79 to 0.89. The EMR signatures recapitulate known comorbidities with diabetes, such as abnormal glucose, but do not significantly modulate image-derived features. In summary, EMR signatures present a scalable and readily applicable.
Project description:Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10?? (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
Project description:MOTIVATION:Emergence of genetic data coupled to longitudinal electronic medical records (EMRs) offers the possibility of phenome-wide association scans (PheWAS) for disease-gene associations. We propose a novel method to scan phenomic data for genetic associations using International Classification of Disease (ICD9) billing codes, which are available in most EMR systems. We have developed a code translation table to automatically define 776 different disease populations and their controls using prevalent ICD9 codes derived from EMR data. As a proof of concept of this algorithm, we genotyped the first 6005 European-Americans accrued into BioVU, Vanderbilt's DNA biobank, at five single nucleotide polymorphisms (SNPs) with previously reported disease associations: atrial fibrillation, Crohn's disease, carotid artery stenosis, coronary artery disease, multiple sclerosis, systemic lupus erythematosus and rheumatoid arthritis. The PheWAS software generated cases and control populations across all ICD9 code groups for each of these five SNPs, and disease-SNP associations were analyzed. The primary outcome of this study was replication of seven previously known SNP-disease associations for these SNPs. RESULTS:Four of seven known SNP-disease associations using the PheWAS algorithm were replicated with P-values between 2.8 x 10(-6) and 0.011. The PheWAS algorithm also identified 19 previously unknown statistical associations between these SNPs and diseases at P < 0.01. This study indicates that PheWAS analysis is a feasible method to investigate SNP-disease associations. Further evaluation is needed to determine the validity of these associations and the appropriate statistical thresholds for clinical significance. AVAILABILITY:The PheWAS software and code translation table are freely available at http://knowledgemap.mc.vanderbilt.edu/research.
Project description:We repurposed existing genotypes in DNA biobanks across the Electronic Medical Records and Genomics network to perform a genome-wide association study for primary hypothyroidism, the most common thyroid disease. Electronic selection algorithms incorporating billing codes, laboratory values, text queries, and medication records identified 1317 cases and 5053 controls of European ancestry within five electronic medical records (EMRs); the algorithms' positive predictive values were 92.4% and 98.5% for cases and controls, respectively. Four single-nucleotide polymorphisms (SNPs) in linkage disequilibrium at 9q22 near FOXE1 were associated with hypothyroidism at genome-wide significance, the strongest being rs7850258 (odds ratio [OR] 0.74, p = 3.96 × 10(-9)). This association was replicated in a set of 263 cases and 1616 controls (OR = 0.60, p = 5.7 × 10(-6)). A phenome-wide association study (PheWAS) that was performed on this locus with 13,617 individuals and more than 200,000 patient-years of billing data identified associations with additional phenotypes: thyroiditis (OR = 0.58, p = 1.4 × 10(-5)), nodular (OR = 0.76, p = 3.1 × 10(-5)) and multinodular (OR = 0.69, p = 3.9 × 10(-5)) goiters, and thyrotoxicosis (OR = 0.76, p = 1.5 × 10(-3)), but not Graves disease (OR = 1.03, p = 0.82). Thyroid cancer, previously associated with this locus, was not significantly associated in the PheWAS (OR = 1.29, p = 0.09). The strongest association in the PheWAS was hypothyroidism (OR = 0.76, p = 2.7 × 10(-13)), which had an odds ratio that was nearly identical to that of the curated case-control population in the primary analysis, providing further validation of the PheWAS method. Our findings indicate that EMR-linked genomic data could allow discovery of genes associated with many diseases without additional genotyping cost.
Project description:The significance of non-rheumatoid arthritis (RA) autoantibodies in patients with RA is unclear. The aim of this study was to assess associations of autoantibodies with autoimmune risk alleles and with clinical diagnoses from the electronic medical records (EMRs) among RA cases and non-RA controls.Data on 1,290 RA cases and 1,236 non-RA controls of European genetic ancestry were obtained from the EMRs of 2 large academic centers. The levels of anti-citrullinated protein antibodies (ACPAs), antinuclear antibodies (ANAs), anti-tissue transglutaminase antibodies (AGTAs), and anti-thyroid peroxidase (anti-TPO) antibodies were measured. All subjects were genotyped for autoimmune risk alleles, and the association between number of autoimmune risk alleles present and number of types of autoantibodies present was studied. A phenome-wide association study (PheWAS) was conducted to study potential associations between autoantibodies and clinical diagnoses among RA cases and non-RA controls.The mean ages were 60.7 years in RA cases and 64.6 years in non-RA controls. The proportion of female subjects was 79% in each group. The prevalence of ACPAs and ANAs was higher in RA cases compared to controls (each P < 0.0001); there were no differences in the prevalence of anti-TPO antibodies and AGTAs. Carriage of higher numbers of autoimmune risk alleles was associated with increasing numbers of autoantibody types in RA cases (P = 2.1 × 10(-5)) and non-RA controls (P = 5.0 × 10(-3)). From the PheWAS, the presence of ANAs was significantly associated with a diagnosis of Sjögren's/sicca syndrome in RA cases.The increased frequency of autoantibodies in RA cases and non-RA controls was associated with the number of autoimmune risk alleles carried by an individual. PheWAS of EMR data, with linkage to laboratory data obtained from blood samples, provide a novel method to test for the clinical significance of biomarkers in disease.
Project description:The genome-wide association study (GWAS) is a powerful approach for studying the genetic complexities of human disease. Unfortunately, GWASs often fail to identify clinically significant associations and describing function can be a challenge. GWAS is a phenotype-to-genotype approach. It is now possible to conduct a converse genotype-to-phenotype approach using extensive electronic medical records to define a phenome. This approach associates a single genetic variant with many phenotypes across the phenome and is called a phenome-wide association study (PheWAS). The majority of PheWASs conducted have focused on variants identified previously by GWASs. This approach has been efficient for rediscovering gene-disease associations while also identifying pleiotropic effects for some single-nucleotide polymorphisms (SNPs). However, the use of SNPs identified by GWAS in a PheWAS is limited by the inherent properties of the GWAS SNPs, including weak effect sizes and difficulty when translating discoveries to function. To address these challenges, we conducted a PheWAS on 105 presumed functional stop-gain and stop-loss variants genotyped on 4235 Marshfield Clinic patients. Associations were validated on an additional 10?640 Marshfield Clinic patients. PheWAS results indicate that a nonsense variant in ARMS2 (rs2736911) is associated with age-related macular degeneration (AMD). These results demonstrate that focusing on functional variants may be an effective approach when conducting a PheWAS.
Project description:Platelets are enucleated cell fragments derived from megakaryocytes that play key roles in hemostasis and in the pathogenesis of atherothrombosis and cancer. Platelet traits are highly heritable and identification of genetic variants associated with platelet traits and assessing their pleiotropic effects may help to understand the role of underlying biological pathways. We conducted an electronic medical record (EMR)-based study to identify common variants that influence inter-individual variation in the number of circulating platelets (PLT) and mean platelet volume (MPV), by performing a genome-wide association study (GWAS). We characterized genetic variants associated with MPV and PLT using functional, pathway and disease enrichment analyses; we assessed pleiotropic effects of such variants by performing a phenome-wide association study (PheWAS) with a wide range of EMR-derived phenotypes. A total of 13,582 participants in the electronic MEdical Records and GEnomic network had data for PLT and 6,291 participants had data for MPV. We identified five chromosomal regions associated with PLT and eight associated with MPV at genome-wide significance (P < 5E-8). In addition, we replicated 20 SNPs [out of 56 SNPs (?: 0.05/56 = 9E-4)] influencing PLT and 22 SNPs [out of 29 SNPs (?: 0.05/29 = 2E-3)] influencing MPV in a published meta-analysis of GWAS of PLT and MPV. While our GWAS did not find any new associations, our functional analyses revealed that genes in these regions influence thrombopoiesis and encode kinases, membrane proteins, proteins involved in cellular trafficking, transcription factors, proteasome complex subunits, proteins of signal transduction pathways, proteins involved in megakaryocyte development, and platelet production and hemostasis. PheWAS using a single-SNP Bonferroni correction for 1,368 diagnoses (0.05/1368 = 3.6E-5) revealed that several variants in these genes have pleiotropic associations with myocardial infarction, autoimmune, and hematologic disorders. We conclude that multiple genetic loci influence interindividual variation in platelet traits and also have significant pleiotropic effects; the related genes are in multiple functional pathways including those relevant to thrombopoiesis.
Project description:OBJECTIVE:We report the first pediatric specific Phenome-Wide Association Study (PheWAS) using electronic medical records (EMRs). Given the early success of PheWAS in adult populations, we investigated the feasibility of this approach in pediatric cohorts in which associations between a previously known genetic variant and a wide range of clinical or physiological traits were evaluated. Although computationally intensive, this approach has potential to reveal disease mechanistic relationships between a variant and a network of phenotypes. METHOD:Data on 5049 samples of European ancestry were obtained from the EMRs of two large academic centers in five different genotyped cohorts. Recently, these samples have undergone whole genome imputation. After standard quality controls, removing missing data and outliers based on principal components analyses (PCA), 4268 samples were used for the PheWAS study. We scanned for associations between 2476 single-nucleotide polymorphisms (SNP) with available genotyping data from previously published GWAS studies and 539 EMR-derived phenotypes. The false discovery rate was calculated and, for any new PheWAS findings, a permutation approach (with up to 1,000,000 trials) was implemented. RESULTS:This PheWAS found a variety of common variants (MAF > 10%) with prior GWAS associations in our pediatric cohorts including Juvenile Rheumatoid Arthritis (JRA), Asthma, Autism and Pervasive Developmental Disorder (PDD) and Type 1 Diabetes with a false discovery rate < 0.05 and power of study above 80%. In addition, several new PheWAS findings were identified including a cluster of association near the NDFIP1 gene for mental retardation (best SNP rs10057309, p = 4.33 × 10(-7), OR = 1.70, 95%CI = 1.38 - 2.09); association near PLCL1 gene for developmental delays and speech disorder [best SNP rs1595825, p = 1.13 × 10(-8), OR = 0.65(0.57 - 0.76)]; a cluster of associations in the IL5-IL13 region with Eosinophilic Esophagitis (EoE) [best at rs12653750, p = 3.03 × 10(-9), OR = 1.73 95%CI = (1.44 - 2.07)], previously implicated in asthma, allergy, and eosinophilia; and association of variants in GCKR and JAZF1 with allergic rhinitis in our pediatric cohorts [best SNP rs780093, p = 2.18 × 10(-5), OR = 1.39, 95%CI = (1.19 - 1.61)], previously demonstrated in metabolic disease and diabetes in adults. CONCLUSION:The PheWAS approach with re-mapping ICD-9 structured codes for our European-origin pediatric cohorts, as with the previous adult studies, finds many previously reported associations as well as presents the discovery of associations with potentially important clinical implications.
Project description:MOTIVATION:Genome-wide association studies (GWASs) are effective for describing genetic complexities of common diseases. Phenome-wide association studies (PheWASs) offer an alternative and complementary approach to GWAS using data embedded in the electronic health record (EHR) to define the phenome. International Classification of Disease version 9 (ICD9) codes are used frequently to define the phenome, but using ICD9 codes alone misses other clinically relevant information from the EHR that can be used for PheWAS analyses and discovery. RESULTS:As an alternative to ICD9 coding, a text-based phenome was defined by 23?384 clinically relevant terms extracted from Marshfield Clinic's EHR. Five single nucleotide polymorphisms (SNPs) with known phenotypic associations were genotyped in 4235 individuals and associated across the text-based phenome. All five SNPs genotyped were associated with expected terms (P<0.02), most at or near the top of their respective PheWAS ranking. Raw association results indicate that text data performed equivalently to ICD9 coding and demonstrate the utility of information beyond ICD9 coding for application in PheWAS.