3397 Genetic variants in gestational diabetes mellitus
ABSTRACT: OBJECTIVES/SPECIFIC AIMS: This study aims to identify genetic biomarkers of GDM and facilitate the understanding of its molecular underpinnings. METHODS/STUDY POPULATION: We identified a cohort of mothers diagnosed with GDM in our longitudinal birth study by mining the Electronic Health Records of participants using the PheCode map with ICD-9 and ICD-10 codes. We verified each case using ACOG’s GDM diagnosis criteria. RESULTS/ANTICIPATED RESULTS: Whole genome sequencing (WGS) data were available for 111 confirmed cases (out of 205) and 706 controls (out of 1,429) from different ancestries (412 EUR, 256 AMR, 56 EAS, 26 SAS and 18 AFR; 49 OTHER). SAS had the highest incidence of GDM at 38.46% and EUR had the lowest at 6.55%. We performed logistic regression using computed ancestry, age and BMI as covariates to determine whether any variants are associated with GDM. The top variant (rs139014401) was found in an intron of the DFFB gene, which is p53-bound and regulates DNA fragmentation during apoptosis. We will investigate the robustness of 49 identified variants and will separate the cohort by ancestry to detect population-specific differences in the top loci. DISCUSSION/SIGNIFICANCE OF IMPACT: Identification of molecular biomarkers in GDM across different ancestral backgrounds will address a gap in current GDM research. Findings may enhance screening and enable clinicians to identify those at risk for developing GDM earlier in the pregnancy. Early management of mothers at risk may lead to better health outcomes for mother and baby.
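The cohort figures in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. The group sizes and totals come from the abstract, but the per-ancestry case counts (10 SAS, 27 EUR) are not stated there; they are back-calculated from the reported percentages and should be read as assumptions.

```python
# Ancestry group sizes and WGS totals as reported in the abstract.
group_sizes = {"EUR": 412, "AMR": 256, "EAS": 56, "SAS": 26, "AFR": 18, "OTHER": 49}
cases_with_wgs = 111
controls_with_wgs = 706

# ASSUMPTION: case counts inferred from the reported incidences (38.46%, 6.55%);
# the abstract itself does not give per-ancestry case counts.
assumed_cases = {"SAS": 10, "EUR": 27}


def incidence_percent(cases: int, total: int) -> float:
    """Return cases / total as a percentage rounded to two decimals."""
    return round(100.0 * cases / total, 2)


# The ancestry groups should add up to cases plus controls.
total_sequenced = sum(group_sizes.values())

sas_incidence = incidence_percent(assumed_cases["SAS"], group_sizes["SAS"])
eur_incidence = incidence_percent(assumed_cases["EUR"], group_sizes["EUR"])

if __name__ == "__main__":
    print(total_sequenced, cases_with_wgs + controls_with_wgs)
    print(sas_incidence, eur_incidence)
```

Under these assumed counts the group sizes sum to the 817 sequenced participants and the two percentages match the reported values. With only 26 SAS participants, the SAS estimate rests on a small denominator.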
Project description:BACKGROUND:The phecode system was built upon the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) for phenome-wide association studies (PheWAS) using the electronic health record (EHR). OBJECTIVE:The goal of this paper was to develop and perform an initial evaluation of maps from the International Classification of Diseases, 10th Revision (ICD-10) and the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes to phecodes. METHODS:We mapped ICD-10 and ICD-10-CM codes to phecodes using a number of methods and resources, such as concept relationships and explicit mappings from the Centers for Medicare & Medicaid Services, the Unified Medical Language System, Observational Health Data Sciences and Informatics, Systematized Nomenclature of Medicine-Clinical Terms, and the National Library of Medicine. We assessed the coverage of the maps in two databases: Vanderbilt University Medical Center (VUMC) using ICD-10-CM and the UK Biobank (UKBB) using ICD-10. We assessed the fidelity of the ICD-10-CM map in comparison to the gold-standard ICD-9-CM phecode map by investigating phenotype reproducibility and conducting a PheWAS. RESULTS:We mapped >75% of ICD-10 and ICD-10-CM codes to phecodes. Of the unique codes observed in the UKBB (ICD-10) and VUMC (ICD-10-CM) cohorts, >90% were mapped to phecodes. We observed 70-75% reproducibility for chronic diseases and <10% for an acute disease for phenotypes sourced from the ICD-10-CM phecode map. 
Using the ICD-9-CM and ICD-10-CM maps, we conducted a PheWAS with a Lipoprotein(a) genetic variant, rs10455872, which replicated two known genotype-phenotype associations with similar effect sizes: coronary atherosclerosis (ICD-9-CM: P<.001; odds ratio (OR) 1.60 [95% CI 1.43-1.80] vs ICD-10-CM: P<.001; OR 1.60 [95% CI 1.43-1.80]) and chronic ischemic heart disease (ICD-9-CM: P<.001; OR 1.56 [95% CI 1.35-1.79] vs ICD-10-CM: P<.001; OR 1.47 [95% CI 1.22-1.77]). CONCLUSIONS:This study introduces the beta versions of ICD-10 and ICD-10-CM to phecode maps that enable researchers to leverage accumulated ICD-10 and ICD-10-CM data for PheWAS in the EHR.
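The coverage figures reported above (the share of observed codes that map to a phecode) amount to a dictionary lookup followed by a ratio. The sketch below shows that calculation under stated assumptions. The three mapping entries are chosen for illustration and should be checked against the published map before use, and the names `ICD10CM_TO_PHECODE`, `map_code` and `coverage` are hypothetical, not part of any released package.

```python
from typing import Dict, Iterable, Optional

# ILLUSTRATIVE entries only (assumption); the published maps contain
# many thousands of codes and should be used instead of this table.
ICD10CM_TO_PHECODE: Dict[str, str] = {
    "E11.9": "250.2",   # type 2 diabetes
    "I25.10": "411.4",  # coronary atherosclerosis
    "I10": "401.1",     # essential hypertension
}


def map_code(icd_code: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the phecode for an ICD code, or None when it is unmapped."""
    return mapping.get(icd_code.strip().upper())


def coverage(observed_codes: Iterable[str], mapping: Dict[str, str]) -> float:
    """Fraction of unique observed ICD codes that have a phecode."""
    unique_codes = {code.strip().upper() for code in observed_codes}
    if not unique_codes:
        raise ValueError("no codes observed")
    mapped = sum(1 for code in unique_codes if code in mapping)
    return mapped / len(unique_codes)


# Hypothetical codes seen in a cohort; "Z99.89" is deliberately left unmapped.
observed = ["E11.9", "I10", "Z99.89", "I25.10", "e11.9"]
observed_coverage = coverage(observed, ICD10CM_TO_PHECODE)

if __name__ == "__main__":
    print(f"coverage of unique observed codes: {observed_coverage:.0%}")
```

Coverage is computed over unique codes here, which follows the abstract's wording ("of the unique codes observed"). Weighting by how often each code occurs would be a different measure.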
Project description:To compare three groupings of Electronic Health Record (EHR) billing codes for their ability to represent clinically meaningful phenotypes and to replicate known genetic associations. The three tested coding systems were the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, the Agency for Healthcare Research and Quality Clinical Classification Software for ICD-9-CM (CCS), and manually curated "phecodes" designed to facilitate phenome-wide association studies (PheWAS) in EHRs. We selected 100 disease phenotypes and compared the ability of each coding system to accurately represent them without performing additional groupings. The 100 phenotypes included 25 randomly chosen clinical phenotypes pursued in prior genome-wide association studies (GWAS) and another 75 common disease phenotypes mentioned across free-text problem lists from 189,289 individuals. We then evaluated the performance of each coding system to replicate known associations for 440 SNP-phenotype pairs. Out of the 100 tested clinical phenotypes, phecodes exactly matched 83, compared to 53 for ICD-9-CM and 32 for CCS. ICD-9-CM codes were typically too detailed (requiring custom groupings) while CCS codes were often not granular enough. Among 440 tested known SNP-phenotype associations, use of phecodes replicated 153 SNP-phenotype pairs compared to 143 for ICD-9-CM and 139 for CCS. Phecodes also generally produced stronger odds ratios and lower p-values for known associations than ICD-9-CM and CCS. Finally, evaluation of several SNPs via PheWAS identified novel potential signals, some seen only when using the phecode approach. Among them, rs7318369 in PEPD was associated with gastrointestinal hemorrhage. Our results suggest that the phecode groupings better align with clinical diseases mentioned in clinical practice or for genomic studies.
ICD-9-CM, CCS, and phecode groupings all worked for PheWAS-type studies, though the phecode groupings produced superior results.
Project description:Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
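The idea of estimating ancestry proportions from summary allele frequencies can be illustrated with a deliberately simplified two-ancestry case. The sketch below is not the Summix implementation. It assumes the observed AF at each SNP is a weighted average of two reference AFs and solves the resulting one-parameter least-squares problem in closed form. All allele frequencies and the function name `estimate_two_way_proportion` are hypothetical.

```python
from typing import Sequence


def estimate_two_way_proportion(
    observed: Sequence[float],
    ref_a: Sequence[float],
    ref_b: Sequence[float],
) -> float:
    """Least-squares proportion of ancestry A in a two-way mixture.

    Model per SNP: observed = pi * ref_a + (1 - pi) * ref_b.
    Minimising the squared error over pi gives
    pi = sum((o - b) * (a - b)) / sum((a - b) ** 2), clipped to [0, 1].
    """
    if not (len(observed) == len(ref_a) == len(ref_b)):
        raise ValueError("all allele-frequency vectors must have equal length")
    numerator = sum((o - b) * (a - b) for o, a, b in zip(observed, ref_a, ref_b))
    denominator = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if denominator == 0:
        raise ValueError("references are identical; proportion not identifiable")
    # Clipping yields the constrained optimum because the objective is a
    # convex quadratic in a single variable.
    return min(1.0, max(0.0, numerator / denominator))


# HYPOTHETICAL reference allele frequencies at four SNPs (not real data).
ref_afr = [0.10, 0.50, 0.90, 0.30]
ref_eur = [0.30, 0.20, 0.40, 0.70]

# Simulate summary AFs for a group that is 84% A and 16% B.
true_pi = 0.84
observed_af = [true_pi * a + (1 - true_pi) * b for a, b in zip(ref_afr, ref_eur)]

estimated_pi = estimate_two_way_proportion(observed_af, ref_afr, ref_eur)

if __name__ == "__main__":
    print(f"estimated proportion of ancestry A: {estimated_pi:.3f}")
```

With more than two reference ancestries the proportions must be non-negative and sum to one, which calls for a constrained optimiser instead of this closed form.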
Project description:Because transethnic analysis may facilitate prioritization of causal genetic variants, we performed a genomewide association study (GWAS) of psoriasis in South Asians (SAS), consisting of 2,590 cases and 1,720 controls. Comparison with our existing European-origin (EUR) GWAS showed that effect sizes of known psoriasis signals were highly correlated in SAS and EUR (Spearman ρ = 0.78; p < 2 × 10^-14). Transethnic meta-analysis identified two non-MHC psoriasis loci (1p36.22 and 1q24.2) not previously identified in EUR, which may have regulatory roles. For these two loci, the transethnic GWAS provided higher genetic resolution and reduced the number of potential causal variants compared to using the EUR sample alone. We then explored multiple strategies to develop reference panels for accurately imputing MHC genotypes in both SAS and EUR populations and conducted a fine-mapping of MHC psoriasis associations in SAS and the largest such effort for EUR. HLA-C*06 was the top-ranking MHC locus in both populations but was even more prominent in SAS based on odds ratio, disease liability, model fit and predictive power. Transethnic modeling also substantially boosted the probability that the HLA-C*06 protein variant is causal. Secondary MHC signals included coding variants of HLA-C and HLA-B, but also potential regulatory variants of these two genes as well as HLA-A and several HLA class II genes, with effects on both chromatin accessibility and gene expression. This study highlights the shared genetic basis of psoriasis in SAS and EUR populations and the value of transethnic meta-analysis for discovery and fine-mapping of susceptibility loci.
Project description:We conducted an electronic health record (EHR)-based phenome-wide association study (PheWAS) to discover pleiotropic effects of variants in three lipoprotein metabolism genes: PCSK9, APOB, and LDLR. Using high-density genotype data, we tested the associations of variants in the three genes with 1232 EHR-derived binary phecodes in 51,700 European-ancestry (EA) individuals and 585 phecodes in 10,276 African-ancestry (AA) individuals; 457 PCSK9, 730 APOB, and 720 LDLR variants were filtered by imputation quality (r^2 > 0.4), minor allele frequency (>1%), linkage disequilibrium (r^2 < 0.3), and association with LDL-C levels, yielding a set of two PCSK9, three APOB, and five LDLR variants in EA but no variants in AA. Cases and controls were defined for each phecode using the PheWAS package in R. Logistic regression assuming an additive genetic model was used with adjustment for age, sex, and the first two principal components. Significant associations were tested in additional cohorts from Vanderbilt University (n = 29,713), the Marshfield Clinic Personalized Medicine Research Project (n = 9562), and UK Biobank (n = 408,455). We identified one PCSK9, two APOB, and two LDLR variants significantly associated with an examined phecode. Only one of the variants was associated with a non-lipid disease phecode ("myopia"), but this association was not significant in the replication cohorts. In this large-scale PheWAS we did not find LDL-C-related variants in PCSK9, APOB, and LDLR to be associated with non-lipid-related phenotypes including diabetes, neurocognitive disorders, or cataracts.
Project description:Schizophrenia is a common polygenetic disease affecting 0.5-1% of individuals across distinct ethnic populations. PGC-II, the largest genome-wide association study investigating genetic risk factors for schizophrenia, previously identified 128 independent schizophrenia-associated genetic variants (GVs). The current study examined the genetic variability of GVs across ethnic populations. To assess the genetic variability across populations, the 'variability indices' (VIs) of the 128 schizophrenia-associated GVs were calculated. We used 2504 genomes from the 1000 Genomes Project taken from 26 worldwide healthy samples comprising five major ethnicities: East Asian (EAS: n=504), European (EUR: n=503), African (AFR: n=661), American (AMR: n=347) and South Asian (SAS: n=489). The GV with the lowest variability was rs36068923 (VI=1.07). The minor allele frequencies (MAFs) were 0.189, 0.192, 0.256, 0.183 and 0.194 for EAS, EUR, AFR, AMR and SAS, respectively. The GV with the highest variability was rs7432375 (VI=9.46). The MAFs were 0.791, 0.435, 0.041, 0.594 and 0.508 for EAS, EUR, AFR, AMR and SAS, respectively. When we focused on the EAS and EUR population, the allele frequencies of 86 GVs significantly differed between the EAS and EUR (P<3.91 × 10^-4). The GV with the highest variability was rs4330281 (P=1.55 × 10^-138). The MAFs were 0.023 and 0.519 for the EAS and EUR, respectively. The GV with the lowest variability was rs2332700 (P=9.80 × 10^-1). The MAFs were similar between these populations (that is, 0.246 and 0.247 for the EAS and EUR, respectively). Interestingly, the mean allele frequencies of the GVs did not significantly differ between these populations (P>0.05). Although genetic heterogeneities were observed in the schizophrenia-associated GVs across ethnic groups, the combination of these GVs might increase the risk of schizophrenia.
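The significance threshold quoted in the abstract above (P < 3.91 × 10^-4) is numerically what a Bonferroni correction of α = 0.05 across the 128 variants would give. The abstract does not name the correction used, so the sketch below is an assumption about where the number comes from, shown only to make the arithmetic explicit.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 128 schizophrenia-associated variants were tested (from the abstract).
# ASSUMPTION: family-wise alpha of 0.05; the abstract does not state it.
threshold = bonferroni_threshold(0.05, 128)

if __name__ == "__main__":
    print(f"{threshold:.2e}")
```

The result, 0.05 / 128 = 0.000390625, rounds to 3.91 × 10^-4, which agrees with the reported cut-off.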
Project description:BACKGROUND:Given the scarcity of cell lines from underrepresented populations, it is imperative that genetic ancestry for these cell lines is characterized. Consequences of cell line mischaracterization include squandered resources and publication retractions. METHODS:We calculated genetic ancestry proportions for 15 cell lines to assess the accuracy of previous race/ethnicity classification and determine previously unknown estimates. DNA was extracted from cell lines and genotyped for ancestry informative markers representing West African (WA), Native American (NA), and European (EUR) ancestry. RESULTS:Of the cell lines tested, all previously classified as White/Caucasian were accurately described with mean EUR ancestry proportions of 97%. Cell lines previously classified as Black/African American were not always accurately described. For instance, the 22Rv1 prostate cancer cell line was recently found to carry mixed genetic ancestry using a much smaller panel of markers. However, our more comprehensive analysis determined the 22Rv1 cell line carries 99% EUR ancestry. Most notably, the E006AA-hT prostate cancer cell line, classified as African American, was found to carry 92% EUR ancestry. We also determined the MDA-MB-468 breast cancer cell line carries 23% NA ancestry, suggesting possible Afro-Hispanic/Latina ancestry. CONCLUSIONS:Our results suggest predominantly EUR ancestry for the White/Caucasian-designated cell lines, yet high variance in ancestry for the Black/African American-designated cell lines. In addition, we revealed an extreme misclassification of the E006AA-hT cell line. IMPACT:Genetic ancestry estimates offer more sophisticated characterization leading to better contextualization of findings. Ancestry estimates should be provided for all cell lines to avoid erroneous conclusions in disparities literature.
Project description:OBJECTIVE: Our aim was to examine the impact of gestational diabetes (GDM), from before the GDM diagnosis is made, on fetal growth trajectories, and to compare it in Europeans and South Asians, two ethnic groups with dissimilar fetal growth patterns. METHODS: We studied European (n = 349) and South Asian (n = 184) pregnant women from the population-based STORK-Groruddalen cohort in Oslo, Norway. Mothers were enrolled in early pregnancy, screened for GDM in gestational week 28 ±2, and classified as "non-GDM", "mild GDM" or "moderate/severe GDM". We measured fetal head circumference, abdominal circumference and femur length by ultrasound, and estimated fetal weight in gestational weeks 24, 32 and 37, and performed corresponding measurements at birth. RESULTS: In non-GDM pregnancies, South Asian fetuses (n = 156) had a slower growth from gestational week 24, compared with Europeans (n = 310). More than two thirds of the European mothers later diagnosed with GDM were overweight or obese in early pregnancy, while this was not observed in South Asians. Fetuses of GDM mothers tended to be smaller than fetuses of non-GDM mothers in week 24, but thereafter grew faster until birth. This pattern was especially pronounced in fetuses of South Asian mothers with moderate/severe GDM. In week 24 these fetuses had a -0.95 SD (95% CI: -1.53, -0.36) lower estimated fetal weight than their non-GDM counterparts. In contrast, at birth they were 0.45 SD (0.09, 0.81) larger. CONCLUSIONS: Offspring of GDM mothers were smaller in mid pregnancy, but subsequently grew faster until birth, compared with offspring of non-GDM mothers. This pattern was most prominent in South Asian mothers with moderate to severe GDM. However, the most remarkable characteristic of these fetuses was not a large size at birth, but the small size in mid pregnancy, before the GDM diagnosis was made.
Project description:Here we investigated the degree to which epigenetic signatures in children from mothers with obesity or gestational diabetes mellitus are influenced by environmental factors. We profiled the DNA methylation signature of whole blood from lean, obese and gestational diabetes mellitus mothers and their respective newborns. DNA methylation profiles of mothers showed high similarity across groups, whereas newborns from GDM mothers showed a markedly distinct epigenetic profile compared to newborns of both lean and obese mothers. Analysis of variance in DNA methylation levels between newborns showed higher variance in the GDM group. Our work suggests that environmental factors, rather than direct transmission of epigenetic marks from the mother, are involved in establishing the epigenetic signature associated with GDM.
Project description:For admixture mapping studies in Mexican Americans (MAM), we define a genomewide single-nucleotide-polymorphism (SNP) panel that can distinguish between chromosomal segments of Amerindian (AMI) or European (EUR) ancestry. These studies used genotypes for >400,000 SNPs, defined in EUR and both Pima and Mayan AMI, to define a set of ancestry-informative markers (AIMs). The use of two AMI populations was necessary to remove a subset of SNPs that distinguished genotypes of only one AMI subgroup from EUR genotypes. The AIMs set contained 8,144 SNPs separated by a minimum of 50 kb with only three intermarker intervals >1 Mb and had EUR/AMI FST values >0.30 (mean FST = 0.48) and Mayan/Pima FST values <0.05 (mean FST < 0.01). Analysis of a subset of these SNP AIMs suggested that this panel may also distinguish ancestry between EUR and other disparate AMI groups, including Quechuan from South America. We show, using realistic simulation parameters that are based on our analyses of MAM genotyping results, that this panel of SNP AIMs provides good power for detecting disease-associated chromosomal segments for genes with modest ethnicity risk ratios. A reduced set of 5,287 SNP AIMs captured almost the same admixture mapping information, but smaller SNP sets showed substantial drop-off in admixture mapping information and power. The results will enable studies of type 2 diabetes, rheumatoid arthritis, and other diseases among which epidemiological studies suggest differences in the distribution of ancestry-associated susceptibility.
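The FST-based selection of ancestry-informative markers described above can be sketched for single biallelic SNPs. The abstract does not say which FST estimator was used, so the version below uses the basic heterozygosity form, FST = (H_T - H_S) / H_T, with two equally weighted populations. It also assumes that a SNP must exceed the EUR/AMI threshold against both AMI populations, and it ignores the 50 kb spacing rule. The SNP names and allele frequencies are hypothetical.

```python
from typing import Dict, List


def fst_two_populations(p1: float, p2: float) -> float:
    """FST for one biallelic SNP from two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # SNP is fixed for the same allele in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def select_aims(
    snps: Dict[str, Dict[str, float]],
    min_eur_ami: float = 0.30,    # EUR/AMI threshold from the abstract
    max_within_ami: float = 0.05,  # Mayan/Pima threshold from the abstract
) -> List[str]:
    """Keep SNPs that separate EUR from both AMI groups but not AMI from AMI."""
    kept = []
    for name, af in snps.items():
        eur_pima = fst_two_populations(af["EUR"], af["PIMA"])
        eur_mayan = fst_two_populations(af["EUR"], af["MAYAN"])
        pima_mayan = fst_two_populations(af["PIMA"], af["MAYAN"])
        if min(eur_pima, eur_mayan) > min_eur_ami and pima_mayan < max_within_ami:
            kept.append(name)
    return kept


# HYPOTHETICAL allele frequencies (not real data).
candidates = {
    "snp_a": {"EUR": 0.90, "PIMA": 0.10, "MAYAN": 0.12},  # informative
    "snp_b": {"EUR": 0.50, "PIMA": 0.40, "MAYAN": 0.45},  # weak EUR/AMI contrast
    "snp_c": {"EUR": 0.90, "PIMA": 0.10, "MAYAN": 0.60},  # AMI groups disagree
}
selected = select_aims(candidates)

if __name__ == "__main__":
    print(selected)
```

In this toy input only `snp_a` passes both filters. `snp_c` shows why two AMI populations were needed: it separates EUR from Pima well but would mislead for Mayan ancestry.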