Variability of 128 schizophrenia-associated gene variants across distinct ethnic populations.
ABSTRACT: Schizophrenia is a common polygenic disease affecting 0.5-1% of individuals across distinct ethnic populations. PGC-II, the largest genome-wide association study investigating genetic risk factors for schizophrenia, previously identified 128 independent schizophrenia-associated genetic variants (GVs). The current study examined the genetic variability of these GVs across ethnic populations. To assess the genetic variability across populations, the 'variability indices' (VIs) of the 128 schizophrenia-associated GVs were calculated. We used 2504 genomes from the 1000 Genomes Project, taken from 26 healthy populations worldwide comprising five major ethnicities: East Asian (EAS: n=504), European (EUR: n=503), African (AFR: n=661), American (AMR: n=347) and South Asian (SAS: n=489). The GV with the lowest variability was rs36068923 (VI=1.07). The minor allele frequencies (MAFs) were 0.189, 0.192, 0.256, 0.183 and 0.194 for EAS, EUR, AFR, AMR and SAS, respectively. The GV with the highest variability was rs7432375 (VI=9.46). The MAFs were 0.791, 0.435, 0.041, 0.594 and 0.508 for EAS, EUR, AFR, AMR and SAS, respectively. When we focused on the EAS and EUR populations, the allele frequencies of 86 GVs differed significantly between the EAS and EUR (P<3.91 × 10<sup>-4</sup>). The GV with the largest difference between these two populations was rs4330281 (P=1.55 × 10<sup>-138</sup>). The MAFs were 0.023 and 0.519 for the EAS and EUR, respectively. The GV with the smallest difference was rs2332700 (P=9.80 × 10<sup>-1</sup>). The MAFs were similar between these populations (that is, 0.246 and 0.247 for the EAS and EUR, respectively). Interestingly, the mean allele frequencies of the GVs did not significantly differ between these populations (P>0.05). Although genetic heterogeneities were observed in the schizophrenia-associated GVs across ethnic groups, the combination of these GVs might increase the risk of schizophrenia.
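The significance cutoff quoted above (P<3.91 × 10<sup>-4</sup>) matches a Bonferroni correction of 0.05 over the 128 tested variants. The abstract does not state the family-wise error rate, so the value of 0.05 below is an assumption, and the function and variable names are illustrative only. A minimal sketch of the arithmetic:

```python
# Bonferroni threshold for the 128 schizophrenia-associated variants.
# ASSUMPTION: a family-wise alpha of 0.05, which is not stated in the
# abstract but reproduces the reported cutoff of 3.91e-4.
N_VARIANTS = 128
ALPHA = 0.05


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_VARIANTS)

# The two P values reported in the abstract for the EAS vs. EUR comparison.
reported_p = {"rs4330281": 1.55e-138, "rs2332700": 9.80e-1}
significant = {rsid: p < threshold for rsid, p in reported_p.items()}

print(f"threshold = {threshold:.2e}")
for rsid, flag in significant.items():
    print(rsid, "significant" if flag else "not significant")
```

Under this assumption, rs4330281 falls below the cutoff and rs2332700 does not, in line with the abstract.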
Project description:To assess the statistical significance of associations between variants and traits, genome-wide association studies (GWAS) should employ an appropriate threshold that accounts for the massive burden of multiple testing in the study. Although most studies in the current literature commonly set a genome-wide significance threshold at the level of P=5.0 × 10<sup>-8</sup>, the adequacy of this value for respective populations has not been fully investigated. To empirically estimate thresholds for different ancestral populations, we conducted GWAS simulations using the 1000 Genomes Phase 3 data set for Africans (AFR), Europeans (EUR), Admixed Americans (AMR), East Asians (EAS) and South Asians (SAS). The estimated empirical genome-wide significance thresholds were P<sub>sig</sub>=3.24 × 10<sup>-8</sup> (AFR), 9.26 × 10<sup>-8</sup> (EUR), 1.83 × 10<sup>-7</sup> (AMR), 1.61 × 10<sup>-7</sup> (EAS) and 9.46 × 10<sup>-8</sup> (SAS). We additionally conducted trans-ethnic meta-analyses across all populations (ALL) and all populations except for AFR (ΔAFR), which yielded P<sub>sig</sub>=3.25 × 10<sup>-8</sup> (ALL) and 4.20 × 10<sup>-8</sup> (ΔAFR). Our results indicate that the current threshold (P=5.0 × 10<sup>-8</sup>) is overly stringent for all ancestral populations except for Africans; however, we should employ a more stringent threshold when conducting a meta-analysis, regardless of the presence of African samples.
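One way to read these thresholds is to invert a Bonferroni correction and ask how many independent tests each one implies. This is a rough interpretation aid, not the simulation procedure used in the study, and it assumes a family-wise error rate of 0.05, which the abstract does not state. All names below are illustrative:

```python
# Number of independent tests implied by each empirical threshold,
# obtained by inverting a Bonferroni correction (n = alpha / P_sig).
# ASSUMPTION: family-wise alpha of 0.05; the abstract does not state it.
ALPHA = 0.05

# Empirical thresholds as reported in the abstract.
P_SIG = {
    "AFR": 3.24e-8,
    "EUR": 9.26e-8,
    "AMR": 1.83e-7,
    "EAS": 1.61e-7,
    "SAS": 9.46e-8,
}


def implied_independent_tests(p_sig: float, alpha: float = ALPHA) -> float:
    """Effective number of independent tests implied by a threshold."""
    return alpha / p_sig


implied = {pop: implied_independent_tests(p) for pop, p in P_SIG.items()}
conventional = implied_independent_tests(5.0e-8)

for pop, n in implied.items():
    relation = "more" if n > conventional else "fewer"
    print(f"{pop}: ~{n:,.0f} implied tests ({relation} than the conventional threshold implies)")
```

On this reading, only the AFR threshold implies more independent tests than the conventional P=5.0 × 10<sup>-8</sup>, which is consistent with the abstract's conclusion that the conventional value is overly stringent for the other four populations.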
Project description:BACKGROUND:Europeans and American Indians are the major genetic ancestries of Hispanics in the U.S. These ancestral groups have markedly different incidence rates and outcomes in many types of cancer. Genetic admixture may therefore bias genetic association studies of cancer susceptibility variants, specifically in Hispanics. For example, the incidence rate of liver cancer shows substantial disparity between Hispanic, Asian and non-Hispanic white populations. Currently, ancestry informative marker (AIM) panels of up to a few hundred ancestry-informative single nucleotide polymorphisms (SNPs) are widely used to infer ancestry admixture. Notably, currently available AIMs are predominantly located in intronic and intergenic regions, while the whole exome sequencing (WES) protocols commonly used in translational research and clinical practice do not cover these markers. Thus, it remains challenging to accurately determine a patient's admixture proportion without additional DNA testing. RESULTS:In this study we designed a unique AIM panel that infers 3-way genetic admixture from three distinct and selected continental populations (African (AFR), European (EUR), and East Asian (EAS)) within evolutionarily conserved exonic regions. Initially, about 1 million exonic SNPs from the three selected populations in the 1000 Genomes Project were trimmed by their linkage disequilibrium (LD) and restricted to biallelic variants; finally, we optimized them into an AIM panel of 250 SNP markers, the UT-AIM250 panel, using their ancestral informativeness statistics. Compared with published AIM panels, UT-AIM250 achieved higher accuracy when tested with the three ancestral populations (accuracy: 0.995 ± 0.012 for AFR, 0.997 ± 0.007 for EUR, and 0.994 ± 0.012 for EAS). 
We further demonstrated the performance of the UT-AIM250 panel on admixed American (AMR) samples of the 1000 Genomes Project and obtained results (AFR, 0.085 ± 0.098; EUR, 0.665 ± 0.182; and EAS, 0.250 ± 0.205) similar to those of previously published AIM panels (Phillips-AIM34: AFR, 0.096 ± 0.127, EUR, 0.575 ± 0.290, and EAS, 0.330 ± 0.315; Wei-AIM278: AFR, 0.070 ± 0.096, EUR, 0.537 ± 0.267, and EAS, 0.393 ± 0.300). Subsequently, we applied the UT-AIM250 panel to a clinical dataset of 26 self-reported Hispanic patients in South Texas with hepatocellular carcinoma (HCC). We estimated the admixture proportions using WES data of adjacent non-cancer liver tissues (AFR, 0.065 ± 0.043; EUR, 0.594 ± 0.150; and EAS, 0.341 ± 0.160). Similar admixture proportions were identified from corresponding tumor tissues. In addition, we estimated admixture proportions of The Cancer Genome Atlas (TCGA) collection of hepatocellular carcinoma (TCGA-LIHC) samples (376 patients) using the UT-AIM250 panel. The panel obtained consistent admixture proportions from tumor and matched normal tissues, identified 3 cases of possibly incorrectly reported race/ethnicity, and/or provided race/ethnicity determination where necessary. CONCLUSIONS:Here we demonstrated the feasibility of using evolutionarily conserved exonic regions to infer admixture proportions and provided a robust and reliable control for sample collection or patient stratification for genetic analysis. An R implementation of UT-AIM250 is available at https://github.com/chenlabgccri/UT-AIM250.
Project description:Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
Project description:OBJECTIVES/SPECIFIC AIMS: This study aims to identify genetic biomarkers of gestational diabetes mellitus (GDM) and facilitate the understanding of its molecular underpinnings. METHODS/STUDY POPULATION: We identified a cohort of mothers diagnosed with GDM in our longitudinal birth study by mining the Electronic Health Records of participants using the PheCode map with ICD-9 and ICD-10 codes. We verified each case using ACOG’s GDM diagnosis criteria. RESULTS/ANTICIPATED RESULTS: Whole genome sequencing (WGS) data were available for 111 confirmed cases (out of 205) and 706 controls (out of 1,429) from different ancestries (412 EUR, 256 AMR, 56 EAS, 26 SAS and 18 AFR; 49 OTHER). SAS had the highest incidence of GDM at 38.46% and EUR had the lowest at 6.55%. We performed logistic regression using computed ancestry, age and BMI as covariates to determine whether any variants are associated with GDM. The top variant (rs139014401) was found in an intron of the DFFB gene, which is p53-bound and regulates DNA fragmentation during apoptosis. We will investigate the robustness of 49 identified variants and will separate the cohort by ancestry to detect population-specific differences in the top loci. DISCUSSION/SIGNIFICANCE OF IMPACT: Identification of molecular biomarkers in GDM across different ancestral backgrounds will address a gap in current GDM research. Findings may enhance screening and enable clinicians to identify those at risk of developing GDM earlier in the pregnancy. Early management of mothers at risk may lead to better health outcomes for mother and baby.
Project description:Genetic polymorphisms in cytochrome P450 genes can alter the metabolic activity of clinically important medicines. Thus, single nucleotide variants (SNVs) and copy number variations (CNVs) in CYP genes are leading determinants of drug pharmacokinetics and toxicity and serve as pharmacogenetic biomarkers for drug dosing, efficacy, and safety. The distribution of cytochrome P450 alleles differs significantly between populations, with important implications for personalized drug therapy and healthcare programs. To provide a meta-analysis of clinically important CYP allele polymorphisms, we brought together whole-genome and exome sequencing data from 800 unrelated individuals of the Iranian population (100 subjects from each of 8 major ethnic groups of Iran) and 63,269 unrelated individuals of five major human populations (EUR, AMR, AFR, EAS and SAS). By integrating these datasets with population-specific linkage information, we derived the frequencies of 140 CYP haplotypes related to 9 important CYP450 isoenzymes (CYP1A2, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, CYP3A4 and CYP3A5), giving a large resource for major genetic determinants of drug metabolism. Furthermore, we evaluated the more frequent Iranian alleles and compared the dataset with the Caucasian race. Finally, the similarity of the Iranian population SNVs with other populations was investigated.
Project description:Previous genome-wide association studies (GWASs) have been largely focused on European (EUR) populations. However, polygenic risk scores (PRSs) derived from EUR have been shown to perform worse in non-EURs compared with EURs. In this study, we aim to improve PRS prediction in East Asians (EASs). We introduce a rescaled meta-analysis framework to combine both EUR (N = 122,175) and EAS (N = 30,801) GWAS summary statistics. To improve PRS prediction in EASs, we use a scaling factor to up-weight the EAS data, such that the resulting effect size estimates are more relevant to EASs. We then derive PRSs for EAS from the rescaled meta-analysis results of EAS and EUR data. Evaluated in an independent EAS validation data set, this approach increases the prediction liability-adjusted Nagelkerke's pseudo R<sup>2</sup> by 40%, 41%, and 5%, respectively, compared with PRSs derived from an EAS GWAS only, EUR GWAS only, and conventional fixed-effects meta-analysis of EAS and EUR data. The PRS derived from the rescaled meta-analysis approach achieved an area under the receiver operating characteristic curve (AUC) of 0.6059, higher than AUC = 0.5782, 0.5809, 0.6008 for EAS, EUR, and conventional meta-analysis of EAS and EUR. We further compare PRSs constructed by single-nucleotide polymorphisms that have different linkage disequilibrium (LD) scores and minor allele frequencies (MAFs) between EUR and EAS, and observe that lower LD scores or MAF in EAS correspond to poorer PRS performance (AUC = 0.5677, 0.5530, respectively) than higher LD scores or MAF (AUC = 0.589, 0.5993, respectively). We finally build a PRS stratified by LD score differences in EUR and EAS using rescaled meta-analysis, and obtain an AUC of 0.6096, with improvement over other strategies investigated.
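The abstract does not give the formula for the rescaled meta-analysis, so the following is only a hypothetical sketch of the general idea: a conventional inverse-variance fixed-effects meta-analysis in which the EAS weight is multiplied by a scaling factor. The function name, the `scale` parameter and the example effect sizes are all assumptions made for illustration, not the authors' method or data.

```python
# Hypothetical sketch: inverse-variance meta-analysis of per-SNP effect
# sizes with an extra scaling factor that up-weights the EAS estimate.
# ASSUMPTION: the weighting scheme and all numbers below are illustrative;
# the abstract does not specify how the scaling factor enters the model.


def rescaled_meta_beta(beta_eas: float, se_eas: float,
                       beta_eur: float, se_eur: float,
                       scale: float = 1.0) -> float:
    """Combined effect size; scale=1.0 is the conventional fixed-effects estimate."""
    w_eas = scale / se_eas ** 2   # EAS inverse-variance weight, up-weighted
    w_eur = 1.0 / se_eur ** 2     # EUR inverse-variance weight
    return (w_eas * beta_eas + w_eur * beta_eur) / (w_eas + w_eur)


# Made-up effect sizes: the larger EUR study has the smaller standard error.
conventional = rescaled_meta_beta(0.10, 0.04, 0.02, 0.02, scale=1.0)
rescaled = rescaled_meta_beta(0.10, 0.04, 0.02, 0.02, scale=4.0)

print(f"conventional beta = {conventional:.3f}")
print(f"rescaled beta     = {rescaled:.3f}")
```

With `scale` above 1, the combined estimate moves toward the EAS effect size even though the EUR sample is larger, which is the behaviour the abstract describes as making the estimates more relevant to EASs.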
Project description:<h4>Background</h4>We aimed to enrich the pharmacogenomic information of a Blang population (BP) from Yunnan Province in China.<h4>Methods</h4>We genotyped 55 very important pharmacogene (VIP) variants from the PharmGKB database and compared their genotype distribution (GD) in the BP with that of 26 populations by the <i>χ</i><sup>2</sup> test. The minor allele frequency (MAF) distribution of seven significantly different single-nucleotide polymorphisms (SNPs) was analyzed to compare the BP with the 26 other populations.<h4>Results</h4>Among the 26 studied populations, GWD, YRI, GIH, ESN, MSL, TSI, PJL, ACB, FIN and IBS were the top-10 populations, each showing a significantly different GD from the BP at more than 35 of the 55 loci. The CHB, JPT, CDX, CHS, and KHV populations had a significantly different GD at fewer than 20 loci. A GD difference at 27-34 loci was found between the BP and 11 populations (LWK, CEU, ITU, STU, PUR, CLM, GBR, ASW, BEB, MXL and PEL). The GDs of five loci (rs750155 (<i>SULT1A1</i>), rs4291 (<i>ACE</i>), rs1051298 (<i>SLC19A1</i>), rs1131596 (<i>SLC19A1</i>) and rs1051296 (<i>SLC19A1</i>)) were the most significantly different in the BP as compared with those of the other 26 populations. The genotype frequency of rs1800764 (<i>ACE</i>) and rs1065852 (<i>CYP2D6</i>) was different in all populations except for PEL and LWK, respectively. MAFs of rs1065852 (<i>CYP2D6</i>) and rs750155 (<i>SULT1A1</i>) showed the largest fluctuation between the BP and SAS, EUR, AFR and AMR populations.<h4>Conclusion</h4>Our data can provide theoretical guidance for safe and efficacious personalized drug use in the Blang population.
Project description:This synthetic dataset contains genetics data for 1,008,000 individuals and 9 continuous phenotypic traits with various genetic architectures. The dataset includes 6 ancestry groups (AFR, AMR, CSA, EAS, EUR, MID) and over 6.8 million single nucleotide polymorphisms (SNPs) across 22 chromosomes. The data was generated using the HAPNEST software program (https://github.com/intervene-EU-H2020/synthetic_data) developed by members of the INTERVENE consortium (https://www.interveneproject.eu/). This software has been specifically designed to enable efficient, large-scale synthetic data generation for common genetic variants and complex phenotypic traits. We have open sourced this software so that anyone can easily generate their own synthetic datasets. Please see the linked GitHub repository for further details. The reference dataset used to generate this synthetic dataset is the combined 1000 Genomes Project and Human Genomic Diversity Project datasets downloaded from https://gnomad.broadinstitute.org/downloads. The data was preprocessed by retaining SNPs with non-zero MAF in all populations for which rsID numbers could be successfully aligned. This resulted in over 6.8 million variants across 22 chromosomes.
Project description:Variables such as race, skin colour and ethnicity have become intensely discussed in medical research, in response to the rising debate over the importance of the ethnic-racial dimension in health-disease processes. The aim of this study was to identify, through a systematic literature review, the influence of European (EUR), African (AFR) and Amerindian (AMR) ancestries on Brazilian health outcomes. This study was carried out by searching three electronic databases for studies published between 2005 and 2017. A total of 13 papers were eligible. The search identified the following health outcomes: visceral leishmaniosis, malaria, Alzheimer's disease, neuromyelitis optica, multiple sclerosis, prostate cancer, non-syndromic cleft lip/palate, chronic heart failure, sickle cell disease, primary congenital glaucoma, preterm labour, preterm premature rupture of membranes, systemic lupus erythematosus and type 1 diabetes mellitus. Research paper assessments were guided by the STROBE instrument, and agreement between results was determined by comparing the points attributed by two authors. Increased EUR ancestry was identified in preterm labour (PTL), type 1 diabetes (T1D) and non-syndromic cleft lip with or without cleft palate (NSCL), as well as in patients presenting aggressive prostate cancer prognoses. On the other hand, the highest AFR ancestral component was found in systemic lupus erythematosus (SLE) and primary congenital glaucoma (PCG) cases presenting worse prognoses. AMR ancestry may be a protective factor in the development of Alzheimer's disease (AD). The worst hemodynamic parameters in cases of heart failure (HF) were identified among individuals with greater AMR and AFR ancestry indices.
Project description:BACKGROUND:Pulmonary hypertension (PH) is a rare disease characterized by proliferation and occlusion of small pulmonary arterioles and is associated with a high mortality rate. The pathogenesis of PH is complex and incompletely understood, and involves both genetic and environmental factors that alter vascular structure and function. METHODS:We aimed to reveal the potential genetic etiology of PH by targeting 143 tag SNPs of 14 candidate genes. In total, 208 individuals from the Chinese Han population were enrolled in the present study, including 109 non-idiopathic PH patients and 99 healthy controls. RESULTS:The data revealed that 2 SNPs were associated with overall PH susceptibility at p < 3×10<sup>-4</sup> after Bonferroni correction. The top hit was rs6557421 (p = 4.5×10<sup>-9</sup>), located within the Nox3 gene on chromosome 6. Another SNP, rs3744439, located in the Tbx4 gene, also showed evidence of association with PH susceptibility (p = 1.2×10<sup>-6</sup>). The genotype frequency distributions of rs6557421 and rs3744439 differed markedly between PH patients and controls. Individuals with the rs6557421 TT genotype had a 10.72-fold/14.20-fold increased risk of developing PH when compared with GG or GG/GT carriers in the codominant or recessive model, respectively (TT versus GG: 95% CI = 4.79-24.00; TT versus GG/GT: 95% CI = 6.65-30.33). As for rs3744439, the AG genotype occurred only in healthy controls and was not observed in PH patients. We further validated the result by using 26 different populations from five regions around the globe, including African (AFR), American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS). Consistent with the results of the present case-control study, significantly different genotype frequencies of the observed SNPs existed between PH patients and healthy individuals from all over the world. 
CONCLUSIONS:The results suggest that the rs6557421 variant in Nox3 and the rs3744439 variant in Tbx4 might affect individual susceptibility to pulmonary hypertension, which could lead to therapeutic or diagnostic approaches in PH.