Empirical estimation of genome-wide significance thresholds based on the 1000 Genomes Project data set.
ABSTRACT: To assess the statistical significance of associations between variants and traits, genome-wide association studies (GWAS) should employ an appropriate threshold that accounts for the massive burden of multiple testing in the study. Although most studies in the current literature set the genome-wide significance threshold at P=5.0 × 10<sup>-8</sup>, the adequacy of this value for different ancestral populations has not been fully investigated. To empirically estimate thresholds for different ancestral populations, we conducted GWAS simulations using the 1000 Genomes Phase 3 data set for Africans (AFR), Europeans (EUR), Admixed Americans (AMR), East Asians (EAS) and South Asians (SAS). The estimated empirical genome-wide significance thresholds were P<sub>sig</sub>=3.24 × 10<sup>-8</sup> (AFR), 9.26 × 10<sup>-8</sup> (EUR), 1.83 × 10<sup>-7</sup> (AMR), 1.61 × 10<sup>-7</sup> (EAS) and 9.46 × 10<sup>-8</sup> (SAS). We additionally conducted trans-ethnic meta-analyses across all populations (ALL) and all populations except AFR (¬AFR), which yielded P<sub>sig</sub>=3.25 × 10<sup>-8</sup> (ALL) and 4.20 × 10<sup>-8</sup> (¬AFR). Our results indicate that the current threshold (P=5.0 × 10<sup>-8</sup>) is overly stringent for all ancestral populations except Africans, for whom it is too lenient; however, a more stringent threshold should be employed when conducting a meta-analysis, regardless of the presence of African samples.
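The idea of an empirical threshold can be illustrated with a minimal Python sketch. It assumes a min-P scheme: run many null GWAS, record the smallest genome-wide P-value in each, and take the 5% quantile of those minima as Psig (the authors' exact simulation design is not described here). For simplicity the null minima are drawn for M independent tests; real variants are in linkage disequilibrium, which is what lowers the effective number of tests. All function names are hypothetical.

```python
import random

def simulate_null_min_p(n_tests, n_sims, seed=1):
    """Genome-wide minimum P-value for n_sims null GWAS of n_tests independent tests.

    Under the null each P-value is Uniform(0, 1), so the minimum of n_tests of them
    follows Beta(1, n_tests) and can be sampled by inversion: 1 - U**(1/n_tests).
    """
    rng = random.Random(seed)
    return [1.0 - rng.random() ** (1.0 / n_tests) for _ in range(n_sims)]

def empirical_threshold(min_pvalues, alpha=0.05):
    """alpha-quantile of the null min-P distribution (family-wise error rate ~ alpha)."""
    ordered = sorted(min_pvalues)
    return ordered[int(alpha * len(ordered))]

def implied_effective_tests(p_sig, alpha=0.05):
    """Bonferroni-style reading of a threshold: how many independent tests it implies."""
    return alpha / p_sig

min_p = simulate_null_min_p(n_tests=1_000_000, n_sims=20_000)
p_sig = empirical_threshold(min_p)
print(f"simulated Psig for 1e6 independent tests: {p_sig:.2e}")

# Reported population-specific thresholds, read as effective numbers of tests
for pop, p in [("AFR", 3.24e-8), ("EUR", 9.26e-8), ("EAS", 1.61e-7)]:
    print(pop, f"~{implied_effective_tests(p):,.0f} effective tests")
```

Read this way, the conventional 5 × 10<sup>-8</sup> corresponds to about one million independent tests, and a less stringent empirical threshold implies fewer effective tests in that population.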
Project description: Schizophrenia is a common polygenic disease affecting 0.5-1% of individuals across distinct ethnic populations. PGC-II, the largest genome-wide association study investigating genetic risk factors for schizophrenia, previously identified 128 independent schizophrenia-associated genetic variants (GVs). The current study examined the genetic variability of these GVs across ethnic populations by calculating 'variability indices' (VIs) for the 128 schizophrenia-associated GVs. We used 2504 genomes from the 1000 Genomes Project, drawn from 26 worldwide populations of healthy individuals comprising five major ethnicities: East Asian (EAS: n=504), European (EUR: n=503), African (AFR: n=661), American (AMR: n=347) and South Asian (SAS: n=489). The GV with the lowest variability was rs36068923 (VI=1.07); its minor allele frequencies (MAFs) were 0.189, 0.192, 0.256, 0.183 and 0.194 for EAS, EUR, AFR, AMR and SAS, respectively. The GV with the highest variability was rs7432375 (VI=9.46); its MAFs were 0.791, 0.435, 0.041, 0.594 and 0.508 for EAS, EUR, AFR, AMR and SAS, respectively. When we focused on the EAS and EUR populations, the allele frequencies of 86 GVs differed significantly between EAS and EUR (P<3.91 × 10<sup>-4</sup>, corresponding to 0.05/128). The GV with the largest difference was rs4330281 (P=1.55 × 10<sup>-138</sup>), with MAFs of 0.023 and 0.519 in EAS and EUR, respectively. The GV with the smallest difference was rs2332700 (P=0.980), whose MAFs were nearly identical in the two populations (0.246 and 0.247 for EAS and EUR, respectively). Interestingly, the mean allele frequencies of the GVs did not differ significantly between these populations (P>0.05). Although genetic heterogeneity was observed in the schizophrenia-associated GVs across ethnic groups, the combination of these GVs might increase the risk of schizophrenia.
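The EAS-versus-EUR comparison can be approximated with a minimal Python sketch. It assumes a Pearson chi-square test (1 df) on a 2 × 2 table of allele counts, with 2n alleles per population (n = 504 for EAS, 503 for EUR). The abstract does not name the test, and `allele_freq_test` is a hypothetical helper. The 1-df chi-square tail probability is computed as P(χ² > x) = erfc(√(x/2)).

```python
import math

def allele_freq_test(freq1, n_ind1, freq2, n_ind2):
    """Pearson chi-square test (1 df) comparing one allele's frequency in two populations.

    Frequencies are converted to allele counts assuming diploid samples (2n alleles each).
    """
    a = round(freq1 * 2 * n_ind1)  # allele count in population 1
    b = 2 * n_ind1 - a             # other allele in population 1
    c = round(freq2 * 2 * n_ind2)  # allele count in population 2
    d = 2 * n_ind2 - c             # other allele in population 2
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

# Bonferroni threshold for 128 variants at a family-wise alpha of 0.05
threshold = 0.05 / 128
print(f"threshold = {threshold:.3g}")  # 0.000391

p_large = allele_freq_test(0.023, 504, 0.519, 503)  # rs4330281, EAS vs EUR
p_small = allele_freq_test(0.246, 504, 0.247, 503)  # rs2332700, EAS vs EUR
print(f"rs4330281: {p_large:.2e}, rs2332700: {p_small:.3f}")
```

Under these assumptions the sketch returns values close to the reported P-values for both variants.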
Project description: BACKGROUND: Europeans and American Indians are the major ancestral populations of Hispanics in the U.S. These ancestral groups have markedly different incidence rates and outcomes for many types of cancer. Therefore, genetic admixture may bias genetic association studies of cancer susceptibility variants, specifically in Hispanics. For example, the incidence rate of liver cancer differs substantially between Hispanic, Asian and non-Hispanic white populations. Currently, ancestry informative marker (AIM) panels comprising up to a few hundred ancestry-informative single nucleotide polymorphisms (SNPs) are widely used to infer ancestry admixture. Notably, currently available AIMs are predominantly located in intronic and intergenic regions, which are not covered by the whole exome sequencing (WES) protocols commonly used in translational research and clinical practice. Thus, it remains challenging to accurately determine a patient's admixture proportions without additional DNA testing. RESULTS: In this study we designed a unique AIM panel that infers 3-way genetic admixture from three distinct, selected continental populations (African (AFR), European (EUR), and East Asian (EAS)) within evolutionarily conserved exonic regions. Starting from about 1 million exonic SNPs from the three selected populations in the 1000 Genomes Project, we pruned the SNPs by linkage disequilibrium (LD), restricted them to biallelic variants, and finally optimized an AIM panel of 250 SNP markers, the UT-AIM250 panel, using their ancestral informativeness statistics. Compared with published AIM panels, UT-AIM250 achieved higher accuracy when tested on the three ancestral populations (accuracy: 0.995 ± 0.012 for AFR, 0.997 ± 0.007 for EUR, and 0.994 ± 0.012 for EAS).
We further evaluated the UT-AIM250 panel on admixed American (AMR) samples from the 1000 Genomes Project and obtained results (AFR, 0.085 ± 0.098; EUR, 0.665 ± 0.182; and EAS, 0.250 ± 0.205) similar to those of previously published AIM panels (Phillips-AIM34: AFR, 0.096 ± 0.127, EUR, 0.575 ± 0.290, and EAS, 0.330 ± 0.315; Wei-AIM278: AFR, 0.070 ± 0.096, EUR, 0.537 ± 0.267, and EAS, 0.393 ± 0.300). Subsequently, we applied the UT-AIM250 panel to a clinical dataset of 26 self-reported Hispanic patients in South Texas with hepatocellular carcinoma (HCC). We estimated admixture proportions using WES data from adjacent non-cancerous liver tissues (AFR, 0.065 ± 0.043; EUR, 0.594 ± 0.150; and EAS, 0.341 ± 0.160), and similar admixture proportions were obtained from the corresponding tumor tissues. In addition, we estimated admixture proportions for The Cancer Genome Atlas (TCGA) hepatocellular carcinoma collection (TCGA-LIHC; 376 patients) using the UT-AIM250 panel. The panel yielded consistent admixture proportions from tumor and matched normal tissues, identified 3 possibly misreported race/ethnicity labels, and can provide race/ethnicity determination where necessary. CONCLUSIONS: Here we demonstrated the feasibility of using evolutionarily conserved exonic regions to infer admixture proportions, providing a robust and reliable control for sample collection or patient stratification in genetic analysis. An R implementation of UT-AIM250 is available at https://github.com/chenlabgccri/UT-AIM250.
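Estimating admixture proportions from an AIM panel amounts to explaining each sample's genotypes as a mixture of reference allele frequencies. The Python sketch below assumes a supervised model in the spirit of ADMIXTURE/frappe and fits it by expectation-maximization (EM) for one individual. It is not the UT-AIM250 R implementation. The simulated reference frequencies, the true proportions and all helper names are hypothetical.

```python
import random

def estimate_admixture(genotypes, ref_freqs, n_iter=100):
    """EM estimate of K-way admixture proportions for one individual.

    genotypes: alt-allele counts (0, 1 or 2), one per AIM SNP.
    ref_freqs: per-SNP lists of alt-allele frequencies in the K reference populations
               (assumed strictly between 0 and 1).
    """
    k = len(ref_freqs[0])
    q = [1.0 / k] * k  # start from equal ancestry
    for _ in range(n_iter):
        counts = [0.0] * k  # expected number of allele copies from each population
        for g, f in zip(genotypes, ref_freqs):
            alt = [q[i] * f[i] for i in range(k)]
            ref = [q[i] * (1.0 - f[i]) for i in range(k)]
            alt_sum, ref_sum = sum(alt), sum(ref)
            for i in range(k):
                # E-step: split the g alt copies and 2 - g ref copies across populations
                counts[i] += g * alt[i] / alt_sum + (2 - g) * ref[i] / ref_sum
        total = sum(counts)
        q = [c / total for c in counts]  # M-step: renormalise the expected counts
    return q

def simulate_genotypes(q, ref_freqs, rng):
    """Draw genotypes for an admixed individual with true proportions q."""
    genotypes = []
    for f in ref_freqs:
        g = 0
        for _ in range(2):  # two allele copies per diploid SNP
            pop = rng.choices(range(len(q)), weights=q)[0]
            g += int(rng.random() < f[pop])
        genotypes.append(g)
    return genotypes

def random_aim_freq(rng):
    """Hypothetical AIM-like frequency: each population near one allele or the other."""
    return rng.uniform(0.02, 0.3) if rng.random() < 0.5 else rng.uniform(0.7, 0.98)

rng = random.Random(7)
refs = [[random_aim_freq(rng) for _ in range(3)] for _ in range(4000)]  # AFR, EUR, EAS
true_q = [0.1, 0.6, 0.3]
est = estimate_admixture(simulate_genotypes(true_q, refs, rng), refs)
print("estimated AFR/EUR/EAS proportions:", [round(x, 3) for x in est])
```

A real 250-SNP panel carries far less information than these 4000 simulated markers, so its per-sample estimates would be noisier; the panel's value lies in choosing the most ancestry-informative SNPs.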
Project description: Publicly available genetic summary data have high utility in research and the clinic, including for prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestries (African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), and South Asian (SAS)), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared with the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match the respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix returns results in seconds, allowing confidence intervals to be estimated via block bootstrap. With an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
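Deconvoluting ancestry from summary data can be framed as a constrained least-squares fit. The goal is to find non-negative proportions that sum to one and whose weighted mix of reference allele frequencies best matches the observed allele frequencies. The Python sketch below assumes that objective and solves it by projected gradient descent onto the probability simplex. It illustrates the idea and is not Summix's own solver. The reference frequencies and proportions are hypothetical; the proportions only loosely echo the African/African American estimate above.

```python
import random

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

def estimate_proportions(observed, refs, n_iter=500):
    """Minimise sum_j (observed_j - sum_k pi_k * refs[j][k])^2 over the simplex."""
    k = len(refs[0])
    pi = [1.0 / k] * k
    # 1 / (2 * trace(R^T R)) never exceeds 1 / Lipschitz constant, so the step is safe
    step = 1.0 / (2.0 * sum(r * r for row in refs for r in row))
    for _ in range(n_iter):
        grad = [0.0] * k
        for obs, row in zip(observed, refs):
            resid = obs - sum(p * r for p, r in zip(pi, row))
            for i in range(k):
                grad[i] -= 2.0 * resid * row[i]
        pi = project_to_simplex([p - step * g for p, g in zip(pi, grad)])
    return pi

rng = random.Random(3)
refs = [[rng.uniform(0.05, 0.95) for _ in range(3)] for _ in range(500)]  # AFR, EUR, EAS
true_pi = [0.84, 0.14, 0.02]
observed = [sum(p * r for p, r in zip(true_pi, row)) for row in refs]  # noise-free AFs
est_pi = estimate_proportions(observed, refs)
print([round(p, 3) for p in est_pi])
```

With noise-free observed frequencies the fit recovers the true proportions. With real summary data, residual error and reference mismatch limit precision, which is where a block bootstrap gives confidence intervals.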
Project description: Contrasting the genetic diversity of the human X chromosome (X) and autosomes has helped to elucidate historical differences between males and females and the influence of natural selection. Previous studies based on smaller data sets left open how empirical patterns extend to additional populations and which forces can explain them. Here, we address these questions by analyzing the ratio of X-to-autosomal (X/A) nucleotide diversity in the complete genomes of 569 females from 14 populations. Results show that X/A diversity is similar within each continental group but notably lower in European (EUR) and East Asian (ASN) populations than in African (AFR) populations. X/A diversity increases in all populations with increasing distance from genes, highlighting the stronger impact of diversity-reducing selection on X than on the autosomes. However, relative X/A diversity (between two populations) is invariant with distance from genes, suggesting that selection does not drive the relative reduction in X/A diversity in non-Africans (0.842 ± 0.012 for EUR-to-AFR and 0.820 ± 0.032 for ASN-to-AFR comparisons). Finally, an array of models with varying population bottlenecks, expansions, and migration drawn from the latest studies of human demographic history accounts for about half of the observed reduction in relative X/A diversity from the expected value of 1, predicting values between 0.91 and 0.94 for EUR-to-AFR comparisons and between 0.91 and 0.92 for ASN-to-AFR comparisons. Further reductions could be explained by demographic events more extreme than those captured by the latest studies or, in their absence, by historical sex-biased demographic events or other processes.
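The diversity ratios above can be made concrete with a short Python sketch. It assumes nucleotide diversity is the unbiased expected heterozygosity, summed over segregating sites and divided by the number of callable sites. The study's exact filtering and normalisation are not described here, and the function names are hypothetical. The last helper reproduces the arithmetic behind "about half" of the reduction being explained.

```python
def nucleotide_diversity(alt_counts, n_chrom, n_sites):
    """Average pairwise differences per site.

    alt_counts: alternate-allele counts at segregating sites, each out of n_chrom chromosomes.
    n_sites: total callable sites (monomorphic sites contribute zero).
    """
    total = 0.0
    for c in alt_counts:
        p = c / n_chrom
        total += 2.0 * p * (1.0 - p) * n_chrom / (n_chrom - 1)  # unbiased heterozygosity
    return total / n_sites

def relative_x_to_a(pi_x_1, pi_a_1, pi_x_2, pi_a_2):
    """(X/A diversity in population 1) / (X/A diversity in population 2)."""
    return (pi_x_1 / pi_a_1) / (pi_x_2 / pi_a_2)

def fraction_explained(observed, predicted, expected=1.0):
    """Share of the observed departure from the expected ratio that a model reproduces."""
    return (expected - predicted) / (expected - observed)

# EUR-to-AFR: observed 0.842; demographic models predict 0.91 to 0.94
for predicted in (0.91, 0.94):
    print(predicted, round(fraction_explained(0.842, predicted), 2))  # 0.57, then 0.38
```

The models thus reproduce roughly 38% to 57% of the EUR-to-AFR reduction, consistent with "about half".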
Project description: Zinc transporters play important roles in all eukaryotes by maintaining appropriate zinc concentrations in cells. However, the diversity of zinc transporter genes (ZTGs) remains poorly studied. Here, we investigated the genetic diversity of 24 human ZTGs based on the 1000 Genomes data. Some ZTGs show small population differences, such as SLC30A6 (weighted-average FST, WA-FST = 0.015), whereas others exhibit considerably larger population differences, such as SLC30A9 (WA-FST = 0.284). Overall, ZTGs harbor many more highly population-differentiated variants than random genes do. Intriguingly, we found that SLC30A9 has been under natural selection in both East Asians (EAS) and Africans (AFR), but in different directions. Notably, a non-synonymous variant (rs1047626) in SLC30A9 is nearly fixed in both populations, with the A allele at 96.4% in EAS and the G allele at 92% in AFR. Consequently, two different functional haplotypes predominate in AFR and EAS, respectively. Furthermore, a strong correlation was observed between the haplotype frequencies of SLC30A9 and the distribution of zinc content in soils or crops. We speculate that the genetic differentiation of ZTGs could directly contribute to population heterogeneity in zinc transport capability and to the local adaptation of human populations to local zinc availability or diets, which has both evolutionary and medical implications.
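Population differentiation of the kind summarized by WA-FST can be illustrated with a minimal Python sketch. The abstract does not state which estimator was used. As a simpler, swapped-in alternative, the sketch uses Hudson's estimator computed as a ratio of averages across SNPs, which likewise weights SNPs by their heterozygosity instead of averaging per-SNP ratios. Sample sizes assume 504 EAS and 661 AFR individuals from 1000 Genomes Phase 3, and the allele is assumed biallelic A/G.

```python
def hudson_fst(p1_list, n1, p2_list, n2):
    """Hudson's FST as a ratio of averages over SNPs.

    p1_list, p2_list: frequencies of the same allele in populations 1 and 2.
    n1, n2: numbers of sampled chromosomes (2 x individuals for diploids).
    """
    num = den = 0.0
    for p1, p2 in zip(p1_list, p2_list):
        # Squared frequency difference, corrected for finite-sample noise
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Between-population heterozygosity
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den

# rs1047626 allele A: 96.4% in EAS, 8% in AFR (i.e. 92% G)
fst = hudson_fst([0.964], 2 * 504, [0.08], 2 * 661)
print(round(fst, 3))  # 0.878
```

Passing all SNPs of a gene in one call gives a gene-level, heterozygosity-weighted summary, which is the spirit of a weighted-average FST.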
Project description: Previous genome-wide association studies (GWASs) have largely focused on European (EUR) populations. However, polygenic risk scores (PRSs) derived from EUR data have been shown to perform worse in non-EURs than in EURs. In this study, we aim to improve PRS prediction in East Asians (EASs). We introduce a rescaled meta-analysis framework to combine both EUR (N = 122,175) and EAS (N = 30,801) GWAS summary statistics. To improve PRS prediction in EASs, we use a scaling factor to up-weight the EAS data, such that the resulting effect size estimates are more relevant to EASs. We then derive PRSs for EAS from the rescaled meta-analysis results of EAS and EUR data. Evaluated in an independent EAS validation data set, this approach increases the liability-adjusted Nagelkerke's pseudo R<sup>2</sup> of prediction by 40%, 41%, and 5% compared with PRSs derived from an EAS-only GWAS, an EUR-only GWAS, and a conventional fixed-effects meta-analysis of EAS and EUR data, respectively. The PRS derived from the rescaled meta-analysis approach achieved an area under the receiver operating characteristic curve (AUC) of 0.6059, higher than the AUCs of 0.5782, 0.5809, and 0.6008 obtained with the EAS-only, EUR-only, and conventional meta-analysis approaches. We further compare PRSs constructed from single-nucleotide polymorphisms (SNPs) whose linkage disequilibrium (LD) scores and minor allele frequencies (MAFs) differ between EUR and EAS, and observe that SNPs with lower LD scores or MAFs in EAS yield poorer PRS performance (AUC = 0.5677 and 0.5530, respectively) than those with higher LD scores or MAFs (AUC = 0.589 and 0.5993, respectively). Finally, we build a PRS stratified by LD score differences between EUR and EAS using rescaled meta-analysis and obtain an AUC of 0.6096, an improvement over the other strategies investigated.
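The following Python sketch shows how a scaling factor can up-weight one ancestry in a meta-analysis. The abstract does not give the exact formula, so this assumes the simplest form: inverse-variance fixed-effects weights, with the EAS weight multiplied by a hypothetical factor `c`. Setting c = 1 recovers the conventional meta-analysis, and a large c approaches the EAS-only estimate. The PRS is then the sum of allele dosages times the meta-analysed effect sizes. All numbers are hypothetical.

```python
import math

def rescaled_meta(beta_eas, se_eas, beta_eur, se_eur, c=1.0):
    """Fixed-effects inverse-variance meta-analysis with the EAS weight scaled by c.

    Assumption for illustration: w_EAS = c / se_EAS^2 and w_EUR = 1 / se_EUR^2.
    Returns the combined effect and its standard error under those weights.
    """
    w_eas = c / se_eas ** 2
    w_eur = 1.0 / se_eur ** 2
    beta = (w_eas * beta_eas + w_eur * beta_eur) / (w_eas + w_eur)
    # Variance of a weighted average of independent estimates with variances se^2
    var = (w_eas ** 2 * se_eas ** 2 + w_eur ** 2 * se_eur ** 2) / (w_eas + w_eur) ** 2
    return beta, math.sqrt(var)

def polygenic_score(dosages, betas):
    """PRS = sum over SNPs of (effect-allele dosage x effect size)."""
    return sum(g * b for g, b in zip(dosages, betas))

# Hypothetical SNP: the larger EUR GWAS gives the smaller standard error
for c in (1.0, 4.0):
    beta, se = rescaled_meta(0.10, 0.04, 0.05, 0.02, c=c)
    print(f"c={c}: beta={beta:.4f}, se={se:.4f}")
```

Raising `c` pulls the combined effect toward the EAS estimate at the cost of a slightly larger standard error, which is the trade-off such rescaling makes for EAS-targeted prediction.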
Project description: To provide insights into the biology of opioid dependence (OD) and opioid use (i.e., exposure, OE), we completed a genome-wide analysis comparing 4503 OD cases, 4173 opioid-exposed controls, and 32,500 opioid-unexposed controls, including participants of European and African descent (EUR and AFR, respectively). Among the variants identified, rs9291211 was associated with OE (exposed vs. unexposed controls; EUR z = -5.39, p = 7.2 × 10<sup>-8</sup>). This variant regulates the transcriptomic profiles of SLC30A9 and BEND4 in multiple brain tissues and was previously associated with depression, alcohol consumption, and neuroticism. A phenome-wide scan of rs9291211 in the UK Biobank (N > 360,000) found an association between this variant and the propensity to use dietary supplements (p = 1.68 × 10<sup>-8</sup>). For the same OE phenotype, gene-based analysis identified SDCCAG8 (EUR + AFR z = 4.69, p = 10<sup>-6</sup>), which was previously associated with educational attainment, risk-taking behaviors, and schizophrenia. In addition, rs201123820 showed a genome-wide significant difference between OD cases and unexposed controls (AFR z = 5.55, p = 2.9 × 10<sup>-8</sup>) and a significant association with musculoskeletal disorders in the UK Biobank (p = 4.88 × 10<sup>-7</sup>). A polygenic risk score (PRS) based on a GWAS of risk tolerance (n = 466,571) was positively associated with OD (OD vs. unexposed controls, p = 8.1 × 10<sup>-5</sup>; OD cases vs. exposed controls, p = 0.054) and OE (exposed vs. unexposed controls, p = 3.6 × 10<sup>-5</sup>). A PRS based on a GWAS of neuroticism (n = 390,278) was positively associated with OD (OD vs. unexposed controls, p = 3.2 × 10<sup>-5</sup>; OD vs. exposed controls, p = 0.002) but not with OE (p = 0.67). Our analyses highlight the difference between dependence and exposure and the importance of considering the definition of controls in studies of addiction.
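The reported z-scores and P-values can be cross-checked with a small Python sketch. It converts a z-score to a two-sided normal P-value, P = erfc(|z|/√2), and compares the result with the conventional genome-wide threshold of 5 × 10<sup>-8</sup>. Small differences from the published P-values may partly reflect rounding of the published z-scores to two decimals. `two_sided_p` is a hypothetical helper.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

def two_sided_p(z):
    """Two-sided P-value of a standard-normal z-score: P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

for label, z in [("rs9291211 (OE, EUR)", -5.39), ("rs201123820 (OD, AFR)", 5.55)]:
    p = two_sided_p(z)
    print(f"{label}: z = {z}, p = {p:.2e}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

The conversion is consistent with the text: rs201123820 falls below 5 × 10<sup>-8</sup>, while rs9291211 lies just above it.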
Project description: <h4>Background</h4>Over 200 schizophrenia risk loci have been identified by genome-wide association studies (GWASs). However, the majority of risk loci were identified in populations of European ancestry (EUR), potentially missing important biological insights. It is therefore important to perform GWASs in non-European populations.<h4>Methods</h4>To identify novel schizophrenia risk loci, we conducted a GWAS in a Han Chinese population (3493 cases and 4709 controls). We then performed a large-scale meta-analysis (a total of 143,438 subjects) by combining our results with previous GWASs conducted in East Asian (EAS) and EUR populations. In addition, we carried out comprehensive post-GWAS analyses, including heritability partitioning, enrichment of schizophrenia associations in tissues and cell types, a transcriptome-wide association study (TWAS), expression quantitative trait loci (eQTL) analysis, and differential expression analysis.<h4>Results</h4>We identified two new schizophrenia risk loci in the Han Chinese population, SHISA9 (rs7192086, P = 4.92 × 10<sup>-08</sup>) and PES1 (rs57016637, P = 2.33 × 10<sup>-11</sup>). A fixed-effect meta-analysis (a total of 143,438 subjects) with summary statistics from EAS and EUR identifies 15 novel genome-wide significant risk loci. Heritability partitioning with linkage disequilibrium score regression (LDSC) reveals a significant enrichment of schizophrenia heritability in conserved genomic regions, promoters, and enhancers. Tissue and cell-type enrichment analyses show that schizophrenia associations are significantly enriched in human brain tissues and several types of neurons, including cerebellar neurons and telencephalon inhibitory and excitatory neurons. Polygenic risk score profiling reveals that GWAS summary statistics from the trans-ancestry meta-analysis (EAS + EUR) improve performance in predicting the case/control status of our sample.
Finally, the transcriptome-wide association study (TWAS) identifies risk genes whose cis-regulated expression changes may play a role in schizophrenia.<h4>Conclusions</h4>Our study identifies 17 novel schizophrenia risk loci and highlights the importance and necessity of conducting genetic studies in diverse populations. These findings not only provide new insights into the genetic etiology of schizophrenia but may also help to delineate the pathophysiology of schizophrenia and identify new therapeutic targets.
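The TWAS step can be illustrated with the widely used summary-statistic form. A gene's association z-score is the expression-weighted sum of SNP z-scores, scaled by the LD among those SNPs: z_TWAS = wᵀz / √(wᵀLw). The Python sketch below assumes that FUSION-style formulation; the abstract does not state which TWAS software was used. The weights, z-scores and LD matrix are hypothetical.

```python
import math

def twas_z(weights, z_scores, ld):
    """Summary-statistic TWAS z-score: w'z / sqrt(w' L w).

    weights: cis-eQTL expression weights for the gene's SNPs.
    z_scores: GWAS z-scores for the same SNPs (same allele orientation as the weights).
    ld: SNP-by-SNP LD (correlation) matrix as nested lists.
    """
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(len(weights)) for j in range(len(weights)))
    return numerator / math.sqrt(variance)

# Hypothetical gene with three cis SNPs
w = [0.4, 0.2, -0.1]
z = [4.5, 3.8, -1.0]
ld = [[1.0, 0.6, 0.1],
      [0.6, 1.0, 0.2],
      [0.1, 0.2, 1.0]]
print(round(twas_z(w, z, ld), 2))
```

The LD term in the denominator keeps correlated SNPs from being counted as independent evidence; with a single SNP and unit weight the statistic reduces to that SNP's GWAS z-score.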
Project description: Recent human adaptations have shaped population differentiation in genomic regions containing putative functional variants, mostly located in predicted regulatory elements. However, their actual functionality and the underlying mechanisms of recent adaptation remain poorly understood. In the current study, regions of genes and repeats were investigated for functionality depending on the degree of population differentiation, measured as FST or ΔDAF (the difference in derived allele frequency). The high FST in the 5′ or 3′ untranslated regions (UTRs), in particular, confirmed that population differences arose mainly from differences in regulation. Expression quantitative trait loci (eQTL) analyses using lymphoblastoid cell lines indicated that the majority of the highly population-specific regions represented cis- and/or trans-eQTL. However, groups with the highest ΔDAFs did not necessarily have higher proportions of eQTL variants; in these groups, the patterns were complex, indicating recent intricate adaptations. The results indicated that East Asian (EAS) and European (EUR) populations experienced shared selection pressures. The mean derived allele frequency of the high-ΔDAF groups suggested that EAS and EUR underwent strong adaptation, whereas the African population in Africa (AFR) experienced slight, yet broad, adaptation. The DAF distributions of variants in the gene regions showed clear selective pressure in each population, which implies the existence of more recent regulatory adaptations in cell types other than lymphoblastoid cell lines. In-depth analysis of population-differentiated regions indicated that the coding gene RNF135 represents a trans-regulation hotspot, via cis-regulation by population-specific variants in a region of selective sweep. Together, the results provide strong evidence of intricate adaptation of human populations via regulatory changes.
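ΔDAF requires each variant to be oriented to its derived allele before populations are compared. The Python sketch below assumes each variant record carries reference and alternate alleles plus an ancestral-allele annotation; the record layout and names are hypothetical. It defines ΔDAF as the derived allele frequency in one population minus that in another.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    # Hypothetical per-variant record
    ref: str
    alt: str
    ancestral: str  # ancestral allele, e.g. from an outgroup-based annotation
    alt_freq: dict  # population label -> alternate-allele frequency

def derived_allele_freq(v, pop):
    """Frequency of the derived (non-ancestral) allele in population pop."""
    if v.ancestral == v.ref:
        return v.alt_freq[pop]        # the alternate allele is derived
    if v.ancestral == v.alt:
        return 1.0 - v.alt_freq[pop]  # the reference allele is derived
    raise ValueError("ancestral allele matches neither ref nor alt")

def delta_daf(v, pop1, pop2):
    """Signed difference in derived allele frequency between two populations."""
    return derived_allele_freq(v, pop1) - derived_allele_freq(v, pop2)

v = Variant(ref="C", alt="T", ancestral="T", alt_freq={"EAS": 0.15, "EUR": 0.70, "AFR": 0.90})
print(round(delta_daf(v, "EAS", "AFR"), 2))  # derived C: 0.85 in EAS vs 0.10 in AFR
```

Orienting by ancestral state matters because a high alternate-allele frequency can mean either a common derived allele or a rare one, depending on which allele is ancestral.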
Project description: Genome-wide association studies (GWASs) have identified >100 susceptibility loci for schizophrenia (SCZ) and demonstrated that SCZ is a polygenic disorder determined by numerous genetic variants, each with a small effect size. We conducted a GWAS in the Japanese (JPN) population (a) to detect novel SCZ-susceptibility genes and (b) to examine the shared genetic risk of SCZ across populations (East Asian [EAS] and European [EUR]) and/or across diseases (SCZ, bipolar disorder [BD], and major depressive disorder [MDD]) within EAS and between EAS and EUR (trans-diseases/populations). Using the discovery GWAS subjects (JPN-SCZ GWAS: 1940 SCZ cases and 7408 controls) and a replication dataset (4071 SCZ cases and 54,479 controls), both comprising JPN populations, we identified 3 novel susceptibility loci for SCZ: SPHKAP (P<sub>best</sub> = 4.1 × 10<sup>-10</sup>), SLC38A3 (P<sub>best</sub> = 5.7 × 10<sup>-10</sup>), and CABP1-ACADS (P<sub>best</sub> = 9.8 × 10<sup>-9</sup>). A subsequent meta-analysis of our samples with those of the Psychiatric GWAS Consortium (PGC; EUR samples) and another study detected 12 additional susceptibility loci. Polygenic risk score (PRS) prediction revealed a shared genetic risk of SCZ across populations (P<sub>best</sub> = 4.0 × 10<sup>-11</sup>) and between SCZ and BD in the JPN population (P ~ 10<sup>-40</sup>); however, lower variance explained was noted between JPN-SCZ GWAS and PGC-BD or MDD within/across populations. Genetic correlation analysis supported the PRS results: the genetic correlation between JPN-SCZ and PGC-SCZ was ρ = 0.58, whereas similar or lower correlations were observed across diseases (JPN-SCZ vs JPN-BD/EAS-MDD, rg = 0.56/0.29) and across diseases and populations (JPN-SCZ vs PGC-BD/MDD, ρ = 0.38/0.12). In conclusion, (a) fifteen novel loci are possible susceptibility genes for SCZ and (b) the SCZ "risk" effect is shared with other psychiatric disorders, even across populations.
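The PRS prediction above can be illustrated with a P-value-thresholded score in Python. PRS analyses typically scan several thresholds P<sub>T</sub> and report the best-performing one. This sketch assumes an already LD-clumped list of discovery SNPs with effect sizes and P-values. A target individual's score is the sum of dosage times effect size over SNPs below P<sub>T</sub>. SNP IDs and values are hypothetical.

```python
def prs(dosages, gwas, p_threshold):
    """Clumping-and-thresholding style score: sum of dosage x beta for SNPs with P < threshold.

    dosages: SNP ID -> effect-allele dosage (0 to 2) in the target individual.
    gwas: SNP ID -> (beta, p_value) from the discovery GWAS (assumed already LD-clumped).
    """
    return sum(dosages[snp] * beta
               for snp, (beta, p) in gwas.items()
               if p < p_threshold and snp in dosages)

# Hypothetical discovery summary statistics and one target genotype
gwas = {"snp1": (0.12, 4e-10), "snp2": (0.05, 1e-4), "snp3": (-0.03, 0.2)}
person = {"snp1": 2, "snp2": 1, "snp3": 0}
for p_t in (5e-8, 1e-3, 0.5):
    print(p_t, round(prs(person, gwas, p_t), 3))
```

In practice the score is computed for every case and control in the target sample. Its association with case/control status (for example, variance explained) is then compared across thresholds.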