Summix: A method for detecting and adjusting for population structure in genetic summary data.
ABSTRACT: Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
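The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea. It treats each observed allele frequency as a mixture of reference ancestry frequencies and searches for non-negative proportions that sum to 1 and minimize the squared error. The function names, the projected-gradient solver, and the toy frequencies are illustrative assumptions. They are not the Summix implementation or the API of its R package.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_k >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:  # holds for a prefix of the sorted values
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(observed, reference, iterations=2000):
    """Least-squares ancestry proportions from summary allele frequencies.

    observed:  list of observed AFs, one per SNP
    reference: list of rows, one per SNP, each row holding the AF in every
               reference ancestry (hypothetical input layout)
    """
    n, k = len(observed), len(reference[0])
    pi = [1.0 / k] * k
    # AFs lie in [0, 1], so the gradient's Lipschitz constant is at most 2k.
    step = 1.0 / (2.0 * k)
    for _ in range(iterations):
        residual = [
            obs - sum(p * r for p, r in zip(pi, row))
            for obs, row in zip(observed, reference)
        ]
        grad = [
            -2.0 / n * sum(res * row[j] for res, row in zip(residual, reference))
            for j in range(k)
        ]
        pi = project_to_simplex([p - step * g for p, g in zip(pi, grad)])
    return pi


# Toy reference AFs for two ancestries (made-up numbers, not gnomAD data).
reference = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.3], [0.1, 0.6], [0.7, 0.2], [0.3, 0.9]]
true_pi = [0.84, 0.16]
observed = [sum(p * r for p, r in zip(true_pi, row)) for row in reference]

# Recovers the simulated 0.84 / 0.16 mix up to numerical tolerance.
print(estimate_proportions(observed, reference))
```

With noise-free toy data the simulated proportions are recovered. Real summary data would add sampling noise, which is where the block bootstrap mentioned in the abstract would come in.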
Project description:BACKGROUND: Europeans and American Indians are the major ancestral groups of Hispanics in the U.S. These ancestral groups have markedly different incidence rates and outcomes in many types of cancers. Genetic admixture may therefore bias genetic association studies of cancer susceptibility variants, specifically in Hispanics. For example, the incidence rate of liver cancer differs substantially among Hispanic, Asian, and non-Hispanic white populations. Currently, ancestry informative marker (AIM) panels of up to a few hundred single nucleotide polymorphisms (SNPs) are widely used to infer ancestry admixture. Notably, currently available AIMs are predominantly located in intronic and intergenic regions, while the whole exome sequencing (WES) protocols commonly used in translational research and clinical practice do not cover these markers. Thus, it remains challenging to accurately determine a patient's admixture proportion without additional DNA testing. RESULTS: In this study we designed a unique AIM panel that infers 3-way genetic admixture from three distinct and selective continental populations (African (AFR), European (EUR), and East Asian (EAS)) within evolutionarily conserved exonic regions. Starting from about 1 million exonic SNPs from the three selected populations in the 1000 Genomes Project, we pruned by linkage disequilibrium (LD), restricted to biallelic variants, and finally optimized an AIM panel of 250 SNP markers, the UT-AIM250 panel, using ancestral informativeness statistics. Compared to published AIM panels, UT-AIM250 achieved higher accuracy when tested on the three ancestral populations (accuracy: 0.995 ± 0.012 for AFR, 0.997 ± 0.007 for EUR, and 0.994 ± 0.012 for EAS). 
We further demonstrated the performance of the UT-AIM250 panel on admixed American (AMR) samples of the 1000 Genomes Project and obtained similar results (AFR, 0.085 ± 0.098; EUR, 0.665 ± 0.182; and EAS, 0.250 ± 0.205) to previously published AIM panels (Phillips-AIM34: AFR, 0.096 ± 0.127, EUR, 0.575 ± 0.290, and EAS, 0.330 ± 0.315; Wei-AIM278: AFR, 0.070 ± 0.096, EUR, 0.537 ± 0.267, and EAS, 0.393 ± 0.300). Subsequently, we applied the UT-AIM250 panel to a clinical dataset of 26 self-reported Hispanic patients in South Texas with hepatocellular carcinoma (HCC). We estimated the admixture proportions using WES data of adjacent non-cancer liver tissues (AFR, 0.065 ± 0.043; EUR, 0.594 ± 0.150; and EAS, 0.341 ± 0.160). Similar admixture proportions were identified from corresponding tumor tissues. In addition, we estimated admixture proportions of The Cancer Genome Atlas (TCGA) collection of hepatocellular carcinoma (TCGA-LIHC) samples (376 patients) using the UT-AIM250 panel. The panel obtained consistent admixture proportions from tumor and matched normal tissues, identified 3 possibly incorrectly reported race/ethnicity labels, and/or provided race/ethnicity determination where necessary. CONCLUSIONS: Here we demonstrated the feasibility of using evolutionarily conserved exonic regions to infer admixture proportions and provided a robust and reliable control for sample collection or patient stratification for genetic analysis. R implementation of UT-AIM250 is available at https://github.com/chenlabgccri/UT-AIM250.
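The abstract does not say which ancestral informativeness statistic was used to rank SNPs. As one possible illustration, the sketch below ranks biallelic SNPs by Rosenberg's informativeness for assignment (I_n), computed from per-population allele frequencies. The choice of statistic, the function names, and the toy frequencies are assumptions, and the LD pruning and biallelic filtering steps described above are not shown.

```python
import math


def _xlogx(p):
    """p * ln(p), with the usual convention 0 * ln(0) = 0."""
    return p * math.log(p) if p > 0 else 0.0


def informativeness(freqs):
    """Rosenberg's I_n for one biallelic SNP.

    freqs: frequency of one allele in each of K populations
           (the other allele has frequency 1 - p in each population).
    """
    k = len(freqs)
    total = 0.0
    for allele_freqs in (freqs, [1.0 - p for p in freqs]):
        mean = sum(allele_freqs) / k
        total += -_xlogx(mean) + sum(_xlogx(p) for p in allele_freqs) / k
    return total


def top_markers(snp_freqs, n):
    """Return the n SNP ids with the highest I_n."""
    ranked = sorted(snp_freqs, key=lambda s: informativeness(snp_freqs[s]), reverse=True)
    return ranked[:n]


# Hypothetical allele frequencies in (AFR, EUR, EAS); not 1000 Genomes values.
panel = {
    "snp_a": [0.95, 0.05, 0.10],
    "snp_b": [0.50, 0.48, 0.52],
    "snp_c": [0.10, 0.80, 0.15],
}
print(top_markers(panel, 2))
```

A SNP with the same frequency in every population scores zero, so near-uniform markers such as the hypothetical `snp_b` drop out of the panel first.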
Project description:Schizophrenia is a common polygenic disease affecting 0.5-1% of individuals across distinct ethnic populations. PGC-II, the largest genome-wide association study investigating genetic risk factors for schizophrenia, previously identified 128 independent schizophrenia-associated genetic variants (GVs). The current study examined the genetic variability of GVs across ethnic populations. To assess the genetic variability across populations, the 'variability indices' (VIs) of the 128 schizophrenia-associated GVs were calculated. We used 2504 genomes from the 1000 Genomes Project taken from 26 worldwide healthy samples comprising five major ethnicities: East Asian (EAS: n=504), European (EUR: n=503), African (AFR: n=661), American (AMR: n=347) and South Asian (SAS: n=489). The GV with the lowest variability was rs36068923 (VI=1.07). The minor allele frequencies (MAFs) were 0.189, 0.192, 0.256, 0.183 and 0.194 for EAS, EUR, AFR, AMR and SAS, respectively. The GV with the highest variability was rs7432375 (VI=9.46). The MAFs were 0.791, 0.435, 0.041, 0.594 and 0.508 for EAS, EUR, AFR, AMR and SAS, respectively. When we focused on the EAS and EUR populations, the allele frequencies of 86 GVs significantly differed between EAS and EUR (P<3.91 × 10^-4). The GV with the highest variability was rs4330281 (P=1.55 × 10^-138). The MAFs were 0.023 and 0.519 for EAS and EUR, respectively. The GV with the lowest variability was rs2332700 (P=9.80 × 10^-1). The MAFs were similar between these populations (that is, 0.246 and 0.247 for EAS and EUR, respectively). Interestingly, the mean allele frequencies of the GVs did not significantly differ between these populations (P>0.05). Although genetic heterogeneities were observed in the schizophrenia-associated GVs across ethnic groups, the combination of these GVs might increase the risk of schizophrenia.
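The significance cutoff quoted above matches what a Bonferroni correction of 0.05 across the 128 GVs would give, although the abstract does not name the correction. The short sketch below works through that arithmetic under that assumption; the function name is illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumed family-wise alpha of 0.05 over the 128 schizophrenia-associated GVs.
threshold = bonferroni_threshold(0.05, 128)
print(f"{threshold:.2e}")  # 3.91e-04

# P values quoted in the abstract for the two extreme variants.
print(1.55e-138 < threshold)  # rs4330281: True, significant
print(9.80e-1 < threshold)    # rs2332700: False, not significant
```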
Project description:To assess the statistical significance of associations between variants and traits, genome-wide association studies (GWAS) should employ an appropriate threshold that accounts for the massive burden of multiple testing in the study. Although most studies in the current literature commonly set a genome-wide significance threshold at the level of P=5.0 × 10^-8, the adequacy of this value for respective populations has not been fully investigated. To empirically estimate thresholds for different ancestral populations, we conducted GWAS simulations using the 1000 Genomes Phase 3 data set for Africans (AFR), Europeans (EUR), Admixed Americans (AMR), East Asians (EAS) and South Asians (SAS). The estimated empirical genome-wide significance thresholds were P_sig=3.24 × 10^-8 (AFR), 9.26 × 10^-8 (EUR), 1.83 × 10^-7 (AMR), 1.61 × 10^-7 (EAS) and 9.46 × 10^-8 (SAS). We additionally conducted trans-ethnic meta-analyses across all populations (ALL) and all populations except for AFR (ΔAFR), which yielded P_sig=3.25 × 10^-8 (ALL) and 4.20 × 10^-8 (ΔAFR). Our results indicate that the current threshold (P=5.0 × 10^-8) is overly stringent for all ancestral populations except for Africans; however, we should employ a more stringent threshold when conducting a meta-analysis, regardless of the presence of African samples.
Project description:BACKGROUND: Understanding how biological factors contribute to prostate cancer (PCa) health disparities requires mechanistic functional analysis of specific genes or pathways in pre-clinical cellular and animal models of this malignancy. The 22Rv1 human prostatic carcinoma cell line was originally derived from the parental CWR22R cell line. Although 22Rv1 has been well characterized and used in numerous mechanistic studies, no racial identifier has ever been disclosed for this cell line. In accordance with the need for racial diversity in cancer biospecimens and recent guidelines by the NIH on authentication of key biological resources, we sought to determine the ancestry of 22Rv1 and authenticate previously reported racial identifications for four other PCa cell lines. METHODS: We used 29 established Ancestry Informative Marker (AIM) single nucleotide polymorphisms (SNPs) to conduct DNA ancestry analysis and assign ancestral proportions to a panel of five PCa cell lines that included 22Rv1, PC3, DU145, MDA-PCa-2b, and RC-77T/E. RESULTS: We found that 22Rv1 carries mixed genetic ancestry. The main ancestry proportions for this cell line were 0.41 West African (AFR) and 0.42 European (EUR). In addition, we verified the previously reported racial identifications for PC3 (0.73 EUR), DU145 (0.63 EUR), MDA-PCa-2b (0.73 AFR), and RC-77T/E (0.74 AFR) cell lines. CONCLUSIONS: Considering the mortality disparities associated with PCa, which disproportionately affect African American men, there remains a burden on the scientific community to diversify the availability of biospecimens, including cell lines, for mechanistic studies on potential biological mediators of these disparities. This study contributes by identifying another PCa cell line that carries substantial AFR ancestry. This finding may also open the door to new perspectives on previously published studies using this cell line.
Project description:Polymorphisms in genes related to the metabolism of vitamin B12 have not been examined in a Brazilian population. The objectives of this study were to (a) determine the correlation between the local genetic ancestry components and vitamin B12 levels using ninety B12-related genes; (b) determine associations between these genes and their SNPs with vitamin B12 levels; and (c) determine a polygenic risk score (PRS) using significant variants. This cross-sectional study included 168 children and adolescents, aged 9-13 years old. Total cobalamin was measured in plasma. Genotyping arrays and whole exome data were combined to yield ~7000 SNPs in 90 genes related to vitamin B12. The Efficient Local Ancestry Inference method was used to estimate local ancestry for African (AFR), Native American (AMR), and European (EUR) components. Associations between the genotypes and vitamin B12 levels were determined with generalized estimating equations. Vitamin B12 levels were driven by positive (EUR) and negative (AFR, AMR) correlations with genetic ancestry. A set of 36 variants was used to create a PRS that explained 42% of vitamin level variation. Vitamin B12 levels are influenced by genetic ancestry and a PRS explained almost 50% of the variation in plasma cobalamin in Brazilian children and adolescents.
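The abstract does not describe how the PRS was constructed from the 36 variants. A common construction, shown here only as an assumed sketch, is a weighted sum of allele dosages (0, 1, or 2 copies of the effect allele) with per-variant effect sizes as weights. The function name and the example numbers are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: number of effect alleles per variant (0, 1, or 2)
    weights: per-variant effect sizes, e.g. regression coefficients
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Three made-up variants standing in for the 36 used in the study.
example_dosages = [0, 1, 2]
example_weights = [0.1, 0.2, 0.3]
print(polygenic_risk_score(example_dosages, example_weights))
```

The proportion of variation explained would then come from regressing measured vitamin B12 levels on this score, a step that is not shown here.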
Project description:To reveal genetic determinants of susceptibility to COVID-19 severity in the population and further explore potential immune-related factors, we performed a genome-wide association study on 284 confirmed COVID-19 patients (cases) and 95 healthy individuals (controls). We compared cases and controls of European (EUR) ancestry and African American (AFR) ancestry separately. To further explore the link between HLA and COVID-19 severity, we applied fine-mapping analysis to dissect the HLA association with mild and severe cases.
Project description:Our knowledge of prostate cancer (PCa) genomics mainly reflects European (EUR) and Asian (ASN) populations. Our understanding of the influence of Middle Eastern (ME) and African (AFR) ancestry on the mutational profiles of prostate cancer is limited. To characterize genomic differences between ME, EUR, ASN, and AFR ancestry, fluorescent in situ hybridization (FISH) studies for NKX3-1 deletion and MYC amplification were carried out on 42 tumors arising in individuals of ME ancestry. These were supplemented by analysis of genome-wide copy number profiles of 401 tumors of all ancestries. FISH results of NKX3-1 and MYC were assessed in the ME cohort and compared to other ancestries. Gene level copy number aberrations (CNAs) for each sample were statistically compared between ancestry groups. NKX3-1 deletions by FISH were observed in 17/42 (17.5%) prostate tumors arising in men of ME ancestry, while MYC amplifications were only observed in 1/42 (2.3%). Using CNAs called from arrays, the incidence of NKX3-1 deletions was significantly lower in ME vs. other ancestries (20% vs. 52%; p = 2.3 × 10^-3). Across the genome, tumors arising in men of ME ancestry had fewer CNAs than those in men of other ancestries (p = 0.014). Additionally, the somatic amplification of 21 specific genes was more frequent in tumors arising in men of ME vs. EUR ancestry (two-sided proportion test; Q < 0.05). Those included amplifications in the glutathione S-transferase family on chromosome 1 (GSTM1, GSTM2, GSTM5) and the IQ motif-containing family on chromosome 3 (IQCF1, IQCF2, IQCF13, IQCF4, IQCF5, IQCF6). Larger studies investigating ME populations are warranted to confirm these observations.
Project description:Genome-wide association studies (GWAS) in samples of European ancestry have identified thousands of genetic variants associated with complex traits in humans. However, it remains largely unclear whether these associations can be used in non-European populations. Here, we seek to quantify the proportion of genetic variation for a complex trait shared between continental populations. We estimated the between-population correlation of genetic effects at all SNPs ([Formula: see text]) or genome-wide significant SNPs ([Formula: see text]) for height and body mass index (BMI) in samples of European (EUR; [Formula: see text]) and African (AFR; [Formula: see text]) ancestry. The [Formula: see text] between EUR and AFR was 0.75 ([Formula: see text]) for height and 0.68 ([Formula: see text]) for BMI, and the corresponding [Formula: see text] was 0.82 ([Formula: see text]) for height and 0.87 ([Formula: see text]) for BMI, suggesting that a large proportion of GWAS findings discovered in Europeans are likely applicable to non-Europeans for height and BMI. There was no evidence that [Formula: see text] differs in SNP groups with different levels of between-population difference in allele frequency or linkage disequilibrium, which, however, can be due to the lack of power.
Project description:Variables such as race, skin colour and ethnicity have become intensely discussed in medical research, as a response to the rising debate over the importance of the ethnic-racial dimension in the scope of health-disease processes. The aim of this study was to identify the influence of European (EUR), African (AFR) and Amerindian (AMR) ancestries on Brazilian health outcomes through a systematic literature review. This study was carried out by searching three electronic databases for studies published between 2005 and 2017. A total of 13 papers were eligible. The search identified the following health outcomes: visceral leishmaniasis, malaria, Alzheimer's disease, neuromyelitis optica, multiple sclerosis, prostate cancer, non-syndromic cleft lip/palate, chronic heart failure, sickle cell disease, primary congenital glaucoma, preterm labour, preterm premature rupture of membranes, systemic lupus erythematosus and type 1 diabetes mellitus. Research paper assessments were guided by the STROBE instrument, and agreements between results were determined by comparing the points attributed by two authors. Increased EUR ancestry was identified in preterm labour (PTL), type 1 diabetes (T1D) and non-syndromic cleft lip with or without cleft palate (NSCL), as well as in patients presenting aggressive prostate cancer prognoses. On the other hand, the highest AFR ancestral component was found in systemic lupus erythematosus (SLE) and primary congenital glaucoma (PCG) cases presenting worse prognoses. AMR ancestry may be a protective factor in the development of Alzheimer's disease (AD). The worst hemodynamic parameters in cases of heart failure (HF) were identified among individuals with greater AMR and AFR ancestry indices.
Project description:Following up on our previous study, we conducted a genome-wide analysis of admixture for two Uyghur population samples (HGDP-UG and PanAsia-UG), collected from the northern and southern regions of Xinjiang in China, respectively. Both HGDP-UG and PanAsia-UG showed substantial admixture of East Asian (EAS) and European (EUR) ancestries, with empirical estimates of ancestry contribution of 53:47 (EAS:EUR) and 48:52 for HGDP-UG and PanAsia-UG, respectively. The effective admixture time under a model with a single pulse of admixture was estimated as 110 generations and 129 generations, corresponding to admixture events about 2200 and 2580 years ago for HGDP-UG and PanAsia-UG, respectively, assuming an average of 20 yr per generation. Despite Uyghurs' earlier history compared to other admixed populations, admixture mapping holds promise for this population because of its large size and its mixture of ancestry from different continents. We screened multiple databases and identified a genome-wide single-nucleotide polymorphism panel that can distinguish EAS and EUR ancestry of chromosomal segments in Uyghurs. The panel contains 8150 ancestry-informative markers (AIMs) showing large frequency differences between EAS and EUR populations (F(ST) > 0.25, mean F(ST) = 0.43) but small frequency differences (7999 AIMs validated) within both populations (F(ST) < 0.05, mean F(ST) < 0.01). We evaluated the effectiveness of this admixture map for localizing disease genes in two Uyghur populations. To our knowledge, our map constitutes the first practical resource for admixture mapping in Uyghurs, and it will enable studies of diseases showing differences in genetic risk between EUR and EAS populations.
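The abstract reports F(ST) cutoffs for selecting AIMs but does not state which estimator was used. As a rough sketch, the code below computes Wright's heterozygosity-based F(ST) for a biallelic SNP between two equally weighted populations and applies the between-population cutoff of 0.25. The estimator, the function names, and the toy frequencies are assumptions made for illustration.

```python
def fst_two_populations(p1, p2):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    p1, p2: frequency of the same allele in the two populations,
            which are given equal weight.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # monomorphic in both populations: no differentiation
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def select_aims(snp_freqs, min_fst=0.25):
    """Keep SNP ids whose between-population F_ST exceeds min_fst."""
    return [s for s, (p1, p2) in snp_freqs.items() if fst_two_populations(p1, p2) > min_fst]


# Hypothetical (EAS, EUR) allele frequencies; not values from the study.
candidates = {"snp_x": (0.10, 0.90), "snp_y": (0.45, 0.50), "snp_z": (0.05, 0.70)}
print(select_aims(candidates))
```

The within-population criterion (F(ST) < 0.05 among subpopulations) could reuse the same function on subpopulation frequencies, which are not modeled in this sketch.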