Project description: As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function in ~313 genes per genome, and ~95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent, accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits.
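The rare/common split above rests on each SNV's minor allele frequency (MAF). A minimal Python sketch of that bookkeeping, assuming biallelic SNVs coded as per-person alternate-allele dosages (0, 1, 2) with no missing genotypes; the function names and toy genotypes are hypothetical, and only the 0.5% cutoff comes from the text:

```python
def minor_allele_frequency(dosages):
    """MAF of one biallelic SNV from per-person alternate-allele dosages (0, 1, 2)."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    # The minor allele may be the reference allele, so take the smaller frequency.
    return min(alt_freq, 1.0 - alt_freq)


def fraction_rare(snv_dosages, threshold=0.005):
    """Fraction of SNVs whose MAF is strictly below `threshold` (0.5% by default)."""
    mafs = [minor_allele_frequency(d) for d in snv_dosages]
    return sum(maf < threshold for maf in mafs) / len(mafs)


if __name__ == "__main__":
    n = 1000  # hypothetical sample: 1,000 people, i.e. 2,000 alleles
    singleton = [1] + [0] * (n - 1)       # MAF = 1/2000 = 0.05%: rare
    common = [1] * 300 + [0] * (n - 300)  # MAF = 300/2000 = 15%: common
    flipped = [2] * (n - 2) + [1, 1]      # alt allele nearly fixed; MAF = 2/2000 = 0.1%: rare
    print(fraction_rare([singleton, common, flipped]))  # 2 of the 3 SNVs are rare
```

Taking the minimum of the two allele frequencies matters for sites where the alternate allele is the common one; counting only alternate alleles would misclassify them.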
Project description: Parkinson's disease (PD) is an incurable, progressive and common movement disorder that is increasing in incidence globally because of population aging. We hypothesized that the landscape of rare, protein-altering variants could provide further insights into disease pathogenesis. Here we performed whole-exome sequencing followed by gene-based tests on 4,298 PD cases and 5,512 controls of Asian ancestry. We showed that GBA1 and SMPD1 were significantly associated with PD risk, with replication in a further 5,585 PD cases and 5,642 controls. We further refined variant classification using in vitro assays and showed that SMPD1 variants with reduced enzymatic activity display the strongest association (<44% activity, odds ratio (OR) = 2.24, P = 1.25 × 10⁻¹⁵) with PD risk. Moreover, 80.5% of SMPD1 carriers harbored the Asian-specific p.Pro332Arg variant (OR = 2.16; P = 4.47 × 10⁻⁸). Our findings highlight the utility of performing exome sequencing in diverse ancestry groups to identify rare protein-altering variants in genes previously unassociated with disease.
Project description: HDL-associated paraoxonase-1 (PON1) is an enzyme whose activity is associated with cerebrovascular disease. Common PON1 genetic variants have not been consistently associated with cerebrovascular disease. Rare coding variation that likely alters PON1 enzyme function may be more strongly associated with stroke. The National Heart, Lung, and Blood Institute Exome Sequencing Project sequenced the coding regions (exomes) of the genome for heart, lung, and blood-related phenotypes (including ischemic stroke). In this sample of 4,204 unrelated participants, 496 had verified, noncardioembolic ischemic stroke. After filtering, 28 nonsynonymous PON1 variants were identified. Analysis with the sequence kernel association test, adjusted for covariates, identified significant associations between PON1 variants and ischemic stroke (P = 3.01 × 10⁻³). Stratified analyses demonstrated a stronger association of PON1 variants with ischemic stroke in African ancestry (AA) participants (P = 5.03 × 10⁻³). Ethnic differences in the association between PON1 variants and stroke could be due to the effects of PON1 Val109Ile (overall P = 7.88 × 10⁻³; AA P = 6.52 × 10⁻⁴), which is found at higher frequency in AA participants (1.16% vs. 0.02%) and whose protein product is less stable than that of the common allele. In summary, rare genetic variation in PON1 was associated with ischemic stroke, with stronger associations identified in those of AA. Increased focus on PON1 enzyme function and its role in cerebrovascular disease is warranted.
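The sequence kernel association test (SKAT) aggregates per-variant score statistics into one gene-level statistic, Q = Σ_j w_j S_j², where S_j is the sum of genotype-by-residual products for variant j under the null model. The sketch below is a simplified illustration under stated assumptions, not the study's analysis: it uses an intercept-only null model (the study adjusted for covariates), squared Beta(1, 25) MAF weights (the usual SKAT default), and a permutation p-value in place of the Davies mixture-of-chi-squares method SKAT normally uses. All names and data are hypothetical.

```python
import random


def beta_1_25_density(maf):
    """Beta(1, 25) density at the MAF, which simplifies to 25 * (1 - maf) ** 24."""
    return 25.0 * (1.0 - maf) ** 24


def skat_q(genotypes, y):
    """SKAT-style Q for variants x people dosage lists and a 0/1 phenotype y."""
    n = len(y)
    mean_y = sum(y) / n
    resid = [yi - mean_y for yi in y]  # residuals of an intercept-only null model
    q = 0.0
    for g in genotypes:
        alt = sum(g) / (2 * n)
        maf = min(alt, 1.0 - alt)
        weight = beta_1_25_density(maf) ** 2  # K = G W G' with sqrt(w_j) = Beta density
        score = sum(gi * ri for gi, ri in zip(g, resid))
        q += weight * score ** 2
    return q


def skat_permutation_p(genotypes, y, n_perm=999, seed=1):
    """Observed Q and a permutation p-value (valid here only because there are no covariates)."""
    rng = random.Random(seed)
    q_obs = skat_q(genotypes, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if skat_q(genotypes, y_perm) >= q_obs:
            hits += 1
    return q_obs, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    y = [1] * 20 + [0] * 20                 # hypothetical: 20 cases, 20 controls
    variant_a = [1] * 5 + [0] * 35          # all 5 carriers are cases
    variant_b = [0] * 5 + [1] * 3 + [0] * 32  # all 3 carriers are cases
    print(skat_permutation_p([variant_a, variant_b], y))
```

Because each score is squared, variants that raise or lower risk both contribute to Q, which is why SKAT suits genes like PON1 where rare variants may act in different directions.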
Project description: Objective: We aimed to identify genes associated with genetic generalized epilepsy (GGE) by combining large cohorts enriched with individuals with a positive family history. Secondarily, we set out to compare the association of genes independently with familial and sporadic GGE. Methods: We performed a case-control whole exome sequencing study in unrelated individuals of European descent diagnosed with GGE (previously recruited and sequenced through multiple international collaborations) and ancestry-matched controls. The association of ultra-rare variants (URVs; in 18,834 protein-coding genes) with epilepsy was examined in 1928 individuals with GGE (vs. 8578 controls), then separately in 945 individuals with familial GGE (vs. 8626 controls), and finally in 1005 individuals with sporadic GGE (vs. 8621 controls). We additionally examined the association of URVs with familial and sporadic GGE in two gene sets important for inhibitory signaling (19 genes encoding γ-aminobutyric acid type A [GABAA] receptors, 113 genes representing the GABAergic pathway). Results: GABRG2 was associated with GGE (p = 1.8 × 10⁻⁵), approaching study-wide significance in familial GGE (p = 3.0 × 10⁻⁶), whereas no gene approached a significant association with sporadic GGE. Deleterious URVs in the most intolerant subgenic regions in genes encoding GABAA receptors were associated with familial GGE (odds ratio [OR] = 3.9, 95% confidence interval [CI] = 1.9-7.8, false discovery rate [FDR]-adjusted p = .0024), whereas their association with sporadic GGE had marginally lower odds (OR = 3.1, 95% CI = 1.3-6.7, FDR-adjusted p = .022). URVs in GABAergic pathway genes were associated with familial GGE (OR = 1.8, 95% CI = 1.3-2.5, FDR-adjusted p = .0024) but not with sporadic GGE (OR = 1.3, 95% CI = .9-1.9, FDR-adjusted p = .19). Significance: URVs in GABRG2 are likely an important risk factor for familial GGE. The association of gene sets of GABAergic signaling with familial GGE is more prominent than with sporadic GGE.
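Gene-set results like these are usually reported as an odds ratio with a 95% CI plus a multiple-testing-adjusted p-value. A minimal sketch of both calculations, assuming a carrier/non-carrier 2×2 table with a Woolf (log-scale Wald) interval and Benjamini-Hochberg FDR adjustment; the abstract does not say which CI method was used, so treat that choice as an assumption, and the counts and p-values are hypothetical, not the study's data:

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.959963984540054):
    """OR and Woolf 95% CI for a 2x2 table.

    a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers (all must be > 0).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


def benjamini_hochberg(pvals):
    """BH FDR-adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(odds_ratio_wald_ci(10, 90, 5, 95))            # hypothetical counts
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))  # hypothetical p-values
```

The Woolf interval is undefined when any cell is zero, which is common for ultra-rare variants; exact or penalized methods are typically used in that case.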
Project description: Pancreatic cancer is a deadly disease that accounts for approximately 5% of cancer deaths worldwide, with a dismal 5-year survival rate of 10%. Known genetic risk factors explain only a modest proportion of the heritable risk of pancreatic cancer. We conducted a whole-exome case-control sequencing study in 1,591 pancreatic cancer cases and 2,134 cancer-free controls of European ancestry. In our gene-based analysis, ATM ranked first, with a genome-wide significant p value of 1 × 10⁻⁸. The odds ratio for protein-truncating variants in ATM was 24, substantially higher than prior estimates, although with a broad 95% confidence interval (4.0-1000). SIK3 was the second-highest-ranking gene (p = 3.84 × 10⁻⁶, false discovery rate [FDR] = 0.032). We observed nominally significant association signals in several genes of a priori interest, including BRCA2 (p = 4.3 × 10⁻⁴), STK11 (p = 0.003), PALB2 (p = 0.019), and TP53 (p = 0.037), and reported risk estimates for known pathogenic variants and variants of uncertain significance (VUS) in these genes. Rare variants in established susceptibility genes explain approximately 24% of the log familial relative risk, which is comparable to the contribution from established common susceptibility variants (17%). In conclusion, this study provides new insights into the genetic susceptibility of pancreatic cancer, refining rare variant risk estimates in known pancreatic cancer susceptibility genes and identifying SIK3 as a novel candidate susceptibility gene. This study highlights the importance of ATM truncating variants and the underappreciated role of VUS in pancreatic cancer etiology.
Project description: Kawasaki disease (KD) is an acute pediatric vasculitis that affects genetically susceptible infants and children. To identify coding variants that influence susceptibility to KD, we conducted whole exome sequencing of 159 patients with KD and 902 controls, and performed a replication study in an independent set of 586 cases and 732 controls. We identified five rare coding variants in five genes (FCRLA, PTGER4, IL17F, CARD11, and SIGLEC10) associated with KD (odds ratio [OR], 1.18 to 4.41; p = 0.0027-0.031). We also performed association analysis in 26 KD patients with coronary artery aneurysms (CAAs; diameter > 5 mm) and 124 patients without CAAs (diameter < 3 mm), and identified another five rare coding variants in five genes (FGFR4, IL31RA, FNDC1, MMP8, and FOXN1) that may be associated with CAA (OR, 3.89 to 37.3; p = 0.0058-0.0261). These results provide insights into new candidate genes and genetic variants potentially involved in the development of KD and CAA.
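With small subgroups such as 26 versus 124 patients, single-variant association p-values for rare alleles are often computed with an exact test rather than an asymptotic one. The abstract does not name its test, so the sketch below assumes a two-sided Fisher's exact test on a 2×2 table of allele (or carrier) counts; it sums the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one. The function name and example tables are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows might be cases/controls and columns alt/ref allele counts (an assumption).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    print(fisher_exact_two_sided(3, 1, 1, 3))  # classic "lady tasting tea" table
    print(fisher_exact_two_sided(5, 0, 0, 5))  # perfectly separated table
```

Python's integer arithmetic keeps `comb` exact even for thousands of alleles, so only the final division introduces rounding.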
Project description: Genome-wide association studies (GWAS) have identified 52 independent variants at 34 genetic loci that are associated with age-related macular degeneration (AMD), the most common cause of incurable vision loss in the elderly worldwide. However, causal genes at the majority of these loci remain unknown. In this study, we performed whole exome sequencing of 264 individuals from 63 multiplex families with AMD and analyzed the data for rare protein-altering variants in candidate target genes at AMD-associated loci. Rare coding variants were identified in the CFH, PUS7, RXFP2, PHF12 and TACC2 genes in three or more families. In addition, we detected rare coding variants in the C9, SPEF2 and BCAR1 genes, which were previously suggested as likely causative genes at the respective AMD susceptibility loci. The identification of rare variants in the CFH and C9 genes in our study validates previous reports of rare variants in complement pathway genes in AMD. We then extended our exome-wide analysis and identified rare protein-altering variants in 13 genes outside the AMD-GWAS loci in three or more families. Two of these genes, SCN10A and KIR2DL4, are of interest because variants in these genes also showed association with AMD in case-control cohorts, albeit not at the level of genome-wide significance. Our study presents the first large-scale, exome-wide analysis of rare variants in AMD. Independent replication and molecular investigation of the candidate target genes reported here would help provide novel insights into the mechanisms underlying AMD pathogenesis.
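The "three or more families" criterion amounts to counting distinct families per gene after filtering variants. A minimal sketch, assuming each variant record carries a family ID, gene, predicted consequence and population MAF; the consequence labels, the 1% MAF cutoff and the gene names are hypothetical, and the sketch ignores within-family segregation, which a real analysis would also check:

```python
from collections import defaultdict

# Hypothetical set of consequence labels treated as "protein-altering".
PROTEIN_ALTERING = {"missense", "stop_gained", "frameshift", "splice_donor", "splice_acceptor"}


def genes_recurrent_across_families(variants, max_maf=0.01, min_families=3):
    """Return {gene: n_families} for genes carrying qualifying variants in >= min_families families.

    `variants` is an iterable of dicts with keys "family", "gene", "consequence", "maf".
    """
    families_by_gene = defaultdict(set)
    for v in variants:
        if v["maf"] < max_maf and v["consequence"] in PROTEIN_ALTERING:
            families_by_gene[v["gene"]].add(v["family"])  # a set counts each family once
    return {g: len(f) for g, f in families_by_gene.items() if len(f) >= min_families}


if __name__ == "__main__":
    toy = [
        {"family": "F1", "gene": "GENE_A", "consequence": "missense", "maf": 0.001},
        {"family": "F2", "gene": "GENE_A", "consequence": "stop_gained", "maf": 0.0},
        {"family": "F3", "gene": "GENE_A", "consequence": "missense", "maf": 0.002},
        {"family": "F3", "gene": "GENE_A", "consequence": "missense", "maf": 0.003},
        {"family": "F1", "gene": "GENE_B", "consequence": "missense", "maf": 0.001},
        {"family": "F2", "gene": "GENE_B", "consequence": "missense", "maf": 0.001},
        {"family": "F3", "gene": "GENE_B", "consequence": "missense", "maf": 0.2},
    ]
    print(genes_recurrent_across_families(toy))  # only GENE_A reaches three families
```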
Project description: Objective: Relapsing polychondritis (RP) is a systemic inflammatory disease of unknown etiology. The study objective was to examine the contribution of rare genetic variations in RP. Methods: We performed a case-control exome-wide rare variant association analysis including 66 unrelated European American RP cases and 2923 healthy controls. Gene-level collapsing analysis was performed using Firth's logistic regression. Pathway analysis was performed on an exploratory basis with three different methods: Gene Set Enrichment Analysis (GSEA), the sequence kernel association test (SKAT) and the higher criticism test. Plasma DCBLD2 levels were measured in patients with RP and healthy controls using enzyme-linked immunosorbent assay (ELISA). Results: In the collapsing analysis, RP was associated with a higher burden of ultra-rare damaging variants in the DCBLD2 gene (7.6% vs 0.1%, unadjusted odds ratio = 79.8, p = 2.93 × 10⁻⁷). Patients with RP and ultra-rare damaging variants in DCBLD2 had a higher prevalence of cardiovascular manifestations. Plasma DCBLD2 protein levels were significantly higher in patients with RP than in healthy controls (5.9 vs 2.3, p < 0.001). Pathway analysis showed statistically significant enrichment of genes in the tumor necrosis factor (TNF) signaling pathway, driven by rare damaging variants in RELB, RELA and REL, using the higher criticism test weighted by degree and eigenvector centrality. Conclusions: This study identified specific rare variants in DCBLD2 as putative genetic risk factors for RP. Genetic variation within the TNF pathway is also potentially associated with development of RP. These findings should be validated in additional patients with RP and supported by future functional experiments.
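A gene-level collapsing analysis reduces each person to a 0/1 indicator (carries at least one qualifying variant in the gene, or not) and regresses case status on it. Firth's penalty keeps the estimate finite when carriers are very rare in one group. The sketch below is a simplified, covariate-free illustration, not the study's model: it fits an intercept plus the carrier indicator with the standard Firth modified-score iteration. The carrier counts (5 of 66 cases, 3 of 2923 controls) are an assumption back-calculated to match the reported 7.6%, 0.1% and unadjusted OR of 79.8; the abstract does not state them.

```python
import math


def firth_logistic_2x2(y, x, max_iter=100, tol=1e-10, max_step=5.0):
    """Firth-penalized logistic regression of y (0/1) on an intercept and one 0/1 covariate x.

    Returns (b0, b1); exp(b1) is the penalized odds ratio for carriers.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Fisher information I = sum_i w_i * [1, x_i]^T [1, x_i]
        i00 = i01 = i11 = 0.0
        fitted = []
        for xi in x:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
            fitted.append((p, w))
        det = i00 * i11 - i01 * i01
        inv00, inv01, inv11 = i11 / det, -i01 / det, i00 / det
        # Firth-modified score: sum_i (y_i - p_i + h_i * (0.5 - p_i)) * [1, x_i]
        u0 = u1 = 0.0
        for yi, xi, (p, w) in zip(y, x, fitted):
            h = w * (inv00 + 2 * inv01 * xi + inv11 * xi * xi)  # hat-matrix diagonal
            r = yi - p + h * (0.5 - p)
            u0 += r
            u1 += r * xi
        d0 = inv00 * u0 + inv01 * u1
        d1 = inv01 * u0 + inv11 * u1
        scale = min(1.0, max_step / max(abs(d0), abs(d1), 1e-300))  # cap very large steps
        b0 += scale * d0
        b1 += scale * d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    # Assumed counts: carriers are 5/66 cases and 3/2923 controls.
    y = [1] * 5 + [0] * 3 + [1] * 61 + [0] * 2920
    x = [1] * 8 + [0] * 2981
    print("raw OR:", (5 * 2920) / (61 * 3))  # about 79.8
    b0, b1 = firth_logistic_2x2(y, x)
    print("Firth OR:", math.exp(b1))
```

With a single binary covariate, the Firth estimate equals the odds ratio after adding 0.5 to every cell of the 2×2 table, which makes the sketch easy to check. Covariates such as ancestry principal components would require a general matrix implementation.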
Project description: Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.
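A burden test collapses a gene's rare variants into one per-person score and tests that score against the trait. A minimal sketch for a quantitative trait such as LDL-C, assuming the score is the sum of minor-allele dosages at variants with MAF below 1% and the test is an ordinary least-squares slope with its t statistic; the threshold, function names and toy data are hypothetical. The sketch ignores covariates and the extreme-phenotype sampling design used in the study, both of which a real analysis must account for.

```python
import math


def burden_scores(genotypes, max_maf=0.01):
    """Per-person count of minor alleles at rare variants (variants x people dosage lists)."""
    n = len(genotypes[0])
    scores = [0] * n
    for g in genotypes:
        alt = sum(g) / (2 * n)
        if alt >= 0.5:  # recode so the counted allele is the minor one
            g = [2 - gi for gi in g]
            alt = 1.0 - alt
        if alt < max_maf:
            for i, gi in enumerate(g):
                scores[i] += gi
    return scores


def burden_regression(scores, trait):
    """OLS of trait on burden score; returns (slope, t statistic for the slope)."""
    n = len(scores)
    mx, my = sum(scores) / n, sum(trait) / n
    sxx = sum((s - mx) ** 2 for s in scores)
    sxy = sum((s - mx) * (t - my) for s, t in zip(scores, trait))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((t - intercept - slope * s) ** 2 for s, t in zip(scores, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


if __name__ == "__main__":
    n = 200  # hypothetical cohort
    rare1 = [1, 1] + [0] * (n - 2)          # MAF 0.5%: counted
    rare2 = [0, 0, 1] + [0] * (n - 3)       # MAF 0.25%: counted
    common = [1] * 100 + [0] * (n - 100)    # MAF 25%: excluded
    scores = burden_scores([rare1, rare2, common])
    trait = [100 + 30 * s + (-1) ** i for i, s in enumerate(scores)]  # toy LDL-C values
    print(burden_regression(scores, trait))
```

Summing variants into one score is why burden effect sizes can far exceed single-SNP GWAS estimates: each carrier of any qualifying variant contributes to one combined signal.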
Project description: Our understanding of the impact of rare protein-truncating variants across multiple phenotypes is limited. We explore the impact of this class of variants on 13 quantitative traits and 10 diseases using whole-exome sequencing data from 100,296 individuals. Protein-truncating variants in genes intolerant to this class of mutations increased risk of autism, schizophrenia, bipolar disorder, intellectual disability, and ADHD. In individuals without these disorders, there was an association with shorter height, lower education, increased hospitalization, and reduced age at enrollment. Gene sets implicated by GWASs did not show a significant burden of protein-truncating variants beyond what was captured by established Mendelian genes. In conclusion, we provide a thorough investigation of the impact of rare deleterious coding variants on complex traits, suggesting widespread pleiotropic risk.