Project description:Nearly two hundred common-variant depression risk loci have been identified by genome-wide association studies (GWAS). However, the impact of rare coding variants on depression remains poorly understood. Here, we present whole-exome sequencing analyses of depression with seven different definitions based on survey, questionnaire, and electronic health records in 320,356 UK Biobank participants. We showed that the burden of rare damaging coding variants in loss-of-function intolerant genes is significantly associated with risk of depression with various definitions. We compared the rare and common genetic architecture across depression definitions by genetic correlation and showed different genetic relationships between definitions across common and rare variants. In addition, we demonstrated that the effects of rare damaging coding variant burden and polygenic risk score on depression risk are additive. The gene set burden analyses revealed overlapping rare genetic variant components with developmental disorder, autism, and schizophrenia. Our study provides insights into the contribution of rare coding variants, separately and in conjunction with common variants, on depression with various definitions and their genetic relationships with neurodevelopmental disorders.
Project description:Phasing involves distinguishing the two parentally inherited copies of each chromosome into haplotypes. Here, we introduce SHAPEIT5, a new phasing method that quickly and accurately processes large sequencing datasets and applied it to UK Biobank (UKB) whole-genome and whole-exome sequencing data. We demonstrate that SHAPEIT5 phases rare variants with low switch error rates of below 5% for variants present in just 1 sample out of 100,000. Furthermore, we outline a method for phasing singletons, which, although less precise, constitutes an important step towards future developments. We then demonstrate that the use of UKB as a reference panel improves the accuracy of genotype imputation, which is even more pronounced when phased with SHAPEIT5 compared with other methods. Finally, we screen the UKB data for loss-of-function compound heterozygous events and identify 549 genes where both gene copies are knocked out. These genes complement current knowledge of gene essentiality in the human genome.
Project description:Telomeres protect chromosome ends from damage and their length is linked with human disease and aging. We developed a joint telomere length metric, combining quantitative PCR and whole-genome sequencing measurements from 462,666 UK Biobank participants. This metric increased SNP heritability, suggesting that it better captures genetic regulation of telomere length. Exome-wide rare-variant and gene-level collapsing association studies identified 64 variants and 30 genes significantly associated with telomere length, including allelic series in ACD and RTEL1. Notably, 16% of these genes are known drivers of clonal hematopoiesis-an age-related somatic mosaicism associated with myeloid cancers and several nonmalignant diseases. Somatic variant analyses revealed gene-specific associations with telomere length, including lengthened telomeres in individuals with large SRSF2-mutant clones, compared with shortened telomeres in individuals with clonal expansions driven by other genes. Collectively, our findings demonstrate the impact of rare variants on telomere length, with larger effects observed among genes also associated with clonal hematopoiesis.
Project description:BackgroundCystic fibrosis (CF) is an autosomal recessive disease caused by genetic variants of the cystic fibrosis transmembrane conductance regulator (CFTR) gene. It is a common hereditary disease in Caucasians while rare in the Chinese. Until now, only 87 Chinese patients have been reported with molecular confirmations. The variant spectrum and clinical features of Chinese CF patients are obviously different from those of Caucasians.Materials and methodsWhole-exome sequencing was applied to analyze the exome of three individuals who have only the typical CF phenotype in the respiratory system from two consanguineous families. The protein domain and structure analysis were applied to predict the impact of the variants. Sanger sequencing was applied to validate the candidate variants.ResultsA previously reported homozygous variant in CFTR (NM_000492.4: c.1000C > T, p.R334W) was identified in proband I. A novel homozygous variant in a polymorphic position (NM_000492.4: c.1409T > A, p.V470E) was identified in two individuals in the family II. The novel CFTR variant predicted to be disease-causing is the first, to the best of our knowledge, to be reported in CFTR. However, in vitro validation is still needed.ConclusionOur finding expands the variant spectrum of CFTR, reveals clearer clinical phenotype distinction and variant spectrum distinction between Chinese and Caucasian CF patients, and contributes to a more rapid genetic diagnosis and future genetic counseling.
Project description:In biobank data analysis, most binary phenotypes have unbalanced case-control ratios, and this can cause inflation of type I error rates. Recently, a saddle point approximation (SPA) based single-variant test has been developed to provide an accurate and scalable method to test for associations of such phenotypes. For gene- or region-based multiple-variant tests, a few methods exist that can adjust for unbalanced case-control ratios; however, these methods are either less accurate when case-control ratios are extremely unbalanced or not scalable for large data analyses. To address these problems, we propose SKAT- and SKAT-O- type region-based tests; in these tests, the single-variant score statistic is calibrated based on SPA and efficient resampling (ER). Through simulation studies, we show that the proposed method provides well-calibrated p values. In contrast, when the case-control ratio is 1:99, the unadjusted approach has greatly inflated type I error rates (90 times that of exome-wide sequencing α = 2.5 × 10-6). Additionally, the proposed method has similar computation time to the unadjusted approaches and is scalable for large sample data. In our application, the UK Biobank whole-exome sequence data analysis of 45,596 unrelated European samples and 791 PheCode phenotypes identified 10 rare-variant associations with p value < 10-7, including the associations between JAK2 and myeloproliferative disease, HOXB13 and cancer of prostate, and F11 and congenital coagulation defects. All analysis summary results are publicly available through a web-based visual server, and this availability can help facilitate the identification of the genetic basis of complex diseases.
Project description:Exome association studies to date have generally been underpowered to systematically evaluate the phenotypic impact of very rare coding variants. We leveraged extensive haplotype sharing between 49,960 exome-sequenced UK Biobank participants and the remainder of the cohort (total n ≈ 500,000) to impute exome-wide variants with accuracy R2 > 0.5 down to minor allele frequency (MAF) ~0.00005. Association and fine-mapping analyses of 54 quantitative traits identified 1,189 significant associations (P < 5 × 10-8) involving 675 distinct rare protein-altering variants (MAF < 0.01) that passed stringent filters for likely causality. Across all traits, 49% of associations (578/1,189) occurred in genes with two or more hits; follow-up analyses of these genes identified allelic series containing up to 45 distinct 'likely-causal' variants. Our results demonstrate the utility of within-cohort imputation in population-scale genome-wide association studies, provide a catalog of likely-causal, large-effect coding variant associations and foreshadow the insights that will be revealed as genetic biobank studies continue to grow.
Project description:Pulmonary function is an indicator of well-being, and pulmonary pathologies are the third major cause of death worldwide. We analysed the UK Biobank genome-wide association summary statistics of pulmonary function for Europeans and individuals of recent African descent to identify variants associated with the trait in the two ancestries. Here, we show 627 variants in Europeans and 3 in Africans associated with three pulmonary function parameters. In addition to the 110 variants in Europeans previously reported to be associated with phenotypes related to pulmonary function, we identify 279 novel loci, including an ISX intergenic variant rs369476290 on chromosome 22 in Africans. Remarkably, we find no shared variants among Africans and Europeans. Furthermore, enrichment analyses of variants separately for each ancestry background reveal significant enrichment for terms related to pulmonary phenotypes in Europeans but not Africans. Further analysis of studies of pulmonary phenotypes reveals that individuals of European background are disproportionally overrepresented in datasets compared to Africans, with the gap widening over the past five years. Our findings extend our understanding of the different variants that modify the pulmonary function in Africans and Europeans, a promising finding for future GWASs and medical studies.
Project description:Genome-wide association studies have discovered hundreds of associations between common genotypes and kidney function but cannot comprehensively investigate rare coding variants. Here, we apply a genotype imputation approach to whole exome sequencing data from the UK Biobank to increase sample size from 166,891 to 408,511. We detect 158 rare variants and 105 genes significantly associated with one or more of five kidney function traits, including genes not previously linked to kidney disease in humans. The imputation-powered findings derive support from clinical record-based kidney disease information, such as for a previously unreported splice allele in PKD2, and from functional studies of a previously unreported frameshift allele in CLDN10. This cost-efficient approach boosts statistical power to detect and characterize both known and novel disease susceptibility variants and genes, can be generalized to larger future studies, and generates a comprehensive resource ( https://ckdgen-ukbb.gm.eurac.edu/ ) to direct experimental and clinical studies of kidney disease.
Project description:Composite hemangioendothelioma (CHE) is considered a borderline malignant vascular tumor defined by an admixture of distinct vascular neoplastic components. A 21-year-old female is presented herein with a 1 cm painless mandibular vestibular mass of less than a year duration. The infiltrating tumor was characterized by dilated vascular channels lined by endothelial cells with bland ovoid or round nuclei exhibiting, occasionally, hobnail/matchstick-like arrangement. Intravascular cell proliferations with hyaline globular deposits were also present. Additionally, lobular spindle and epithelioid cell aggregates, as well as slit-like spaces exhibiting a retiform or angiosarcomatous morphology were observed. Intracytoplasmic signet-ring or lipoblast-like vacuolization was also noted. Mitotic activity was exceptionally rare. Vascular spaces and the stroma featured lymphocytes and plasma cells. Neoplastic cells were positive for CD31, CD34, D2-40 and ERG, negative for CAMTA1 and synaptophysin, while type IV collagen highlighted the plasmalemma of most vessels and hyaline globules. Fluorescence in situ hybridization revealed gene rearrangements in both YAP1 and MAML2 genes, in keeping with a YAP1-MAML2 fusion. Whole exome sequencing (WES) identified three missense mutations FLT1 [p.R1016G], PIK3CA [p.H1047L], and C11orf42 [p.A304P] and a mitochondrial frameshift insertion MT-ND4 [c.1107_1108insC; p.P370fs]. These WES results suggest that FLT1 and/or PIK3CA variants may contribute to tumor growth/transformation while the MT-ND4 variant may relate to proliferation, angiogenesis and/or inhibition of apoptosis.
Project description:A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing1 to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study2. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10-11. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.