Evaluation of CADD Scores in Curated Mismatch Repair Gene Variants Yields a Model for Clinical Validation and Prioritization.
ABSTRACT: Next-generation sequencing in clinical diagnostics is providing valuable genomic variant data, which can be used to support healthcare decisions. In silico tools to predict pathogenicity are crucial to assess such variants and we have evaluated a new tool, Combined Annotation Dependent Depletion (CADD), and its classification of gene variants in Lynch syndrome by using a set of 2,210 DNA mismatch repair gene variants. These had already been classified by experts from InSiGHT's Variant Interpretation Committee. Overall, we found CADD scores do predict pathogenicity (Spearman's ρ = 0.595, P < 0.001). However, we discovered 31 major discrepancies between the InSiGHT classification and the CADD scores; these were explained in favor of the expert classification using population allele frequencies, cosegregation analyses, disease association studies, or a second-tier test. Of 751 variants that could not be clinically classified by InSiGHT, CADD indicated that 47 variants were worth further study to confirm their putative pathogenicity. We demonstrate CADD is valuable in prioritizing variants in clinically relevant genes for further assessment by expert classification teams.
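The rank correlation reported above can be illustrated with a minimal sketch. It assumes the expert classification is encoded as an ordinal class from 1 (not pathogenic) to 5 (pathogenic); the toy values and the function names are hypothetical and are not data or code from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: ordinal expert class and CADD score per variant.
expert_class = [1, 1, 2, 3, 4, 5, 5]
cadd_score = [2.1, 8.4, 5.0, 15.2, 22.7, 34.0, 27.5]

rho = spearman_rho(expert_class, cadd_score)
print(f"Spearman's rho = {rho:.3f}")
```

A rank-based coefficient suits this comparison because the expert classes are ordinal and many variants share the same class, which the tie-averaged ranks handle directly.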
Project description:Combined Annotation-Dependent Depletion (CADD) is a widely used measure of variant deleteriousness that can effectively prioritize causal variants in genetic analyses, particularly highly penetrant contributors to severe Mendelian disorders. CADD is an integrative annotation built from more than 60 genomic features, and can score human single nucleotide variants and short insertions and deletions anywhere in the reference assembly. CADD uses a machine learning model trained on a binary distinction between simulated de novo variants and variants that have arisen and become fixed in human populations since the split between humans and chimpanzees; the former are free of selective pressure and may thus include both neutral and deleterious alleles, while the latter are overwhelmingly neutral (or, at most, weakly deleterious) by virtue of having survived millions of years of purifying selection. Here we review the latest updates to CADD, including the most recent version, 1.4, which supports the human genome build GRCh38. We also present updates to our website that include simplified variant lookup, extended documentation, an Application Program Interface and improved mechanisms for integrating CADD scores into other tools or applications. CADD scores, software and documentation are available at https://cadd.gs.washington.edu.
Project description:Several in silico tools have been shown to have reasonable research sensitivity and specificity for classifying sequence variants in coding regions. The recently developed combined annotation-dependent depletion (CADD) method generates predictive scores for single-nucleotide variants (SNVs) in all areas of the genome, including noncoding regions. We sought to determine the clinical validity of CADD scores for noncoding variants. We evaluated 12,391 unique SNVs in 624 patient samples submitted for germ-line mutation testing in a cancer-related gene panel. Stratifying by genomic region, we compared the distributions of CADD scores of rare SNVs, SNVs common in our patient population, and the null distribution of all possible SNVs. The median CADD scores of intronic and nonsynonymous variants were significantly different between rare and common SNVs (P < 0.0001). Despite these different distributions, no individual variants could be identified as plausibly causative among the rare intronic variants with the highest scores. The receiver-operating characteristic (ROC) area under the curve (AUC) for noncoding variants was modest, and the positive predictive value of CADD for intronic variants in panel testing was found to be 0.088. Focused in silico scoring systems with much higher predictive value will be necessary for clinical genomic applications. Genet Med 18 12, 1269-1275.
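The two metrics quoted above, positive predictive value and ROC AUC, can be sketched as follows. This is an illustration only: the labels, scores, and the score threshold of 20 are hypothetical assumptions, and the function names are not from the study.

```python
def positive_predictive_value(labels, scores, threshold):
    """PPV: fraction of variants scoring at or above the threshold that are truly causal."""
    called = [lab for lab, s in zip(labels, scores) if s >= threshold]
    if not called:
        return float("nan")  # undefined when nothing is called positive
    return sum(called) / len(called)


def roc_auc(labels, scores):
    """AUC as the probability that a random causal variant outscores a
    random non-causal one, with ties counted as one half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = causal variant, 0 = not causal.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [24.0, 22.5, 3.1, 12.0, 18.4, 0.7, 25.3, 9.9]

ppv = positive_predictive_value(labels, scores, threshold=20.0)  # assumed cutoff
auc = roc_auc(labels, scores)
print(f"PPV = {ppv:.3f}, AUC = {auc:.3f}")
```

When causal variants are rare among those tested, as in panel testing, PPV can stay low even if the score separates the two groups on average, which is consistent with the pattern the abstract describes.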
Project description:Background:Predicting the deleteriousness of observed genomic variants has taken a step forward with the introduction of the Combined Annotation Dependent Depletion (CADD) approach, which trains a classifier on the wealth of available human genomic information. This raises the question of whether it can be done with less data for non-human species. Here, we investigate the prerequisites to construct a CADD-based model for a non-human species. Results:Performance of the mouse model is competitive with that of the human CADD model and better than established methods like PhastCons conservation scores and SIFT. As in the human case, performance varies for different genomic regions and is best for coding regions. We also show the benefits of generating a species-specific model over lifting variants to a different species or applying a generic model. With fewer genomic annotations, performance on the test set as well as on the three validation sets is still good. Conclusions:It is feasible to construct species-specific CADD models even when annotations such as epigenetic markers are not available. The minimal requirement for these models is the availability of a set of genomes of closely related species that can be used to infer an ancestor genome and substitution rates for the data generation.
Project description:Background:Myoclonus-Dystonia (M-D) is a pleiotropic neuropsychiatric disorder of variable penetrance. Pathogenic variants in SGCE, a maternally imprinted gene, are the most frequent known genetic cause of M-D. The population prevalence of SGCE-linked M-D is unknown, the pathogenicity of SGCE variants identified in patients with M-D may be indeterminate, and SGCE variants predicted to be deleterious by in silico analysis may appear in patients undergoing whole-exome or whole-genome sequencing for seemingly unrelated disorders. The Genome Aggregation Database (gnomAD) v2 provides variant data on 125,748 exomes and 15,708 genomes from unrelated individuals sequenced as part of various disease-specific and population genetic studies. Methods:SGCE variants included in the gnomAD v2 dataset were analyzed with Combined Annotation Dependent Depletion (CADD), and database for nonsynonymous single nucleotide polymorphisms' functional predictions (dbNSFP). We determined the frequency of annotated SGCE variants, ranked by scores of deleteriousness, within the gnomAD v2 dataset. Deleteriousness scores were compared to a subset of published disease associated SGCE pathogenic variants. Results:Within gnomAD v2, there were 56, 408, and 1250 alleles harboring SGCE variants with CADD scores greater than 30, 25, and 20, respectively. We estimate that approximately 1 in 348 individuals in the United States population harbors an SGCE variant with a CADD score ≥ 25. Discussion:SGCE M-D may be underdiagnosed due to pleiotropy, mild phenotypes, variable penetrance, and impaired access to genetic testing. Due to the high population prevalence of deleterious SGCE variants, caution should be used when asserting pathogenicity without co-segregation analyses and expert neurological examination of phenotypes within pedigrees.
Highlights:In silico analyses of a large population database of genetic variants revealed that over 0.2% of individuals in the United States harbor a highly deleterious SGCE variant. This finding suggests that M-D and minor phenotypic variants such as mild isolated myoclonus may be underdiagnosed.
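The arithmetic behind a carrier-frequency estimate of this kind can be sketched from the counts given in the abstract. The abstract does not state the authors' exact method, so this sketch rests on two simplifying assumptions: every allele sits in a distinct heterozygous individual, and every site was covered in all gnomAD v2 samples. The function name is hypothetical.

```python
# Counts taken from the abstract above.
EXOMES = 125_748
GENOMES = 15_708
ALLELES_ABOVE_CADD_CUTOFF = {30: 56, 25: 408, 20: 1250}


def one_in_n(allele_count, individuals):
    """Number of individuals per carrier, assuming one allele per carrier
    (an assumption of this sketch, not a stated method of the study)."""
    return individuals / allele_count


individuals = EXOMES + GENOMES

for cutoff, alleles in ALLELES_ABOVE_CADD_CUTOFF.items():
    n = one_in_n(alleles, individuals)
    print(f"CADD > {cutoff}: about 1 in {n:.0f} individuals ({100 / n:.2f}%)")
```

Under these assumptions the cutoff of 25 gives roughly 1 in 347, close to the reported estimate of 1 in 348; differences in per-site coverage or population weighting could account for the small gap.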
Project description:A cadmium resistance gene, designated cadD, has been identified in and cloned from the Staphylococcus aureus plasmid pRW001. The gene is part of a two-component operon which contains the resistance gene cadD and an inactive regulatory gene, cadX*. A high degree of sequence similarity was observed between cadD and the cadB-like gene from S. lugdunensis, but no significant similarity was found with either cadA or cadB from the S. aureus plasmids pI258 and pII147. The positive regulatory gene cadX* is identical to cadX from pLUG10 over a stretch of 78 codons beginning at the N terminus, but it is truncated at this point and inactive. Sequence analysis showed that the cadmium resistance operon resides on a 3,972-bp element that is flanked by direct repeats of IS257. The expression of cadD in S. aureus and Bacillus subtilis resulted in low-level resistance to cadmium; in contrast, cadA and cadB from S. aureus induced higher level resistance. However, when the truncated version of cadX contained in pRW001 is complemented in trans with cadX from plasmid pLUG10, resistance increased approximately 10-fold suggesting that the cadmium resistance operons from pRW001 and pLUG10 are evolutionarily related. Moreover, the truncated version of cadX contained in pRW001 is nonfunctional and may have been generated by deletion during recombination to acquire the cadmium resistance element.
Project description:Current methods for annotating and interpreting human genetic variation tend to exploit a single information type (for example, conservation) and/or are restricted in scope (for example, to missense changes). Here we describe Combined Annotation-Dependent Depletion (CADD), a method for objectively integrating many diverse annotations into a single measure (C score) for each variant. We implement CADD as a support vector machine trained to differentiate 14.7 million high-frequency human-derived alleles from 14.7 million simulated variants. We precompute C scores for all 8.6 billion possible human single-nucleotide variants and enable scoring of short insertions-deletions. C scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects and complex trait associations, and they highly rank known pathogenic variants within individual genomes. The ability of CADD to prioritize functional, deleterious and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current single-annotation method.
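CADD thresholds such as 20 or 30 usually refer to the PHRED-like scaled C score, which expresses a variant's rank among all possible SNVs on a -10·log10 scale. The abstract above does not describe this scaling, so the following is a minimal sketch under stated assumptions: rank 1 is the most deleterious variant, the total of 8.6 billion SNVs is taken from the abstract, and the function names are hypothetical.

```python
import math

TOTAL_SNVS = 8_600_000_000  # all possible human SNVs, per the abstract


def scaled_c_score(rank, total=TOTAL_SNVS):
    """PHRED-like scaled score from a deleteriousness rank (1 = most deleterious)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and total")
    return -10.0 * math.log10(rank / total)


def top_fraction(scaled_score):
    """Fraction of all possible SNVs ranked at or above a given scaled score."""
    return 10 ** (-scaled_score / 10)


for score in (10, 20, 30):
    print(f"scaled score {score}: top {100 * top_fraction(score):g}% of SNVs")
```

On this scale, a scaled score of 20 corresponds to the top 1% of possible SNVs and a score of 30 to the top 0.1%, which helps when reading cutoffs reported in studies that use CADD.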
Project description:Background:A foundational library called MORT (Molecular Objects and Relevant Templates) for the development of new software packages and tools employed in computational biology and computer-aided drug design (CADD) is described here. Results:MORT has several advantages compared with other libraries. Firstly, MORT, written in C++, natively supports the paradigm of object-oriented design, and thus it can be understood and extended easily. Secondly, MORT employs the relational model to represent a molecule, which is more convenient and flexible than the traditional hierarchical model employed by many other libraries. Thirdly, many functions have been included in this library, and a molecule can be manipulated easily at different levels. For example, it can parse a variety of popular molecular formats (MOL/SDF, MOL2, PDB/ENT, SMILES/SMARTS, etc.), create the topology and coordinate files for the simulations supported by AMBER, calculate the energy of a specific molecule based on the AMBER force fields, etc. Conclusions:We believe that MORT can be used as a foundational library for programmers to develop new programs and applications for computational biology and CADD. Source code of MORT is available at http://cadd.suda.edu.cn/MORT/index.htm.
Project description:Recommendations for laboratories to report incidental findings from genomic tests have stimulated interest in such results. In order to investigate the criteria and processes for assigning the pathogenicity of specific variants and to estimate the frequency of such incidental findings in patients of European and African ancestry, we classified potentially actionable pathogenic single-nucleotide variants (SNVs) in all 4300 European- and 2203 African-ancestry participants sequenced by the NHLBI Exome Sequencing Project (ESP). We considered 112 gene-disease pairs selected by an expert panel as associated with medically actionable genetic disorders that may be undiagnosed in adults. The resulting classifications were compared to classifications from other clinical and research genetic testing laboratories, as well as with in silico pathogenicity scores. Among European-ancestry participants, 30 of 4300 (0.7%) had a pathogenic SNV and six (0.1%) had a disruptive variant that was expected to be pathogenic, whereas 52 (1.2%) had likely pathogenic SNVs. For African-ancestry participants, six of 2203 (0.3%) had a pathogenic SNV and six (0.3%) had an expected pathogenic disruptive variant, whereas 13 (0.6%) had likely pathogenic SNVs. Genomic Evolutionary Rate Profiling mammalian conservation score and the Combined Annotation Dependent Depletion summary score of conservation, substitution, regulation, and other evidence were compared across pathogenicity assignments and appear to have utility in variant classification. This work provides a refined estimate of the burden of adult onset, medically actionable incidental findings expected from exome sequencing, highlights challenges in variant classification, and demonstrates the need for a better curated variant interpretation knowledge base.
Project description:Missense DNA variants have variable effects upon protein function. Consequently, interpreting their pathogenicity is challenging, especially when they are associated with disease variability. To determine the degree to which functional assays inform interpretation, we analyzed 48 CFTR missense variants associated with variable expressivity of cystic fibrosis (CF). We assessed function in a native isogenic context by evaluating CFTR mutants that were stably expressed in the genome of a human airway cell line devoid of endogenous CFTR expression. Twenty-one of 29 variants associated with full expressivity of the CF phenotype generated <10% wild-type CFTR (WT-CFTR) function, a conservative threshold for the development of life-limiting CF lung disease, and five variants had moderately decreased function (10% to 25% WT-CFTR). The remaining three variants in this group unexpectedly had >25% WT-CFTR function; two were higher than 75% WT-CFTR. As expected, 14 of 19 variants associated with partial expressivity of CF had >25% WT-CFTR function; however, four had minimal to no effect on CFTR function (>75% WT-CFTR). Thus, 6 of 48 (13%) missense variants believed to be disease causing did not alter CFTR function. Functional studies substantially refined pathogenicity assignment with expert annotation and criteria from the American College of Medical Genetics and Genomics and Association for Molecular Pathology. However, four algorithms (CADD, REVEL, SIFT, and PolyPhen-2) could not differentiate between variants that caused severe, moderate, or minimal reduction in function. In the setting of variable expressivity, these results indicate that functional assays are essential for accurate interpretation of missense variants and that current prediction tools should be used with caution.
Project description:BACKGROUND:A recent study identified DCHS1 as a causal gene for mitral valve prolapse. The goal of this study is to investigate the presence and frequency of known and novel variants in this gene in 100 asymptomatic patients with moderate to severe organic mitral regurgitation. METHODS:DNA sequencing assays were developed for two previously identified functional missense variants, namely p.R2330C and p.R2513H, and all 21 exons of DCHS1. Pathogenicity of variants was evaluated in silico. RESULTS:p.R2330C and p.R2513H were not identified in this cohort. Sequencing all coding regions revealed eight missense variants including six considered deleterious. This includes one novel variant (p.A2464P) and two rare variants (p.R2770Q and p.R2462Q). These variants are predicted to be deleterious with combined annotation-dependent depletion (CADD) scores greater than 25, which are in the same range as p.R2330C (CADD = 28.0) and p.R2513H (CADD = 24.3). More globally, 24 of 100 cases were carriers of at least one in silico-predicted deleterious missense variant in DCHS1, suggesting that this single gene may account for a substantial portion of cases. CONCLUSION:This study reveals an important contribution of germline variants in DCHS1 in unrelated patients with mitral valve prolapse and supports genetic testing of this gene to screen individuals at risk.