Rare variant phasing using paired tumor:normal sequence data.
ABSTRACT: BACKGROUND:In standard high throughput sequencing analysis, genetic variants are not assigned to a homologous chromosome of origin. This process, called haplotype phasing, can reveal information important for understanding the relationship between genetic variants and biological phenotypes. For example, in genes that carry multiple heterozygous missense variants, phasing resolves whether one or both gene copies are altered. Here, we present a novel approach to phasing variants that takes advantage of unique properties of paired tumor:normal sequencing data from cancer studies. RESULTS:VAF phasing uses changes in variant allele frequency (VAF) between tumor and normal samples in regions of somatic chromosomal gain or loss to phase germline variants. We apply VAF phasing to 6180 samples from the Cancer Genome Atlas (TCGA) and demonstrate that our method is highly concordant with other standard phasing methods, and can phase an average of 33% more variants than other read-backed phasing methods. Using variant annotation tools designed to score gene haplotypes, we find a suggestive association between carrying multiple missense variants in a single copy of a cancer predisposition gene and earlier age of cancer diagnosis. CONCLUSIONS:VAF phasing exploits unique properties of tumor genomes to increase the number of germline variants that can be phased over standard read-backed methods in paired tumor:normal samples. Our phase-informed association testing results call attention to the need to develop more tools for assessing the joint effect of multiple genetic variants.
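The idea behind VAF phasing can be illustrated with a minimal sketch. In a region with a somatic gain or loss of one homolog, germline heterozygous variants on the over-represented homolog shift upward in VAF from normal to tumor, and variants on the other homolog shift downward. The code below is an illustration under stated assumptions, not the authors' implementation: the function names, the input layout, and the fixed `min_shift` threshold are all hypothetical, and a real method would more plausibly use a statistical test on read counts.

```python
from typing import Dict, Iterable, Optional, Tuple

# Each record: (variant id, VAF in normal sample, VAF in tumor sample).
# All variants are assumed to lie in ONE segment of somatic gain or loss.
Record = Tuple[str, float, float]


def vaf_phase(records: Iterable[Record],
              min_shift: float = 0.1) -> Dict[str, Optional[str]]:
    """Label each germline heterozygous variant by the direction of its VAF shift.

    "gained" = VAF rises in tumor (variant on the over-represented homolog)
    "other"  = VAF falls in tumor (variant on the under-represented homolog)
    None     = shift smaller than min_shift, left unphased

    min_shift is a hypothetical cutoff chosen only for illustration.
    """
    phased: Dict[str, Optional[str]] = {}
    for variant_id, normal_vaf, tumor_vaf in records:
        shift = tumor_vaf - normal_vaf
        if shift >= min_shift:
            phased[variant_id] = "gained"
        elif shift <= -min_shift:
            phased[variant_id] = "other"
        else:
            phased[variant_id] = None
    return phased


def same_haplotype(phased: Dict[str, Optional[str]],
                   a: str, b: str) -> Optional[bool]:
    """True if a and b are in cis, False if in trans, None if either is unphased."""
    if phased.get(a) is None or phased.get(b) is None:
        return None
    return phased[a] == phased[b]
```

Two missense variants in the same gene that both receive the same label would be interpreted as altering a single gene copy; opposite labels would indicate that both copies are altered.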
Project description:Linked-read sequencing greatly improves haplotype assembly over standard paired-end analysis. The detection of mosaic single-nucleotide variants benefits from haplotype assembly when the model is informed by the mapping between constituent reads and linked reads. Samovar evaluates haplotype-discordant reads identified through linked-read sequencing, thus enabling phasing and mosaic variant detection across the entire genome. Samovar trains a random forest model to score candidate sites using a dataset that considers read quality, phasing, and linked-read characteristics. Samovar calls mosaic single-nucleotide variants (SNVs) within a single sample with accuracy comparable to what previously required trios or matched tumor/normal pairs and outperforms single-sample mosaic variant callers at minor allele frequency 5%-50% with at least 30X coverage. Samovar finds somatic variants in both tumor and normal whole-genome sequencing from 13 pediatric cancer cases that can be corroborated with high recall with whole exome sequencing. Samovar is available open-source at https://github.com/cdarby/samovar under the MIT license.
Project description:Accurate detection of genomic alterations using high-throughput sequencing is an essential component of precision cancer medicine. We characterize the variant allele fractions (VAFs) of somatic single nucleotide variants and indels across 5095 clinical samples profiled using a custom panel, CancerSCAN. Our results demonstrate that a significant fraction of clinically actionable variants have low VAFs, often due to low tumor purity and treatment-induced mutations. The percentages of mutations under 5% VAF across hotspots in EGFR, KRAS, PIK3CA, and BRAF are 16%, 11%, 12%, and 10%, respectively, with 24% for EGFR T790M and 17% for PIK3CA E545. For clinical relevance, we describe two patients for whom targeted therapy achieved remission despite low VAF mutations. We also characterize the read depths necessary to achieve sensitivity and specificity comparable to current laboratory assays. These results show that capturing low VAF mutations at hotspots by sufficient sequencing coverage and carefully tuned algorithms is imperative for a clinical assay.
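The link between read depth and sensitivity for low-VAF mutations can be made concrete with a simple binomial model. The sketch below assumes that each read independently carries the variant allele with probability equal to the VAF and that a caller reports the variant once it sees at least `min_alt_reads` supporting reads. Both assumptions, and all function and parameter names, are illustrative and are not taken from the CancerSCAN study; sequencing error and specificity are ignored.

```python
from math import comb
from typing import Optional


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) when alt reads ~ Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, k) * vaf ** k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


def min_depth_for_sensitivity(vaf: float, min_alt_reads: int, target: float,
                              max_depth: int = 5000) -> Optional[int]:
    """Smallest depth whose detection probability reaches target, or None if not found."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= target:
            return depth
    return None
```

For example, `min_depth_for_sensitivity(0.05, 5, 0.95)` gives the depth at which a 5% VAF variant would be seen in five or more reads 95% of the time under this idealized model.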
Project description:BACKGROUND:The progression of colorectal cancer (CRC) mainly stems from the occurrence of somatic mutations. However, there is little information that can be used to comprehensively analyse the importance of germline variants in CRC patients. PATIENTS AND METHODS:Candidate germline variants differing between relapsed and cured rectal adenocarcinoma (READ) were first filtered by whole-exome sequencing (n=4), then validated by targeted sequencing and tested for association with clinical outcome in READ (n=48). RESULTS:We identified 9 pathogenic germline variants that were clinically associated with survival outcome in READ, including TIPIN, TLR1, TLR10, OR4D6, IGSF3, UBBP4, OR6J1, FAM208A and DISC1. Patients carrying these germline susceptibility variants had an increased risk of poor survival outcome compared to those without these variants. CONCLUSION:Not only the tumor genome, but also the germline sequence must be analysed to depict the overall genetic profile, providing potential therapeutic strategies for personalized medicine.
Project description:OBJECTIVE:Simultaneous detection of germline and somatic mutations in ovarian cancer (OC) using tumor materials is considered to be cost-effective for BRCA1/2 testing. However, there are limited studies of analytical performance across sample types. The aim of this study is to propose a strategy for routine BRCA1/2 next-generation sequencing (NGS) screening based on analytical performance according to sample type. METHODS:We compared a BRCA1/2 NGS screening assay using buffy coat, fresh-frozen (FF) and formalin-fixed paraffin-embedded (FFPE) material from 130 samples. RESULTS:The rate of repeated tests in buffy coat, FF and FFPE was 0%, 8%, and 34%, respectively. The accuracy of BRCA1/2 NGS testing was 100.0%, 99.9% and 99.9% in buffy coat, FFPE and FF, respectively. However, due to the presence of heterozygous variants with shifted variant allele frequency (VAF), tumor materials (FFPE and FF) showed lower sensitivity (95.5%-99.0%) than buffy coat (100%). Furthermore, FFPE showed a positive predictive value (PPV) of 51.4% on account of sequence artifacts. When a post-filtration process was applied, PPV increased by approximately 20% in FFPE. Buffy coat showed 100% sensitivity, specificity and accuracy in the BRCA1/2 NGS test. CONCLUSIONS:In the comparison of analytical performance across sample types, the buffy coat was not affected by sequencing artifacts or VAF-shifted variants. Therefore, the blood test should be given priority in detecting germline BRCA1/2 mutation, and tumor materials could be suitable to detect somatic mutations in OC patients without identifying germline BRCA1/2 mutation.
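The combination of very high accuracy with a low PPV reported above follows directly from the standard definitions when true negatives vastly outnumber true positives. The sketch below computes the four metrics from confusion-matrix counts; the function name is hypothetical, and the counts in the usage note are invented purely for illustration and are not data from the study.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard per-site performance metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),           # true positives among real variants
        "specificity": tn / (tn + fp),           # true negatives among non-variant sites
        "ppv": tp / (tp + fp),                   # real variants among reported calls
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

With made-up counts of 50 true positives, 50 artifact calls, 99,900 true negatives and no false negatives, `diagnostic_metrics(50, 50, 99900, 0)` yields an accuracy of 0.9995 but a PPV of only 0.5, which is why artifact-prone material can look accurate overall while half of its calls are wrong.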
Project description:Whole-genome sequencing of DNA from single cells has the potential to reshape our understanding of mutational heterogeneity in normal and diseased tissues. However, a major difficulty is distinguishing amplification artifacts from biologically derived somatic mutations. Here, we describe linked-read analysis (LiRA), a method that accurately identifies somatic single-nucleotide variants (sSNVs) by using read-level phasing with nearby germline heterozygous polymorphisms, thereby enabling the characterization of mutational signatures and estimation of somatic mutation rates in single cells.
Project description:Detection of mosaic mutations that arise in normal development is challenging, as such mutations are typically present in only a minute fraction of cells and there is no clear matched control for removing germline variants and systematic artifacts. We present MosaicForecast, a machine-learning method that leverages read-based phasing and read-level features to accurately detect mosaic single-nucleotide variants and indels, achieving a multifold increase in specificity compared with existing algorithms. Using single-cell sequencing and targeted sequencing, we validated 80-90% of the mosaic single-nucleotide variants and 60-80% of indels detected in human brain whole-genome sequencing data. Our method should help elucidate the contribution of mosaic somatic mutations to the origin and development of disease.
Project description:PURPOSE:Structural variants (SVs) may be an underestimated cause of hereditary cancer syndromes given the current limitations of short-read next-generation sequencing. Here we investigated the utility of long-read sequencing in resolving germline SVs in cancer susceptibility genes detected through short-read genome sequencing. METHODS:Known or suspected deleterious germline SVs were identified using Illumina genome sequencing across a cohort of 669 advanced cancer patients with paired tumor genome and transcriptome sequencing. Candidate SVs were subsequently assessed by Oxford Nanopore long-read sequencing. RESULTS:Nanopore sequencing confirmed eight simple pathogenic or likely pathogenic SVs, resolving three additional variants whose impact could not be fully elucidated through short-read sequencing. A recurrent sequencing artifact on chromosome 16p13 and one complex rearrangement on chromosome 5q35 were subsequently classified as likely benign, obviating the need for further clinical assessment. Variant configuration was further resolved in one case with a complex pathogenic rearrangement affecting TSC2. CONCLUSION:Our findings demonstrate that long-read sequencing can improve the validation, resolution, and classification of germline SVs. This has important implications for return of results, cascade carrier testing, cancer screening, and prophylactic interventions.
Project description:Background:Germline mutations in DNA damage signalling and repair genes predispose individuals to cancer. Rare germline variants may also increase cancer risk and be predictive of outcomes following cancer treatments, but require high-throughput sequencing (HTS) for detection in large cohorts. Objective:To use a dual indexing system on a HTS platform to detect novel variants in CtIP (RBBP8) which may be associated with clinical outcomes following radiotherapy treatment for bladder cancer. Methods:All exons and flanking introns of CtIP were amplified from germline DNA from bladder cancer patients using seven primer pairs by automated long-range PCR. Amplicons were pooled, fragmented and ligated to adaptor sequences. One of 96 tag sequences was introduced at each end by PCR. Sequencing was performed on a single flow cell of an Illumina MiSeq. Reads were mapped by Stampy and variants called by Platypus. For phasing experiments, target regions were amplified and cloned for Sanger sequencing. Results:Of 201 samples, 160 were successfully amplified. Eleven CtIP variants were called, within the exons and 15 bp adjacent intronic DNA, including eight known variants from the 1000 Genomes project, plus three previously unreported variants now confirmed by Sanger sequencing. In two individuals, phasing experiments showed two variants of interest to be on separate alleles, likely to result in stronger impairment of gene function. Conclusions:We have demonstrated proof of principle for dual indexing on 160 samples on one MiSeq flow cell sequencing surface, and show that for the CtIP gene multiplexing of up to 720 samples would provide sufficient coverage to achieve >98% detection power for rare germline variation, reducing HTS costs substantially.
Project description:Metaplastic breast carcinoma (MBC) is rare and has a poor prognosis. Here we describe genetic analysis of a 41-yr-old female patient with MBC and neurofibromatosis type I (NF1). She initially presented with pT3N1a, grade 3 MBC, but lung metastases were discovered subsequently. To identify the molecular cause of her NF1, we screened for germline mutations disrupting NF1 or SPRED1, revealing a heterozygous germline single-nucleotide variant (SNV) in exon 21 of NF1 at c.2709G>A, Chr 17: 29556342. By report, this variant disrupts pre-mRNA splicing of NF1 transcripts. No pathogenic mutations were identified in SPRED1. A potential association between MBC and NF1 was reported in eight previous cases, but none underwent detailed genomics analysis. To identify additional candidate germline variants potentially predisposing to MBC, we conducted targeted exome sequencing of 279 established cancer-causing genes in a control blood sample, disclosing four rare SNVs. Analysis of her breast tumor showed markedly altered variant allelic fractions (VAFs) for two (50%) of them, revealing somatic loss of heterozygosity (LOH) at germline SNVs. Of these, only the VAF of the pathogenic SNV in NF1 was increased in the tumor. Tumor sequencing demonstrated five somatic mutations altering TP53, BRCA1, and other genes potentially contributing to cancer formation. Because somatic LOH at certain germline SNVs can enhance their impacts, we conclude that increased allelic imbalance of the pathogenic SNV in NF1 likely contributed to tumorigenesis. Our results highlight a need to assess predisposing genetic factors and LOH that can cause rare, aggressive diseases such as MBC in NF1.
Project description:Somatic mutations promote the transformation of normal cells to cancer. Accurate identification of such mutations facilitates cancer diagnosis and treatment, but biological and technological noise, including intra-tumor heterogeneity, sample contamination, and uncertainties in base sequencing and read alignment, poses a major challenge to somatic mutation discovery. A number of callers have been developed to predict somatic mutations from paired tumor/normal or unpaired tumor sequencing data. However, the small number of currently available experimentally validated somatic sites limits the evaluation and subsequent improvement of callers. Fortunately, the NIST reference material NA12878 genome has been well characterized, with publicly available high-confidence genotype calls, and biological and technological noise can be computationally represented by the number of sub-clones, the VAFs, and the sequencing and mapping qualities. We used BAMSurgeon to create simulated tumors by introducing somatic small variants (SNVs and small indels) into homozygous reference or wildtype sites of NA12878. We generated 135 simulated tumors from 5 pre-tumors/normals. These simulated tumors vary in sequencing and subsequent mapping error profiles, read length, the number of sub-clones, the VAF, the mutation frequency across the genome and the genomic context. Furthermore, these pure tumor/normal pairs can be mixed at desired ratios within each pair to simulate sample contamination. This database (a total size of 15 terabytes) will be of great use to benchmark somatic small variant callers and guide their improvement.
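The effect of mixing a pure tumor/normal pair on the observed VAF can be sketched with simple arithmetic. The example assumes that reads are pooled so that a fraction `tumor_fraction` of the reads at a site come from the simulated tumor and the remainder from the normal; the function and parameter names are hypothetical and do not describe the tooling used to build the database.

```python
def mixed_vaf(tumor_vaf: float, tumor_fraction: float,
              normal_vaf: float = 0.0) -> float:
    """Expected VAF at a site after mixing tumor and normal reads.

    tumor_fraction is the share of reads drawn from the tumor sample (0 to 1).
    normal_vaf defaults to 0.0, matching a somatic variant spiked into a
    homozygous-reference site that is absent from the normal.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    return tumor_fraction * tumor_vaf + (1.0 - tumor_fraction) * normal_vaf
```

Under this model a clonal heterozygous somatic variant (VAF 0.5 in the pure tumor) mixed at a tumor fraction of 0.4 has an expected VAF of 0.2, while a germline heterozygous site (VAF 0.5 in both samples) is unaffected by the mixing ratio.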