Genomic Characterization and Comparison of Multi-Regional and Pooled Tumor Biopsy Specimens.
ABSTRACT: A single tumor biopsy specimen is typically used in cancer genome studies. However, a single specimen may incompletely represent the mutational and transcriptional profiles of the tumor. Multi-regional biopsies increase the sensitivity of genomic profiling, but they are not cost-effective. Pooling multiple biopsies is a possible alternative, but whether a pooled sample adequately represents the individual regions has not been established. To determine whether pooling distinct regions is representative at the genomic and transcriptomic levels, we sequenced four regional samples and a pooled sample for each of four cancer types: colon, stomach, kidney and liver cancer. We then compared mutations and gene expression profiles between the multi-regional biopsies and the pooled biopsy for each tumor. Regional differences in detected variants were marginal overall, but considerable discrepancies were observed among variants with low allele frequency. In conclusion, sequencing pooled samples has the benefit of detecting many variants with moderate allele frequency that occur in only some regions, but it is not suitable for detecting low-frequency mutations that require deep sequencing.
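The dilution effect behind this conclusion can be made concrete. The sketch below assumes that each region contributes equal amounts of DNA to the pool (unless weights are given) and ignores read-sampling noise, so the expected pooled variant allele frequency (VAF) is simply the weighted mean of the regional VAFs. The helper names and the 5% calling threshold are hypothetical choices for illustration, not parameters from the study.

```python
def pooled_vaf(regional_vafs, weights=None):
    """Expected VAF of a variant in a pool of regional DNA samples.

    Assumes each region contributes DNA in proportion to `weights`
    (equal contributions by default) and ignores read-sampling noise.
    """
    if weights is None:
        weights = [1.0] * len(regional_vafs)
    total = sum(weights)
    return sum(v * w for v, w in zip(regional_vafs, weights)) / total


def passes_threshold(vaf, threshold=0.05):
    """Hypothetical caller: report a variant only if its VAF reaches `threshold`."""
    return vaf >= threshold


# Variant present in two of four regions at 30% VAF.
shared = pooled_vaf([0.30, 0.30, 0.0, 0.0])
# Private low-frequency variant present in one region only.
private = pooled_vaf([0.08, 0.0, 0.0, 0.0])
print(shared, passes_threshold(shared))
print(private, passes_threshold(private))
```

Under these assumptions, a variant at 30% VAF in two of four regions stays well above the threshold after pooling (15%), whereas a private variant at 8% VAF in a single region falls to 2%, below it.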
Project description: Tumor heterogeneity is a consequence of clonal evolution, resulting in a fractal-like architecture with spatially separated main clones, sub-clones and single cells. As sequencing an entire tumor is not feasible, we asked whether there is an optimal clinical sampling strategy that can handle heterogeneity and hypermutation. Here, we tested the effects of sample size, pooling strategy and sequencing depth using whole-exome sequencing of ovarian tumor specimens paired with normal blood samples. Because our study emphasizes clinical application, we compared single biopsies, combined local biopsies and combined multi-regional biopsies. Our results show that spatially neighboring regions have similar genetic compositions, with few private mutations. Pooling samples from multiple distinct regions of the primary tumor did not increase the overall number of identified mutations but may increase the robustness of detecting clonal mutations. Hypermutating tumors are a special case, since increasing the sample size can easily dilute sub-clonal private mutations below detection thresholds. In summary, we compared the effects of sampling strategies (single biopsy, multiple local samples, pooled global sample) on mutation detection by next-generation sequencing. Given the limitations of present tools and technologies, only one sequencing run per sample combined with high-coverage (100-300×) sequencing is affordable and practical, regardless of the number of samples taken from the same patient.
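To see why depth matters for sub-clonal mutations, one can model the number of variant-supporting reads at a site as binomial in the sequencing depth and the VAF. This is a minimal sketch assuming error-free reads and a caller that requires a hypothetical minimum of three supporting reads; real variant callers use more elaborate models.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads=3):
    """P(at least `min_alt_reads` variant reads) when reads ~ Binomial(depth, vaf).

    Assumes error-free reads; `min_alt_reads` is a hypothetical caller threshold.
    """
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


for depth in (100, 300):
    for vaf in (0.05, 0.01):
        print(depth, vaf, round(detection_probability(depth, vaf), 3))
```

Under these assumptions, a mutation at 5% VAF is detected with probability of roughly 0.88 at 100× and almost always at 300×, while the same mutation diluted to 1% VAF is detected less than 10% of the time at 100×.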
Project description: High-throughput sequencing of pooled DNA samples can facilitate genome-wide studies of rare and low-frequency variants in large populations. Major questions concerning the pooled sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether estimated minor allele frequencies (MAFs) represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates from three variant detection tools using two sets of pooled whole-exome sequencing (WES) data and one set of pooled whole-genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. In the WGS study, 56% of the low-frequency variants genotyped on the Illumina array had identical MAFs in the sequencing and individual genotyping data, and 26% differed by one allele. The MAF estimates from WGS correlated well (r = 0.94) with those from the Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, MAFs estimated from pooled DNA sequencing data reflect the MAFs in individually genotyped samples well. The pooling strategy can thus be a rapid and cost-effective approach for initial screening in large-scale association studies.
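A pooled MAF is typically estimated as the fraction of reads carrying the minor allele. For a pool of N diploid individuals, that fraction can be rounded to an allele count out of 2N, which is one way to express a "one allele difference" from individual genotyping. The sketch below shows both steps plus a plain Pearson correlation; the function names and example values are hypothetical, and unequal DNA contributions between individuals are ignored.

```python
from math import sqrt


def pooled_maf(minor_reads, total_reads):
    """Read-based minor allele frequency estimate for one site in a pool."""
    return minor_reads / total_reads


def to_allele_count(maf, n_individuals):
    """Round a pooled MAF to the nearest allele count out of 2N (diploid pool)."""
    return round(maf * 2 * n_individuals)


def pearson_r(x, y):
    """Pearson correlation between pooled and individually genotyped MAFs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical site: 12 minor-allele reads out of 200 in a pool of 50 people.
maf = pooled_maf(12, 200)
print(maf, to_allele_count(maf, 50))

# Hypothetical MAFs for four sites: pooled sequencing vs. array genotyping.
pooled = [0.06, 0.11, 0.02, 0.30]
array = [0.05, 0.12, 0.02, 0.29]
print(round(pearson_r(pooled, array), 3))
```

Here 12 of 200 reads give a MAF of 0.06, i.e. 6 of the pool's 100 alleles; an individually genotyped count of 5 at that site would be a one-allele difference.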
Project description: BACKGROUND: Sensitive detection of low-frequency single nucleotide variants (SNVs) is important in many applications. In cancer genetics research, tumor biopsies are a mixture of normal cells and tumor cells from various subpopulations due to tumor heterogeneity, so the frequencies of somatic variants from any one subpopulation tend to be low. Liquid biopsies, which monitor circulating tumor DNA in blood to detect metastatic potential, face the same challenge because circulating tumor DNA makes up only a small percentage of the DNA in blood. Moreover, in population genetics research, pooled sequencing of a large number of individuals is cost-effective, but pooling dilutes the signal of variants from any individual. Detecting low-frequency variants is difficult and can be confounded by sequencing artifacts. Existing methods are limited in sensitivity and mainly focus on frequencies around 2% to 5%; most fail to account for differential sequencing artifacts. RESULTS: We aimed to push the frequency detection limit down close to the position-specific sequencing error rate by modeling the observed erroneous read counts with respect to genomic sequence context. Four distributions suitable for modeling count data were used in generalized linear models and extensively characterized in terms of goodness-of-fit and performance on real sequencing data benchmarks specifically designed for testing detection of low-frequency variants; two sequencing technologies with substantially different chemistries were used to explore systematic errors. We found that the zero-inflated negative binomial (ZINB) generalized linear model is superior to the other models tested, and the advantage is most evident in the 0.5% to 1% range. This method is also generalizable to different sequencing technologies.
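The core idea of testing an observed non-reference read count against a ZINB error model can be sketched as follows. In the study, the expected error count depends on genomic sequence context through a generalized linear model; here the mean `mu`, dispersion `size` and zero-inflation probability `pi` are fixed by hand as hypothetical values, so the sketch shows only the distribution and an upper-tail test, not the authors' fitted model.

```python
from math import exp, lgamma, log


def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean `mu` and dispersion `size`."""
    log_p = (
        lgamma(k + size) - lgamma(size) - lgamma(k + 1)
        + size * log(size / (size + mu))
        + k * log(mu / (size + mu))
    )
    return exp(log_p)


def zinb_pmf(k, mu, size, pi):
    """Zero-inflated NB: extra probability mass `pi` placed on zero."""
    nb = (1.0 - pi) * nb_pmf(k, mu, size)
    return pi + nb if k == 0 else nb


def zinb_upper_tail(x, mu, size, pi):
    """P(X >= x) under the error model; small values suggest a real variant."""
    return 1.0 - sum(zinb_pmf(k, mu, size, pi) for k in range(x))


# Hypothetical error model for one position: 2000x depth * 0.1% error rate.
mu, size, pi = 2000 * 0.001, 1.5, 0.3
for observed in (3, 20):
    print(observed, zinb_upper_tail(observed, mu, size, pi))
```

With about 2 expected error reads at a position, observing 20 non-reference reads (roughly 1% VAF) has a very small upper-tail probability under this error model, whereas observing 3 is unremarkable.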
Under the standard sequencing protocols and depths of the test benchmarks, we achieved 95.3% recall and 79.9% precision for Ion Proton data, and 95.6% recall and 97.0% precision for Illumina MiSeq data, for SNVs with frequency ≥ 1%; the detection limit is around 0.5%. CONCLUSIONS: Our method enables sensitive detection of low-frequency single nucleotide variants across different sequencing platforms and will facilitate research and clinical applications such as pooled sequencing, early cancer detection, prognostic assessment, metastasis monitoring, and identification of relapse or acquired resistance.
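Recall and precision follow the usual definitions: recall is the fraction of true variants in the benchmark that were called, and precision is the fraction of calls that are true. A minimal sketch, treating each variant as a hypothetical (chromosome, position, ref, alt) key:

```python
def recall_precision(called, truth):
    """Recall = TP / |truth|, precision = TP / |called|, on sets of variant keys."""
    true_positives = len(called & truth)
    recall = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(called) if called else 0.0
    return recall, precision


# Hypothetical benchmark: four true SNVs and four calls, three of them correct.
truth = {("chr1", 1000, "C", "T"), ("chr1", 2000, "G", "A"),
         ("chr2", 500, "A", "G"), ("chr3", 750, "T", "C")}
called = {("chr1", 1000, "C", "T"), ("chr1", 2000, "G", "A"),
          ("chr2", 500, "A", "G"), ("chr2", 900, "G", "T")}
print(recall_precision(called, truth))
```

Restricting `truth` to variants above a frequency cut-off (such as 1%) before calling the function gives frequency-stratified figures.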
Project description: Background: Mutations in rat sarcoma (RAS) genes may be a mechanism of secondary resistance in patients treated with epidermal growth factor receptor inhibitors. Tumor-tissue biopsy testing has been the standard for evaluating mutational status; however, plasma testing of cell-free DNA has been shown to be a more sensitive method for detecting clonal evolution. Materials and methods: Archival pre- and post-treatment tumor biopsy samples from a phase II study of panitumumab in combination with irinotecan in patients with metastatic colorectal cancer (mCRC), which also collected plasma samples before, during, and after treatment, were analyzed by next-generation sequencing and BEAMing for mutations emerging during or after treatment. Results: The rate of emergence of RAS mutations in tumor tissue was 9.5% by next-generation sequencing (n = 21) and 6.3% by BEAMing (n = 16). Plasma testing of cell-free DNA by BEAMing revealed a mutant RAS emergence rate of 36.7% (n = 39). Exploratory outcome analysis of plasma samples indicated that patients with emergent RAS mutations at progression had median progression-free survival similar to that of patients who remained wild-type at progression. Serial analysis of plasma samples showed that the first detected emergence of RAS mutations preceded progression by a median of 3.6 months (range, -0.3 to 7.5 months) and that no mutant RAS allele frequency threshold appeared to predict near-term outcomes. Conclusions: This first prospective analysis in mCRC showed that serial plasma biopsies are more inclusive than tissue biopsies for evaluating global tumor heterogeneity; however, the clinical utility of plasma testing in mCRC remains to be further explored. ClinicalTrials.gov Identifier: NCT00891930.
Project description: BACKGROUND: Circulating tumor DNA (ctDNA) assays performed in clinical laboratories provide tumor biomarker testing support for biopharmaceutical clinical trials. Yet it is neither practical nor economically feasible for many of these clinical laboratories to develop their own liquid biopsy assays internally. Commercially available ctDNA kits are a potential solution for laboratories seeking to incorporate liquid biopsy into their test menus. However, the scarcity of characterized patient samples and the cost of purchasing validation reference standards create a barrier to entry. In the current study, we evaluated the analytical performance of the AVENIO ctDNA liquid biopsy platform (Roche Sequencing Solutions) for use in our clinical laboratory. METHOD: Intra-laboratory performance evaluation of the AVENIO ctDNA Targeted, Expanded, and Surveillance kits (Research Use Only) was performed according to College of American Pathologists (CAP) guidelines for the validation of targeted next-generation sequencing assays, using purchased reference standards, de-identified human plasma cell-free DNA (cfDNA) samples, and contrived samples derived from commercially purchased normal and cancer human plasma. All samples were sequenced at read depths relevant to clinical settings using the NextSeq High Output kit (Illumina). RESULTS: At the clinically relevant read depth, AVENIO ctDNA kits demonstrated 100% sensitivity in detecting single nucleotide variants (SNVs) at ≥0.5% allele frequency (AF) and 50% sensitivity in detecting SNVs at 0.1% AF using 20-40 ng of input DNA. The assay integrated seamlessly into our laboratory's NGS workflow, with input DNA mass, target allele frequency (TAF), multiplexing, and number of reads optimized to support a high-throughput assay appropriate for biopharmaceutical trials.
CONCLUSIONS: Our study demonstrates that the AVENIO ctDNA liquid biopsy platform provides a viable alternative for efficiently incorporating liquid biopsy assays into the clinical laboratory, detecting somatic alterations at allele frequencies as low as 0.5%. Accurate detection of variants below 0.5% could potentially be achieved by deeper sequencing when clinically indicated and economically feasible.
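Input mass bounds how many mutant molecules are available to sequence at all. A haploid human genome weighs roughly 3.3 pg, so 20 ng of cfDNA corresponds to about 6,000 genome equivalents, and a variant at 0.1% AF is carried by only about six of them. The Poisson sketch below makes this explicit; the `recovery` fraction and the minimum number of mutant molecules `k` are hypothetical parameters, and the sketch is not meant to reproduce the reported sensitivities.

```python
from math import exp, factorial

PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one haploid human genome, in pg


def genome_equivalents(input_ng):
    """Approximate number of haploid genome copies in `input_ng` of human DNA."""
    return input_ng * 1000 / PG_PER_HAPLOID_GENOME


def prob_at_least_k_mutant(input_ng, allele_fraction, k=1, recovery=1.0):
    """Poisson probability that >= k mutant molecules end up in the library.

    `recovery` (fraction of input molecules converted into sequenceable
    library molecules) and `k` are hypothetical parameters.
    """
    lam = genome_equivalents(input_ng) * allele_fraction * recovery
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))


for af in (0.005, 0.001):
    expected = genome_equivalents(20) * af
    p = prob_at_least_k_mutant(20, af, k=3, recovery=0.3)
    print(af, round(expected, 1), round(p, 3))
```

Raising input mass or sequencing a library with better molecular recovery increases the expected number of mutant molecules, which is one reason deeper sequencing alone cannot rescue very low AF variants when input is limited.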
Project description: Common single nucleotide polymorphisms identified in genome-wide association studies generally have effect sizes that explain only a modest fraction of the total estimated heritability of many traits. One hypothesis is that rare variants with larger effects might account for the missing heritability. Despite advances in sequencing technology, discovering rare variants in a large population is still economically challenging. Sequencing pooled samples can reduce the cost, but detecting rare variants and identifying individual carriers is difficult and requires additional experiments. To address these issues, we developed V-Sieve, a rare-variant detection algorithm that screens for rare alleles in pooled DNA samples and, combined with a unique pooling strategy, can efficiently screen a candidate gene for idiosyncratic variants in thousands of samples. We applied this method to 2283 individuals and identified >100 polymorphisms in the C-reactive protein locus at allele frequencies as low as 0.02%, with a positive predictive rate of 93%. We believe this algorithm will be useful both for screening for rare variants in genomic regions known to be associated with particular phenotypes and for replicating rare-variant associations identified in large-scale studies, such as exome resequencing projects.
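Two pieces of arithmetic help interpret this result. First, a single minor allele among 2283 diploid individuals corresponds to a frequency of 1/4566, about 0.022%, so 0.02% is essentially the singleton level. Second, carriers can be located with combinatorial pooling. The sketch below uses a generic row-and-column (two-dimensional) pooling design as a stand-in; it is not the specific pooling strategy used with V-Sieve, and all names are hypothetical.

```python
def singleton_frequency(n_individuals, ploidy=2):
    """Allele frequency of a single copy among n individuals."""
    return 1 / (ploidy * n_individuals)


def grid_pools(sample_ids, n_cols):
    """Lay samples out row by row on a grid and return row and column pools."""
    rows, cols = {}, {}
    for idx, sample_id in enumerate(sample_ids):
        r, c = divmod(idx, n_cols)
        rows.setdefault(r, []).append(sample_id)
        cols.setdefault(c, []).append(sample_id)
    return rows, cols


def decode_carriers(rows, cols, positive_rows, positive_cols):
    """Candidate carriers lie in a positive row pool AND a positive column pool."""
    in_rows = {s for r in positive_rows for s in rows[r]}
    in_cols = {s for c in positive_cols for s in cols[c]}
    return in_rows & in_cols


print(f"{singleton_frequency(2283):.4%}")  # one allele among 2283 diploid people

samples = [f"S{i:02d}" for i in range(100)]
rows, cols = grid_pools(samples, n_cols=10)
# Hypothetical screen: the rare allele is seen in row pool 3 and column pool 7.
print(decode_carriers(rows, cols, positive_rows={3}, positive_cols={7}))
```

With 100 samples on a 10 × 10 grid, 20 pools replace 100 individual libraries, and a single carrier is identified by the intersection of its positive row and column. With several carriers the intersections become ambiguous and follow-up genotyping is needed.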
Project description: Sequencing large numbers of individual samples is often needed for countrywide antimalarial drug resistance surveillance. Pooling DNA from several individual samples is an alternative, cost- and time-saving approach for providing allele frequency (AF) estimates at the population level. Using 100 individual patient DNA samples from dried blood spots collected in a 2017 nationwide drug resistance surveillance study in Haiti, we compared codon coverage of drug resistance-conferring mutations in four Plasmodium falciparum genes (crt, dhps, dhfr, and mdr1) for the same samples deep sequenced individually and in pools. Samples with similar real-time PCR cycle threshold (Ct) values (±1.0 Ct) were combined, ten samples per pool. Sequencing success at lower parasite densities was higher for pooled samples than for individually sequenced samples. The median codon coverage for drug resistance-associated mutations in all four genes was more than 3-fold higher in the pooled samples than in the individual samples. The overall codon coverage distribution was wider for pooled samples than for individual samples. Pools of samples with < 40 parasites/μL blood showed more discordance in AF calls for dhfr and mdr1 between individual and pooled samples. This discordance in AF estimation may be due to low amounts of parasite DNA, which could lead to variable PCR amplification efficiencies. Grouping samples with an estimated ≥ 40 parasites/μL blood prior to pooling and deep sequencing yielded the expected population-level AF. Pooling DNA samples based on estimates of > 40 parasites/μL prior to deep sequencing can be used for rapid genotyping of large numbers of samples for these four genes, and possibly other drug resistance markers, in population-based studies.
As Haiti is a low malaria transmission country with very few mixed infections and continued chloroquine sensitivity, the pooled sequencing approach can be used for routine national molecular surveillance of resistant parasites.
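The sample-grouping step can be sketched as a greedy procedure: drop samples below the parasite-density cut-off, sort the rest by Ct, and fill pools of ten while keeping the Ct spread within a window. The input format, the helper name and the reading of "±1.0 Ct" as a window 2.0 Ct wide are assumptions for illustration, not the study's exact protocol.

```python
def make_pools(samples, pool_size=10, max_ct_spread=2.0, min_density=40.0):
    """Greedy pooling of (sample_id, ct, parasites_per_ul) tuples.

    Samples below `min_density` parasites/μL are excluded; the rest are sorted
    by Ct and a new pool is started whenever the current one is full or the
    next sample's Ct is more than `max_ct_spread` above the pool's lowest Ct.
    """
    eligible = sorted(
        (s for s in samples if s[2] >= min_density), key=lambda s: s[1]
    )
    pools, current = [], []
    for sample in eligible:
        full = len(current) == pool_size
        too_wide = bool(current) and sample[1] - current[0][1] > max_ct_spread
        if full or too_wide:
            pools.append(current)
            current = []
        current.append(sample)
    if current:
        pools.append(current)
    return pools


# Hypothetical input: 20 samples with Ct 25.0-26.9 plus one low-density sample.
samples = [(f"S{i}", 25.0 + 0.1 * i, 100.0) for i in range(20)]
samples.append(("low_density", 25.5, 12.0))
for pool in make_pools(samples):
    print([sid for sid, _, _ in pool])
```

Pools closed early because the Ct window was exceeded, and the final pool, may hold fewer than ten samples; a caller could merge or set them aside.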
Project description: Resequencing of genomic regions that have been implicated by linkage and/or association studies to harbor genetic susceptibility loci represents a necessary step to identify causal variants. Massively parallel sequencing (MPS) offers the possibility of SNP discovery and frequency determination among pooled DNA samples. The strategies of pooling DNA samples and pooling PCR amplicons generated from individual DNA samples were evaluated, and both were found to return accurate estimates of SNP frequencies across varying levels of sequence coverage.
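The influence of coverage on frequency estimates from a pool can be approximated by treating reads as binomial draws from the pool's true allele frequency. This sketch assumes equal DNA contribution from each individual and no amplification bias, both of which real pools violate to some degree.

```python
from math import sqrt


def read_sampling_se(freq, depth):
    """Binomial standard error of a read-based allele frequency estimate."""
    return sqrt(freq * (1 - freq) / depth)


for depth in (50, 100, 1000):
    se = read_sampling_se(0.1, depth)
    # Approximate 95% interval half-width: 1.96 standard errors.
    print(depth, round(se, 4), round(1.96 * se, 4))
```

At a true frequency of 0.1, the standard error is 0.03 at 100× and about 0.0095 at 1000×, so tenfold more coverage shrinks it by roughly a factor of three.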
Project description: Sequencing pooled DNA of multiple individuals from a population instead of sequencing individuals separately has become popular due to its cost-effectiveness and simple wet-lab protocol, although some criticism of this approach remains. Here we validated a protocol for pooled whole-genome re-sequencing (Pool-seq) of Arabidopsis lyrata libraries prepared with low amounts of DNA (1.6 ng per individual). The validation was based on comparing single nucleotide polymorphism (SNP) frequencies obtained by pooling with those obtained by individual-based Genotyping By Sequencing (GBS). Furthermore, we investigated the effect of sample number, sequencing depth per individual and variant caller on population SNP frequency estimates. For Pool-seq data, we compared frequency estimates from two SNP callers, VarScan and Snape; the former employs a frequentist SNP calling approach while the latter uses a Bayesian approach. Results revealed concordance correlation coefficients well above 0.8, confirming that Pool-seq is a valid method for acquiring population-level SNP frequency data. Higher accuracy was achieved by pooling more samples (25 compared to 14) and working with higher sequencing depth (4.1× per individual compared to 1.4× per individual), which increased the concordance correlation coefficient to 0.955. The Bayesian-based SNP caller produced somewhat higher concordance correlation coefficients, particularly at low sequencing depth. We recommend pooling at least 25 individuals combined with sequencing at a depth of 100× to produce satisfactory frequency estimates for common SNPs (minor allele frequency above 0.05).
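The concordance correlation coefficient (Lin's CCC) differs from Pearson's r in that it also penalizes systematic shifts between two sets of frequencies. Below is a minimal sketch with population (1/n) moments; which estimator variant the study used is not stated, so treat that choice as an assumption. The recommended design of 25 individuals at 100× works out to about 4× per individual, close to the higher-depth 4.1× condition.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    mx, my = fmean(x), fmean(y)
    n = len(x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Recommended design from the text: 25 individuals sequenced to 100x in total.
depth_per_individual = 100 / 25  # 4.0x per individual

pool_freqs = [0.1, 0.2, 0.3]
shifted = [f + 0.1 for f in pool_freqs]
print(lins_ccc(pool_freqs, pool_freqs), round(lins_ccc(pool_freqs, shifted), 2))
```

Shifting every frequency by 0.1 leaves Pearson's r at 1 but drops the CCC to about 0.57 for these example values, which is why CCC is the stricter measure of agreement between Pool-seq and individual genotyping.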