Project description:The desire to analyse limited amounts of biological material, historic samples and rare cell populations has collectively driven the need for efficient methods for whole genome sequencing (WGS) of limited amounts of poor quality DNA. Most protocols are designed to recover double-stranded DNA (dsDNA) by ligating sequencing adaptors to dsDNA with or without subsequent polymerase chain reaction amplification of the library. While this is sufficient for many applications, limited DNA requires a method that can recover both single-stranded DNA (ssDNA) and dsDNA. Here, we present a WGS library preparation method, called 'degraded DNA adaptor tagging' (DDAT), adapted from a protocol designed for whole genome bisulfite sequencing. This method uses two rounds of random primer extension to recover both ssDNA and dsDNA. We show that by using DDAT we can generate WGS data from formalin-fixed paraffin-embedded (FFPE) samples using as little as 2 ng of highly degraded DNA input. Furthermore, DDAT WGS data quality was higher for all FFPE samples tested compared to data produced using a standard WGS library preparation method. Therefore, the DDAT method has potential to unlock WGS data from DNA previously considered impossible to sequence, broadening opportunities to understand the role of genetics in health and disease.
Project description:Chlamydia trachomatis is a pathogen of worldwide importance, causing more than 100 million cases of sexually transmitted infections annually. Whole-genome sequencing is a powerful high-resolution tool that can be used to generate accurate data on bacterial population structure, phylogeography and mutations associated with antimicrobial resistance. The objective of this study was to perform whole-genome enrichment and sequencing of C. trachomatis directly from clinical samples. C. trachomatis-positive samples comprising seven vaginal swabs and three urine samples were sequenced without prior in vitro culture, in addition to nine cultured C. trachomatis samples representing different serovars. A custom capture RNA bait set that captures all known diversity amongst C. trachomatis genomes was used in a whole-genome enrichment step during library preparation to enrich for C. trachomatis DNA. All samples were sequenced on the MiSeq platform. Full-length C. trachomatis genomes (>95-100% coverage of a reference genome) were successfully generated for eight of ten clinical samples and for all cultured samples. The proportion of reads mapping to C. trachomatis and the mean read depth across each genome were strongly linked to the number of bacterial copies within the original sample. Phylogenetic analysis confirmed the known population structure, and the data showed potential for identification of minority variants and mutations associated with antimicrobial resistance. The sensitivity of the method was >10-fold higher than other reported methodologies. The combination of whole-genome enrichment and deep sequencing has proven to be a non-mutagenic approach, capturing all known variation found within C. trachomatis genomes. The method is a consistent and sensitive tool that enables rapid whole-genome sequencing of C. trachomatis directly from clinical samples and has the potential to be adapted to other pathogens with a similar clonal nature.
Project description:Querying cancer genomes at single-cell resolution is expected to provide a powerful framework to understand in detail the dynamics of cancer evolution. However, given the high costs currently associated with single-cell sequencing, together with the inevitable technical noise arising from single-cell genome amplification, cost-effective strategies that maximize the quality of single-cell data are critically needed. Taking advantage of previously published single-cell whole-genome and whole-exome cancer datasets, we studied the impact of sequencing depth and sampling effort on single-cell variant detection. Five single-cell whole-genome and whole-exome cancer datasets were independently downscaled to 25, 10, 5, and 1× sequencing depth. For each depth level, ten technical replicates were generated, resulting in a total of 6280 single-cell BAM files. The sensitivity of variant detection, including structural and driver mutations, genotyping, clonal inference, and phylogenetic reconstruction to sequencing depth was evaluated using recent tools specifically designed for single-cell data. Altogether, our results suggest that for relatively large sample sizes (25 or more cells) sequencing single tumor cells at depths >5× does not drastically improve somatic variant discovery, characterization of clonal genotypes, or estimation of single-cell phylogenies. We suggest that sequencing multiple individual tumor cells at a modest depth represents an effective alternative to explore the mutational landscape and clonal evolutionary patterns of cancer genomes.
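The downscaling step described in the preceding record can be illustrated with a minimal sketch. It assumes that a BAM file is downscaled by keeping a random fraction of reads equal to target depth divided by current depth; the abstract does not say which tool or procedure was used, so the function name `downsampling_plan`, the use of the replicate number as a random seed, and the example starting depth of 50× are all hypothetical.

```python
def downsampling_plan(current_depth, target_depths, replicates=10):
    """Build a list of subsampling jobs for one single-cell BAM file.

    Assumption (not from the abstract): each job keeps a random
    fraction of reads equal to target_depth / current_depth.
    """
    if current_depth <= 0:
        raise ValueError("current_depth must be positive")
    plan = []
    for target in target_depths:
        # A file cannot be subsampled above its own depth, so cap at 1.0.
        fraction = min(1.0, target / current_depth)
        for replicate in range(1, replicates + 1):
            plan.append({
                "target_depth": target,
                "replicate": replicate,
                "fraction": fraction,
                # Hypothetical choice: a distinct seed per technical replicate.
                "seed": replicate,
            })
    return plan


# Example: a cell sequenced at a hypothetical 50x, downscaled to the four
# depth levels named in the abstract, with ten replicates per level.
plan = downsampling_plan(50.0, [25, 10, 5, 1])
```

Each entry in `plan` could then be handed to whichever read-subsampling tool is in use, passing the fraction and seed as that tool requires.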
Project description:In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. AVAILABILITY: https://github.com/parklab/NGSCheckMate.
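The idea of comparing allele read fractions at known SNPs can be illustrated with a simplified sketch. This is not NGSCheckMate's model: it omits the depth-dependent modelling the abstract mentions and simply computes a Pearson correlation of alternate-allele fractions between two samples. The function name `vaf_correlation`, the input layout (SNP identifier mapped to reference and alternate read counts), and the `min_depth` parameter are assumptions made for illustration.

```python
from math import sqrt


def vaf_correlation(sample_a, sample_b, min_depth=1):
    """Pearson correlation of alternate-allele read fractions.

    sample_a, sample_b: dict mapping SNP id -> (ref_reads, alt_reads).
    Only SNPs covered in both samples with at least min_depth reads are used.
    Returns None when the correlation is undefined.
    """
    min_depth = max(min_depth, 1)  # avoid dividing by zero depth
    xs, ys = [], []
    for snp in sample_a.keys() & sample_b.keys():
        ref_a, alt_a = sample_a[snp]
        ref_b, alt_b = sample_b[snp]
        if ref_a + alt_a < min_depth or ref_b + alt_b < min_depth:
            continue
        xs.append(alt_a / (ref_a + alt_a))
        ys.append(alt_b / (ref_b + alt_b))
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Toy read counts at three hypothetical SNPs.
tumor = {"snp1": (10, 0), "snp2": (5, 5), "snp3": (0, 10)}
normal = {"snp1": (20, 0), "snp2": (9, 11), "snp3": (1, 19)}
r = vaf_correlation(tumor, normal)
```

A correlation near 1 is what one would expect for two datasets from the same individual. Deciding where to draw the line, especially at low depth, is the part this sketch leaves out.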
Project description:Whole exome sequencing of 5 HCLc tumor-germline pairs. Genomic DNA from HCLc tumor cells was used, with DNA from T-cells as the germline control. Whole exome enrichment was performed with either Agilent SureSelect (50Mb, samples S3G/T, S5G/T, S9G/T) or Roche Nimblegen (44.1Mb, samples S4G/T and S6G/T). The resulting exome libraries were sequenced on the Illumina HiSeq platform with paired-end 100bp reads to an average depth of 120-134x. BAM files were generated using NovoalignMPI (v3.0) to align the raw FASTQ files to the reference genome sequence (hg19) and Picard tools (v1.34) to flag duplicate reads (optical or PCR), unmapped reads, reads mapping to more than one location, and reads failing vendor QC.
Project description:In forensic casework, compromised samples often possess limited or degraded nuclear DNA, rendering mitochondrial DNA a more feasible option for forensic DNA analyses. The emergence of massively parallel sequencing (MPS) has enabled the recovery of extensive sequence information from very low quantities of DNA. We have developed a multiplex PCR method that amplifies the complete mitochondrial genome in a range of forensically relevant samples including single cells, cremated remains, bone, maggot and hairs isolated from dust bunnies. Following library preparation, MPS yields complete or nearly complete mitochondrial genome coverage for all samples. To confirm concordance between sample types and between sequencing platforms, we compared sequencing results from hair and buccal swabs from two references. Low initial DNA input into the multiplex PCR allows for conservation of precious DNA while MPS maximizes recovery of genetic information.
Project description:Copy number variants are duplications and deletions of the genome that play an important role in phenotypic changes and human disease. Many software applications have been developed to detect copy number variants using either whole-genome sequencing or whole-exome sequencing data. However, there is poor agreement in the results from these applications. Simulated datasets containing copy number variants allow comprehensive comparisons of the operating characteristics of existing and novel copy number variant detection methods. Several software applications have been developed to simulate copy number variants and other structural variants in whole-genome sequencing data. However, none of the applications reliably simulate copy number variants in whole-exome sequencing data. We have developed and tested Simulator of Exome Copy Number Variants (SECNVs), a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants. SECNVs is publicly available at https://github.com/YJulyXing/SECNVs.