Project description:Background: Whole exome sequencing (WES) has proven to be a valuable basis for applications such as variant calling and copy number variation (CNV) analysis. For these analyses, read coverage should be evenly balanced across protein-coding regions at sufficient read depth. Unfortunately, WES is known for uneven coverage within coding regions, caused by GC-rich regions or off-target enrichment. Results: To examine the irregularities of WES within genes, we applied Agilent SureSelectXT exome capture to human samples and sequenced them on an Illumina platform in 2x101 paired-end mode. Because we suspected the sequenced insert length to be crucial to the uneven coverage of exome-captured samples, we sheared 12 genomic DNA samples to two different insert sizes, 130 and 170 bp. Interestingly, although mean coverage of target regions was clearly higher in the 130 bp samples, coverage was more even in the 170 bp samples. Moreover, merging overlapping paired-end reads improved evenness, indicating that overlapping reads are another cause of the unevenness. In addition, mutation analysis was performed on a subset of the samples. In these isogenic subclones, almost twice as many mutations were discarded in the 130 bp samples as in the 170 bp samples. Visual inspection of the discarded mutation sites revealed low coverage at those sites, embedded in regions with high amplitudes of coverage depth. Conclusions: Producing longer inserts could be a good strategy to achieve more uniform read coverage in coding regions, thereby enhancing the effective sequencing yield and providing an improved basis for variant calling and CNV analyses.
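The role of overlapping mates in the description above can be illustrated with simple arithmetic. The following is a minimal sketch, assuming that "insert size" means the length of the sequenced DNA fragment without adapters and that both mates are read at the full 101 bp; the function name `paired_end_overlap` is hypothetical and not part of the study's pipeline.

```python
def paired_end_overlap(read_length: int, insert_size: int) -> int:
    """Number of bases sequenced by both mates of a pair (0 if the mates do not meet)."""
    # A mate cannot read past the end of the fragment (assumption: adapters are trimmed).
    effective = min(read_length, insert_size)
    return max(0, 2 * effective - insert_size)


# Read length and insert sizes taken from the description above (2x101, 130 bp and 170 bp).
for insert_size in (130, 170):
    overlap = paired_end_overlap(101, insert_size)
    print(insert_size, overlap, round(overlap / insert_size, 2))
```

Under these assumptions, a 130 bp insert has 72 bp covered by both mates (about 55% of the fragment), while a 170 bp insert has only 32 bp (about 19%), which is consistent with overlapping reads contributing more to uneven depth in the shorter-insert samples.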
Project description:In this study, we hypothesize that shallow long insert whole genome sequencing (LI-WGS) increases our power for detecting breakpoints compared to shallow short insert WGS. We performed a priori analyses to demonstrate the benefits of LI-WGS, developed a long insert library preparation protocol based on Illumina's protocol, and compared LI-WGS against short insert WGS on test samples. We then used LI-WGS to identify translocations and copy number changes in tumor and germline samples collected from cancer patients with different malignancies.
Project description:In principle, whole-genome sequencing (WGS) of the human genome, even at low coverage, offers higher resolution for genomic copy number variation (CNV) detection than array-based technologies, which are currently the first-tier approach in clinical cytogenetics. There are, however, obstacles to replacing array-based CNV detection with low-coverage WGS, such as cost, turnaround time, and the lack of systematic performance comparisons. With technological advances in WGS library preparation, instrument platforms, and data analysis algorithms, the obstacles imposed by cost and turnaround time are fading. However, a systematic performance comparison between array-based and low-coverage WGS-based CNV detection has yet to be performed. Here, we compared the CNV detection capabilities of WGS (short-insert, 3kb-, and 5kb-mate-pair libraries) at 1X, 3X, and 5X coverage with those of commonly used high-resolution arrays in the 1000-Genomes-Project CEU genome NA12878. CNV detection was performed using standard analysis methods, and the results were then compared to a list of Gold Standard NA12878 CNVs distilled from the 1000-Genomes Project. Overall, low-coverage WGS detects drastically more (approximately 5-fold more on average) Gold Standard CNVs than arrays and is accompanied by fewer CNV calls without secondary validation. Furthermore, we show that WGS (at ≥1X coverage) detects all seven validated deletions larger than 100 kb in the NA12878 genome, whereas most arrays detect only one such deletion. Finally, we show that the much larger 15 Mbp Cri-du-chat deletion can be clearly seen even at 1X coverage from short-insert WGS.
Project description:Deep sequencing of transcriptomes allows quantitative and qualitative analysis of many RNA species in a sample; the ideal goal is parallel comparison of expression levels, splicing variants, natural antisense transcripts, RNA editing, and transcriptional start and stop sites. By computational modeling, we show how libraries of multiple insert sizes combined with strand-specific, paired-end (SS-PE) sequencing can increase the information gained on alternative splicing, especially in higher eukaryotes. Despite the benefits of SS-PE data with paired ends of varying distance, the standard Illumina protocol allows only non-strand-specific, paired-end sequencing with a single insert size. Here, we modify the Illumina RNA ligation protocol to allow SS-PE sequencing by using a custom pre-adenylated 3’ adaptor. We generate parallel libraries with differing insert sizes to aid deconvolution of alternative splicing events and to characterize the extent and distribution of natural antisense transcription in C. elegans. Despite stringent requirements for detection of alternative splicing, our data increase the number of intron retention and exon skipping events annotated in the Wormbase genome annotations by 127% and 121%, respectively. We show that parallel libraries with a range of insert sizes increase the transcriptomic information gained by sequencing and that, by current established benchmarks, our protocol gives competitive results with respect to library quality.
Project description:Ongoing improvements to next generation sequencing technologies are leading to longer sequencing read lengths, but a thorough understanding of the impact of longer reads on RNA sequencing analyses is lacking. To address this issue, we generated and compared two RNA sequencing datasets of differing read lengths, 2x75 bp (L75) and 2x262 bp (L262), and investigated the impact of read length on various aspects of analysis, including the performance of currently available read-mapping tools, gene and transcript quantification, and detection of allele-specific expression patterns. Our results indicate that, while the scalability of read-mapping tools and the cost-effectiveness of long-read protocols are issues that require further attention, longer reads enable more accurate quantification of diverse aspects of gene expression, including individual-specific patterns of allele-specific expression and alternative splicing.
Project description:We report a method for precisely stenciling the structure of individual chromatin fibers onto their composite DNA templates using non-specific DNA N6-adenine methyltransferases. Single-molecule long-read sequencing using PacBio of these chromatin stencils enables nucleotide-resolution readout of the primary architecture of multi-kilobase chromatin fibers (Fiber-seq).