Project description:Long-read RNA sequencing technologies offer unparalleled in- sights into transcriptomes by enabling full-length sequencing of RNA molecules, uncovering novel isoforms and alternative splicing events. While long-read sequencing platforms, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), have historically been associated with higher error rates, recent advancements in both platforms have significantly en- hanced read accuracy, broadening their applicability for tran- scriptomic studies. With the rapid evolution of sequencing protocols and bioin- formatics tools, the trade-offs between sequencing throughput, read length, accuracy, and cost present significant challenges in selecting the optimal approach. Systematic benchmarking studies that compare these options are crucial to inform fu- ture research directions. However, many existing benchmark- ing datasets with matched data across multiple platforms have limitations, including: 1) a lack of realistic biological replicates, which may restrict the generalisability of differential analysis results to real-world scenarios, and 2) the use of earlier sequenc- ing kits, which may not reflect the latest advancements in se- quencing technology, limiting their relevance for future studies that typically use newer sequencing protocols. Here we present LongBench, a comprehensive benchmarking dataset designed to fill these critical gaps. Derived from eight lung cancer cell lines with synthetic RNA spike-ins, LongBench includes bulk, single-cell, and single-nucleus RNA-seq data from three state-of-the-art long-read sequencing platforms — ONT PCR-cDNA, ONT direct RNA, PacBio Kinnex — alongside Il- lumina short-read data for robust cross-platform comparisons. The LongBench dataset is a valuable resource for benchmarking and improving sequencing protocols and bioinformatics tools. With the LongBench dataset we present a systematic evaluation of transcript capture, quantification, and differential expression analyses, examining the strengths and limitations of each se- quencing platform in various biological contexts, enabling re- searchers to make more informed decisions on platform and method selection.
Project description:Droplet-based single-cell sequencing techniques have provided unprecedented insight into cellular heterogeneities within tissues. However, these approaches only allow for the measurement of the distal parts of a transcript following short-read sequencing. Therefore, splicing and sequence diversity information is lost for the majority of the transcript. The application of long-read Nanopore sequencing to droplet-based methods is challenging because of the low base-calling accuracy currently associated with Nanopore sequencing. Although several approaches that use additional short-read sequencing to error-correct the barcode and UMI sequences have been developed, these techniques are limited by the requirement to sequence a library using both short- and long-read sequencing. Here we introduce a novel approach termed single-cell Barcode UMI Correction sequencing (scBUC-seq) to efficiently error-correct barcode and UMI oligonucleotide sequences synthesized by using blocks of dimeric nucleotides. The method can be applied to correct both short-read and long-read sequencing, thereby allowing users to recover more reads per cell that permits direct single-cell Nanopore sequencing for the first time. We illustrate our method by using species-mixing experiments to evaluate barcode assignment accuracy and multiple myeloma cell lines to evaluate differential isoform usage and Ewing’s sarcoma cells to demonstrate Ig fusion transcript analysis.
Project description:Sample index hopping refers to the incorrect sample assignment of a demultiplexed sequencing read in a library pool. To enable benchmarking of methods for measurement of index hopping rate and removal of its artifacts in single-cell RNA-seq data, we developed a validation dataset consisting of a multiplexed library of two samples, in which the true sample of origin of most reads are known. The reads with known sample of origin provide the ground truth for measuring the performance of index hopping correcting methods.
Project description:Affinity capture of DNA methylation combined with high-throughput sequencing strikes a good balance between the high cost of whole genome bisulfite sequencing and the low coverage of methylation arrays. We present BayMeth, an empirical Bayes approach that uses a fully methylated control sample to transform observed read counts into regional methylation levels. In our model, inefficient capture can readily be distinguished from low methylation levels. BayMeth improves on existing methods, allows explicit modeling of copy number variation, and offers computationally-efficient analytical mean and variance estimators. BayMeth is available in the Repitools Bioconductor package. Benchmarking samples to compare MBD- and MeDIP-seq [GSE38679, GSE24546; PMID 21045081] datasets against 450k measurements