Project description:This study benchmarks bulk and single-cell long-read RNA sequencing technologies in a human neuronal model of Fragile X syndrome. NGN2-induced neurons were generated from patient-derived iPSCs carrying a silenced FMR1 gene (FXS line E3) and an isogenic CRISPR-corrected rescue line (IsoB11) in which FMR1 expression is restored. These conditions provide a defined system to evaluate transcript detection and quantification across sequencing platforms. Bulk and single-cell RNA-seq datasets were generated using Illumina short-read sequencing and long-read sequencing from Pacific Biosciences (PB) and Oxford Nanopore Technologies (ONT). Single-cell libraries were prepared using the 10x Genomics Chromium platform. ERCC and SIRV spike-in controls were added to bulk samples to enable benchmarking of transcript quantification accuracy. Three biological replicates were sequenced for each condition. The dataset enables cross-platform comparisons of transcript detection, quantification methods, transcript length biases, and sequencing depth requirements for long-read transcriptomic analyses.
Project description:Short-read RNA sequencing (RNAseq) remains a cornerstone for transcriptome profiling, but is limited in reconstructing full-length transcripts and capturing transcript diversity. While long-read RNAseq spans entire transcripts and resolves complex structures, this technology is hindered by its high error rates. In parallel, noncoding RNA transcripts remain underrepresented in current references. Here, we present HyDRA (Hybrid de novo RNA Assembly), a pipeline that integrates the accuracy of short reads with the structural resolution of long reads to produce more complete de novo transcriptome assemblies. Benchmarking showed HyDRA to outperform existing methods by up to 40%. Using the HyDRA human ovarian metatranscriptome, we identified >50,000 high-confidence long noncoding RNAs, most of which have not been previously detected using traditional methods. Although long-read RNAseq is advancing, the vast availability of short reads ensures HyDRA’s ongoing role in capturing high-confidence, cell-type specific transcripts and advancing our understanding of transcriptomic complexity and the noncoding genome.
Project description:Long-read RNA sequencing technologies offer unparalleled insights into transcriptomes by enabling full-length sequencing of RNA molecules, uncovering novel isoforms and alternative splicing events. While long-read sequencing platforms, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), have historically been associated with higher error rates, recent advancements in both platforms have significantly enhanced read accuracy, broadening their applicability for transcriptomic studies. With the rapid evolution of sequencing protocols and bioinformatics tools, the trade-offs between sequencing throughput, read length, accuracy, and cost present significant challenges in selecting the optimal approach. Systematic benchmarking studies that compare these options are crucial to inform future research directions. However, many existing benchmarking datasets with matched data across multiple platforms have limitations, including: 1) a lack of realistic biological replicates, which may restrict the generalisability of differential analysis results to real-world scenarios, and 2) the use of earlier sequencing kits, which may not reflect the latest advancements in sequencing technology, limiting their relevance for future studies that typically use newer sequencing protocols. Here we present LongBench, a comprehensive benchmarking dataset designed to fill these critical gaps. Derived from eight lung cancer cell lines with synthetic RNA spike-ins, LongBench includes bulk, single-cell, and single-nucleus RNA-seq data from three state-of-the-art long-read sequencing platforms (ONT PCR-cDNA, ONT direct RNA, and PacBio Kinnex) alongside Illumina short-read data for robust cross-platform comparisons. The LongBench dataset is a valuable resource for benchmarking and improving sequencing protocols and bioinformatics tools.
With the LongBench dataset, we present a systematic evaluation of transcript capture, quantification, and differential expression analyses, examining the strengths and limitations of each sequencing platform in various biological contexts, enabling researchers to make more informed decisions on platform and method selection.
Project description:Droplet-based single-cell sequencing techniques have provided unprecedented insight into cellular heterogeneities within tissues. However, these approaches only allow for the measurement of the distal parts of a transcript following short-read sequencing. Therefore, splicing and sequence diversity information is lost for the majority of the transcript. The application of long-read Nanopore sequencing to droplet-based methods is challenging because of the low base-calling accuracy currently associated with Nanopore sequencing. Although several approaches that use additional short-read sequencing to error-correct the barcode and UMI sequences have been developed, these techniques are limited by the requirement to sequence a library using both short- and long-read sequencing. Here we introduce a novel approach termed single-cell Barcode UMI Correction sequencing (scBUC-seq) to efficiently error-correct barcode and UMI oligonucleotide sequences synthesized using blocks of dimeric nucleotides. The method can be applied to both short-read and long-read sequencing, allowing users to recover more reads per cell and permitting direct single-cell Nanopore sequencing for the first time. We illustrate our method using species-mixing experiments to evaluate barcode assignment accuracy, multiple myeloma cell lines to evaluate differential isoform usage, and Ewing’s sarcoma cells to demonstrate Ig fusion transcript analysis.
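To make the idea of dimer-block barcodes concrete, here is a minimal sketch of how such redundancy could support error correction. It is not the scBUC-seq implementation. It assumes, purely for illustration, that every barcode or UMI base is synthesized as a homodimer (AA, CC, GG, or TT), so that a block whose two bases disagree reveals a sequencing error. The function names and the toy whitelist are hypothetical.

```python
def collapse_dimers(read):
    """Collapse a dimer-encoded sequence to one base per block.

    Assumption (not from the source): each original base was written
    twice. A block whose two bases disagree is reported as 'N'.
    """
    if len(read) % 2 != 0:
        raise ValueError("dimer-encoded read must have even length")
    bases = []
    for i in range(0, len(read), 2):
        first, second = read[i], read[i + 1]
        bases.append(first if first == second else "N")
    return "".join(bases)


def match_whitelist(collapsed, whitelist):
    """Return the single whitelist barcode compatible with `collapsed`.

    'N' positions act as wildcards. Returns None when zero or several
    barcodes are compatible, so ambiguous reads are not assigned.
    """
    hits = [
        barcode
        for barcode in whitelist
        if len(barcode) == len(collapsed)
        and all(c == "N" or c == b for c, b in zip(collapsed, barcode))
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical whitelist and reads.
whitelist = ["ACGT", "TGCA", "ACGA"]
clean = collapse_dimers("AACCGGTT")        # every block agrees
one_error = collapse_dimers("AACAGGTT")    # second block is corrupted
rescued = match_whitelist(one_error, whitelist)
ambiguous = match_whitelist(collapse_dimers("AACCGGTA"), whitelist)
print(clean, one_error, rescued, ambiguous)
```

This toy version only handles substitutions. Insertions and deletions, which are common in Nanopore reads, would shift the block boundaries and need an alignment-aware decoder.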