Project description: This study benchmarks bulk and single-cell long-read RNA sequencing technologies in a human neuronal model of Fragile X syndrome. NGN2-induced neurons were generated from patient-derived iPSCs carrying a silenced FMR1 gene (FXS line E3) and an isogenic CRISPR-corrected rescue line (IsoB11) in which FMR1 expression is restored. These conditions provide a defined system to evaluate transcript detection and quantification across sequencing platforms. Bulk and single-cell RNA-seq datasets were generated using Illumina short-read sequencing and long-read sequencing from Pacific Biosciences (PB) and Oxford Nanopore Technologies (ONT). Single-cell libraries were prepared using the 10x Genomics Chromium platform. ERCC and SIRV spike-in controls were added to bulk samples to enable benchmarking of transcript quantification accuracy. Three biological replicates were sequenced for each condition. The dataset enables cross-platform comparisons of transcript detection, quantification methods, transcript length biases, and sequencing depth requirements for long-read transcriptomic analyses.
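As an illustration of how spike-in controls can support a quantification accuracy check, the sketch below compares estimated abundances with known input amounts on a log2 scale. It is a minimal example under stated assumptions, not the study's pipeline: the function name `spike_in_accuracy`, the pseudocount of 1, the choice of Pearson correlation and least-squares slope as metrics, and the spike-in IDs and values in the usage example are all hypothetical.

```python
import math


def spike_in_accuracy(known, observed, pseudocount=1.0):
    """Compare observed spike-in abundances with known inputs on a log2 scale.

    known:    dict of spike-in ID -> known input amount (any consistent unit)
    observed: dict of spike-in ID -> estimated abundance (e.g. TPM or counts)
    Returns (pearson_r, slope, n_detected) over spike-ins present in both dicts.
    """
    ids = sorted(set(known) & set(observed))
    n = len(ids)
    if n < 2:
        raise ValueError("need at least two shared spike-ins")

    # Log-transform with a pseudocount so zero estimates stay defined.
    x = [math.log2(known[i] + pseudocount) for i in ids]
    y = [math.log2(observed[i] + pseudocount) for i in ids]

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("values are constant; correlation is undefined")

    pearson_r = sxy / math.sqrt(sxx * syy)  # linearity of the response
    slope = sxy / sxx                       # 1.0 means proportional response
    n_detected = sum(1 for i in ids if observed[i] > 0)
    return pearson_r, slope, n_detected


# Hypothetical spike-in IDs and values, chosen only to exercise the function.
known = {"spike_A": 1, "spike_B": 3, "spike_C": 7, "spike_D": 15}
observed = {"spike_A": 3, "spike_B": 7, "spike_C": 15, "spike_D": 31}
r, slope, detected = spike_in_accuracy(known, observed)
```

Running the same function on each platform's estimates for the same spike-in mix would give one comparable pair of metrics per platform; other choices, such as Spearman correlation or per-transcript relative error, would be equally reasonable.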
Project description: Long-read RNA sequencing technologies offer unparalleled insights into transcriptomes by enabling full-length sequencing of RNA molecules, uncovering novel isoforms and alternative splicing events. While long-read sequencing platforms, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), have historically been associated with higher error rates, recent advancements in both platforms have significantly enhanced read accuracy, broadening their applicability for transcriptomic studies. With the rapid evolution of sequencing protocols and bioinformatics tools, the trade-offs between sequencing throughput, read length, accuracy, and cost present significant challenges in selecting the optimal approach. Systematic benchmarking studies that compare these options are crucial to inform future research directions. However, many existing benchmarking datasets with matched data across multiple platforms have limitations, including: 1) a lack of realistic biological replicates, which may restrict the generalisability of differential analysis results to real-world scenarios, and 2) the use of earlier sequencing kits, which may not reflect the latest advancements in sequencing technology, limiting their relevance for future studies that typically use newer sequencing protocols. Here we present LongBench, a comprehensive benchmarking dataset designed to fill these critical gaps. Derived from eight lung cancer cell lines with synthetic RNA spike-ins, LongBench includes bulk, single-cell, and single-nucleus RNA-seq data from three state-of-the-art long-read sequencing platforms (ONT PCR-cDNA, ONT direct RNA, and PacBio Kinnex), alongside Illumina short-read data for robust cross-platform comparisons. The LongBench dataset is a valuable resource for benchmarking and improving sequencing protocols and bioinformatics tools.
With the LongBench dataset, we present a systematic evaluation of transcript capture, quantification, and differential expression analyses, examining the strengths and limitations of each sequencing platform in various biological contexts, enabling researchers to make more informed decisions on platform and method selection.