Dataset Information

Caenorhabditis elegans

ABSTRACT: Benchmarking dataset for comparing noisy and accurate long-read sequencing technologies

PROVIDER: PRJNA896647 | ENA

REPOSITORIES: ENA

Dataset's files

Source:

Other
	SRR22137522_subreads.fastq.gz	Fastqsanger.gz
	SRR22137523_subreads.fastq.gz	Fastqsanger.gz


Similar Datasets

Project description: Long-read RNA sequencing technologies offer unparalleled insights into transcriptomes by enabling full-length sequencing of RNA molecules, uncovering novel isoforms and alternative splicing events. While long-read sequencing platforms, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), have historically been associated with higher error rates, recent advancements in both platforms have significantly enhanced read accuracy, broadening their applicability for transcriptomic studies. With the rapid evolution of sequencing protocols and bioinformatics tools, the trade-offs between sequencing throughput, read length, accuracy, and cost present significant challenges in selecting the optimal approach. Systematic benchmarking studies that compare these options are crucial to inform future research directions. However, many existing benchmarking datasets with matched data across multiple platforms have limitations, including: 1) a lack of realistic biological replicates, which may restrict the generalisability of differential analysis results to real-world scenarios, and 2) the use of earlier sequencing kits, which may not reflect the latest advancements in sequencing technology, limiting their relevance for future studies that typically use newer sequencing protocols. Here we present LongBench, a comprehensive benchmarking dataset designed to fill these critical gaps. Derived from eight lung cancer cell lines with synthetic RNA spike-ins, LongBench includes bulk, single-cell, and single-nucleus RNA-seq data from three state-of-the-art long-read sequencing platforms (ONT PCR-cDNA, ONT direct RNA, and PacBio Kinnex) alongside Illumina short-read data for robust cross-platform comparisons. The LongBench dataset is a valuable resource for benchmarking and improving sequencing protocols and bioinformatics tools.
With the LongBench dataset we present a systematic evaluation of transcript capture, quantification, and differential expression analyses, examining the strengths and limitations of each sequencing platform in various biological contexts, enabling researchers to make more informed decisions on platform and method selection.

OmicsDI is part of the ELIXIR infrastructure
