Dataset Information

ABSTRACT: Haplotype-aware error correction of Simplex nanopore reads

PROVIDER: PRJNA1112538 | ENA

REPOSITORIES: ENA

Dataset's files

	File	Type
	SRR29061597_1.fastq.gz	Fastqsanger.gz

Similar Datasets

Project description: Long-read sequencing has become a powerful tool for alternative splicing analysis. However, technical and computational challenges have limited our ability to couple long-read sequencing with single cell and spatial barcoding to explore alternative splicing in the single cell and spatial setting. Although Nanopore long-read sequencing has been applied to single cell and spatially barcoded libraries, technical issues that are not well addressed in such settings have hindered accurate isoform-level quantification. First, the relatively higher sequencing error of Nanopore long reads has limited the accuracy of cell barcode and unique molecular identifier (UMI) recovery, a necessary first step in the analysis of single cell/spatial sequencing data. Second, read truncation and mapping errors, the latter exacerbated by the higher sequencing error rates, lead to the false detection of spurious new isoforms and degrade quantification accuracy. We show that these technical issues persist despite the recent improvements in long-read sequencing accuracy. Beyond the initial data pre-processing, downstream analysis lacks a statistical framework to quantify splicing variation within and between cells/spots. In light of these challenges, we developed Longcell, a statistical framework and computational pipeline for isoform quantification using single cell and spatial spot barcoded Nanopore long-read sequencing data. Longcell performs computationally efficient cell/spot barcode extraction, UMI recovery, and UMI-based truncation- and mapping-error correction. Through a statistical model that accounts for varying read coverage across cells/spots, Longcell rigorously quantifies the level of inter-cell/spot versus intra-cell/spot diversity in exon usage and detects changes in splicing distributions between cell populations. Applying Longcell to single cell long-read data from multiple contexts, we found that intra-cell splicing heterogeneity, where multiple isoforms co-exist within the same cell, is ubiquitous for highly expressed genes. On matched single cell and Visium long-read sequencing of a colorectal cancer metastasis to the liver, Longcell found concordant signals between the single cell and spatial data modalities. On Visium long-read sequencing data for multiple tissues, Longcell allows accurate identification of spatial isoform switching. Finally, in a perturbation experiment for 9 splicing factors, Longcell identified regulatory targets that were validated by targeted sequencing.

OmicsDI is part of the ELIXIR infrastructure