Dataset Information

Merging short and stranded long reads improves transcript assembly

ABSTRACT: New tools for improved long-read transcript assembly and coalescence with its short-read counterpart are required. Using our short- and long-read measurements from different cell lines with spiked-in standards, we systematically compared key parameters and biases in the read alignment and assembly of transcripts. We report a cDNA synthesis artifact in long-read datasets that impacts the identity and quantitation of assembled transcripts. We developed a computational pipeline to strand long-read cDNA libraries that markedly improves assembly of transcripts from long-reads. Incorporating stranded long-reads in a new hybrid assembly approach, we demonstrate its efficacy for improved characterization of challenging lncRNA transcripts. Our workflow can be applied to a wide range of transcriptomics datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.

ORGANISM(S): Mus musculus

PROVIDER: GSE215357 | GEO | 2023/10/14

REPOSITORIES: GEO

Similar Datasets

Merging short and stranded long reads improves transcript assembly

Project description:New tools for improved long-read transcript assembly and coalescence with its short-read counterpart are required. Using our short- and long-read measurements from different cell lines with spiked-in standards, we systematically compared key parameters and biases in the read alignment and assembly of transcripts. We report a cDNA synthesis artifact in long-read datasets that impacts the identity and quantitation of assembled transcripts. We developed a computational pipeline to strand long-read cDNA libraries that markedly improves assembly of transcripts from long-reads. Incorporating stranded long-reads in a new hybrid assembly approach, we demonstrate its efficacy for improved characterization of challenging lncRNA transcripts. Our workflow can be applied to a wide range of transcriptomics datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.

2023-10-14 | GSE215355 | GEO

Novel splicing and open reading frames revealed by long-read direct RNA sequencing of adenovirus transcripts

Project description:Adenovirus is a common human pathogen that relies on host cell processes for transcription and processing of viral RNA and protein production. Although adenoviral promoters, splice junctions, and cleavage and polyadenylation sites have been characterized using low-throughput biochemical techniques or short read cDNA-based sequencing, these technologies do not fully capture the complexity of the adenoviral transcriptome. By combining Illumina short-read and nanopore long-read direct RNA sequencing approaches, we mapped transcription start sites and cleavage and polyadenylation sites across the adenovirus genome. In addition to confirming the known canonical viral early and late RNA cassettes, our analysis of splice junctions within long RNA reads revealed an additional 35 novel viral transcripts. These RNAs include fourteen new splice junctions which lead to expression of canonical open reading frames (ORF), six novel ORF-containing transcripts, and fifteen transcripts encoding for messages that potentially alter protein functions through truncations or fusion of canonical ORFs. In addition, we also detect RNAs that bypass canonical cleavage sites and generate potential chimeric proteins by linking separate gene transcription units. Of these, an evolutionary conserved protein was detected containing the N-terminus of E4orf6 fused to the downstream DBP/E2A ORF. Loss of this novel protein, E4orf6/DBP, was associated with aberrant viral replication center morphology and poor viral spread. Our work highlights how long-read sequencing technologies can reveal further complexity within viral transcriptomes.

2022-08-30 | PXD034464 | Pride

Precise Transcript Reconstruction with End-Guided Assembly

Project description:Accurate annotation of transcript isoforms is crucial to understand gene functions, but automated methods for reconstructing full-length transcripts from RNA sequencing (RNA-seq) data remain imprecise. We developed Bookend, a software package for transcript assembly that incorporates data from different RNA-seq techniques, with a focus on identifying and utilizing RNA 5′ and 3′ ends. Through end-guided assembly with Bookend we demonstrate that correct modeling of transcript start and end sites is essential for precise transcript assembly. Furthermore, we discovered that utilization of end-labeled reads present in full-length single-cell RNA-seq (scRNA-seq) datasets dramatically improves the precision of transcript assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq datasets from Arabidopsis, as well as meta-assembly of RNA-seq from single mouse embryonic stem cells (mESCs) can produce end-to-end transcript annotations of comparable quality to reference annotations in these model organisms.

2022-01-08 | GSE189482 | GEO

HyDRA: a pipeline for integrating long- and short-read RNAseq data for custom transcriptome assembly

Project description:Short-read RNA sequencing (RNAseq) remains a cornerstone for transcriptome profiling, but is limited in reconstructing full-length transcripts and capturing transcript diversity. While long-read RNAseq spans entire transcripts and resolves complex structures, this technology is hindered by its high error rates. In parallel, noncoding RNA transcripts remain underrepresented in current references. Here, we present HyDRA (Hybrid de novo RNA Assembly), a pipeline that integrates the accuracy of short reads with the structural resolution of long reads to produce more complete de novo transcriptome assemblies. Benchmarking showed HyDRA to outperform existing methods by up to 40%. Using the HyDRA human ovarian metatranscriptome, we identified >50,000 high-confidence long noncoding RNAs, most of which have not been previously detected using traditional methods. Although long-read RNAseq is advancing, the vast availability of short reads ensures HyDRA’s ongoing role in capturing high-confidence, cell-type specific transcripts and advancing our understanding of transcriptomic complexity and the noncoding genome.

2026-04-06 | GSE327033 | GEO

Time-resolved proteomic profile of Amblyomma americanum tick saliva during feeding (Running title: Proteins in tick saliva every 24 h during feeding)

Project description:Pioneering studies (PXD014844) have identified many interesting molecules by LC-MS/MS proteomics, but the protein databases used to assign mass spectra were based on short Illumina reads of the Amblyomma americanum transcriptome and may not have captured the diversity and complexity of longer transcripts. Here we apply long-read Pacific Bioscience technologies to complement the previously reported short-read Illumina transcriptome-based proteome in an effort to increase spectrum assignments. Our dataset reveals a small increase in assignable spectra to supplement previously released short-read transcriptome-based proteome.

2020-01-23 | PXD014844 | Pride

Proteomic and long-read transcriptomic analysis of Amblyomma americanum salivary gland lysate

Project description:Pioneering studies (PXD014844) have identified many interesting molecules in tick saliva by LC-MS/MS proteomics, but the protein databases used to assign mass spectra were based on short Illumina reads of the Amblyomma americanum transcriptome and may not have captured the diversity and complexity of longer transcripts. Here we apply long-read Pacific Bioscience technologies to complement the previously reported short-read Illumina transcriptome-based proteome in an effort to increase spectrum assignments. Our dataset reveals a small increase in assignable spectra to supplement the previously released short-read transcriptome-based proteome.

2022-05-31 | PXD033870 | Pride

Nanopore sequencing unveils the complexity of the murine brown adipose tissue transcriptome

Project description:Alternative splicing contributes to transcriptomic complexity and plays a role in the regulation of cellular identity and function, but the correct assembly of transcripts of complex loci as well as their quantification based on short-read sequencing is non-trivial. Recent long-read sequencing methods such as those from ONT and PacBio overcome these problems by potentially sequencing full transcripts. The activation of brown adipose tissue, e.g., by reduced ambient temperature (cold) exposure, positively affects metabolism by increasing energy expenditure and releasing endocrine factors and has been shown to involve specific alternative splicing events. Here we assessed important features of ONT long read sequencing protocols in relation to Illumina short read sequencing: (i) Alignment characteristics to the reference genome and transcriptome, (ii) Gene and transcript detection and quantification, (iii) Detection of differential gene and transcript expression events, (iv) Transcriptome reannotation and (v) Detection of differential transcript usage events. We find that ONT long-read sequencing is advantageous in terms of transcriptome reassembly, especially when the reads are enriched for full length reads. Illumina sequencing, due to the higher number of counts available, has a higher statistical power for calling differentially expressed/used features, whereas long-read sequencing has a lower risk of calling false positive events due to the better ability to unambiguously map reads to transcripts. Finally, we describe novel transcript isoforms in cold-activated murine iBAT reassembled from ONT long reads.

2025-05-19 | GSE296616 | GEO

Nanopore sequencing unveils the complexity of the murine brown adipose tissue transcriptome: Illumina RNA-Seq of iBAT from cold treated mice [01_iBAT_illumina]

2023-09-18 | GSE212569 | GEO

Nanopore sequencing unveils the complexity of the murine brown adipose tissue transcriptome: Illumina RNA-Seq of eWAT from cold treated mice [05_eWAT_illumina]

2023-09-18 | GSE212573 | GEO

Nanopore sequencing unveils the complexity of the murine brown adipose tissue transcriptome: ONT direct cDNA-Seq of iBAT from cold treated mice [04_iBAT_cdna]

2023-09-18 | GSE212572 | GEO