Dataset Information

MysteryMaster_a new demultiplexer for long read sequencing technology

ABSTRACT: MysteryMaster_a new demultiplexer for long read sequencing technology

PROVIDER: PRJEB85612 | ENA |

REPOSITORIES: ENA

Similar Datasets

Novel splicing and open reading frames revealed by long-read direct RNA sequencing of adenovirus transcripts

Project description: Adenovirus is a common human pathogen that relies on host cell processes for transcription and processing of viral RNA and protein production. Although adenoviral promoters, splice junctions, and cleavage and polyadenylation sites have been characterized using low-throughput biochemical techniques or short-read cDNA-based sequencing, these technologies do not fully capture the complexity of the adenoviral transcriptome. By combining Illumina short-read and nanopore long-read direct RNA sequencing approaches, we mapped transcription start sites and cleavage and polyadenylation sites across the adenovirus genome. In addition to confirming the known canonical viral early and late RNA cassettes, our analysis of splice junctions within long RNA reads revealed an additional 35 novel viral transcripts. These RNAs include fourteen new splice junctions which lead to expression of canonical open reading frames (ORFs), six novel ORF-containing transcripts, and fifteen transcripts encoding messages that potentially alter protein functions through truncations or fusion of canonical ORFs. In addition, we detected RNAs that bypass canonical cleavage sites and generate potential chimeric proteins by linking separate gene transcription units. Of these, an evolutionarily conserved protein was detected containing the N-terminus of E4orf6 fused to the downstream DBP/E2A ORF. Loss of this novel protein, E4orf6/DBP, was associated with aberrant viral replication center morphology and poor viral spread. Our work highlights how long-read sequencing technologies can reveal further complexity within viral transcriptomes.

2022-08-30 | PXD034464 | Pride

Assembly of the Arabidopsis (Col-0) centromeres using Oxford Nanopore Technology long-read DNA sequencing.

Project description:Nanopore long-read sequencing was used to generate a complete assembly of the Arabidopsis centromeres.

2021-04-02 | E-MTAB-10272 | biostudies-arrayexpress

Using long-read CAGE sequencing to profile cryptic-promoter derived transcripts and their contribution to the immunopeptidome

Project description: Recent studies have demonstrated that the non-coding genome can produce unannotated proteins as antigens that induce an immune response. One major source of this activity is the aberrant epigenetic reactivation of transposable elements (TEs). In tumors, TEs often provide cryptic or alternate promoters, which can generate transcripts that encode tumor-specific unannotated proteins. Thus, TE-derived transcripts have the potential to produce tumor-specific, but recurrent, antigens shared among many tumors. Identification of TE-derived tumor antigens holds the promise to improve cancer immunotherapy approaches; however, current genomics and computational tools are not optimized for their detection. Here we combined CAGE technology with full-length long-read transcriptome sequencing (Long-Read CAGE, or LRCAGE) and developed a suite of computational tools to significantly improve immunopeptidome detection by incorporating TE-derived and other tumor transcripts into the proteome database. By applying our methods to human lung cancer cell line H1299 data, we demonstrated that long-read technology significantly improves mapping of promoters with low mappability scores and LRCAGE guarantees accurate construction of uncharacterized 5’ transcript structure. Unannotated peptides predicted from newly characterized transcripts were readily detectable in whole cell lysate mass-spectrometry data. Incorporating unannotated peptides into the proteome database enabled us to detect non-canonical antigens in HLA-pulldown LC-MS/MS data. Finally, we showed that epigenetic treatment increased the number of non-canonical antigens, particularly those encoded by TE-derived transcripts, which might expand the pool of targetable antigens for cancers with low mutational burden.

2023-09-23 | PXD040265 | Pride

Pk long read sequencing

Project description:Plasmodium knowlesi long read sequencing

| PRJEB19298 | ENA

TEQUILA-seq: a versatile and low-cost method for targeted long-read RNA sequencing

Project description:Long-read RNA sequencing (RNA-seq) is a powerful technology for transcriptome analysis, but the relatively low throughput of current long-read sequencing platforms limits transcript coverage. We present TEQUILA-seq, a versatile, easy-to-implement, and low-cost method for targeted long-read RNA-seq. TEQUILA-seq can be broadly used for targeted sequencing of full-length transcripts in diverse biomedical research settings.

2023-06-06 | GSE213984 | GEO

A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification

Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq lacks the ability to resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short-reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each technology has its distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than annotated ones. The TALON pipeline for technology-agnostic, long-read transcriptome discovery and quantification tracks both known and novel transcript models as well as expression levels across datasets for both simple studies and larger projects such as ENCODE that seek to decode transcriptional regulation in the human and mouse genomes to predict more accurate expression levels of genes and transcripts than possible with short-reads alone.

2019-06-15 | GSE132766 | GEO

A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification

Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq lacks the ability to resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short-reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each one displayed distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. These data show that TALON is a technology-agnostic long-read transcriptome discovery and quantification pipeline capable of tracking both known and novel transcript models, as well as their expression levels, across datasets for both simple studies and in larger projects. These properties will enable TALON users to move beyond the limitations of short-read data to perform isoform discovery and quantification in a uniform manner on existing and future long-read platforms.

2020-03-18 | GSE147118 | GEO

Long-read cDNA Sequencing of Transposable Element-Activated Arabidopsis Mutants

Project description:We have used the genetic resources of Arabidopsis thaliana to generate mutant lines that have reactivated TE expression. We used these lines with long-read Oxford Nanopore sequencing technology to capture Transposable Element (TE) mRNAs for TE transcript annotation.

2020-07-17 | GSE145066 | GEO

High-resolution transcriptome analysis with long-read RNA sequencing

Project description: Ongoing improvements to next generation sequencing technologies are leading to longer sequencing read lengths, but a thorough understanding of the impact of longer reads on RNA sequencing analyses is lacking. To address this issue, we generated and compared two RNA sequencing datasets of differing read lengths -- 2x75 bp (L75) and 2x262 bp (L262) -- and investigated the impact of read length on various aspects of analysis, including the performance of currently available read-mapping tools, gene and transcript quantification, and detection of allele-specific expression patterns. Our results indicate that, while the scalability of read-mapping tools and the cost-effectiveness of long-read protocols are issues that require further attention, longer reads enable more accurate quantification of diverse aspects of gene expression, including individual-specific patterns of allele-specific expression and alternative splicing. The study comprises two RNA-Seq datasets of differing read lengths (2x262 bp and 2x75 bp).

2014-09-25 | E-GEOD-57862 | biostudies-arrayexpress

Long read sequencing of reporter plasmids and RNA

Project description: These experiments use a barcoded pool of reporter transcripts, each of which encodes the same mScarlet-PPIG_LCD fusion protein, but with different degrees of GA-multivalency via codon bias and a different number of constitutive introns. To perform experiments using this pool, it was necessary to perform long-read sequencing of the plasmid pool to relate the barcodes in the 3' ends of the reporters to their gene structure. Therefore, we performed long-read sequencing of the plasmid pool (both the original pool used for transfection and the ePB plasmid used for PiggyBac integration). Furthermore, to determine the splicing patterns of the reporter genes, we transfected the plasmid pool into HeLa cells for 16 hours, then performed targeted long-read sequencing of the reporter plasmids via RT-PCR. Note: the Nanopore adapter ligation strategy means that reads can come in either orientation. To determine the gene architectures and barcodes, we used fuzzy string matching. First, we matched to various fixed sequences throughout the reporter transcripts to determine the orientation of the read and to confirm that the read spanned the full length of the transcript. Then we used the same string matching strategy to detect the presence of the different intronic or exonic sequences, that is, the gene architecture. Finally, we extracted the unique plasmid barcode associated with that gene architecture. Example reporter sequences can be found here: https://benchling.com/faraway/f_/kXCfddtQ-public-reporter-plasmid-maps/ or alternatively, in Supplemental Table 2 of the bioRxiv submission here: https://www.biorxiv.org/content/10.1101/2023.08.21.554177v1.supplementary-material

2025-07-29 | E-MTAB-13330 | biostudies-arrayexpress
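
The fuzzy string matching described for the reporter-plasmid dataset (E-MTAB-13330) can be illustrated with a minimal Python sketch. This is not the submitters' code: the description does not say which matching algorithm or tolerance was used, so the edit-distance search below is an assumption, and the anchor sequences, barcode, and `max_edits` value are hypothetical placeholders rather than real reporter sequences. The sketch picks the read orientation in which a fixed upstream anchor matches best, then takes the barcode as the sequence between the upstream and downstream anchors.

```python
# Hypothetical fixed sequences flanking the barcode (placeholders, not real reporter sequences).
UP_ANCHOR = "ACGGATTCAGTC"
DOWN_ANCHOR = "CATTGGACCTAA"

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fuzzy_find(pattern, text, max_edits):
    """Approximate substring search by edit distance.

    Returns (end, edits) for the first end position in `text` where `pattern`
    matches with the fewest edits, or None if no match has <= max_edits edits.
    """
    prev = list(range(len(pattern) + 1))
    best = None
    for end, ch in enumerate(text, start=1):
        cur = [0]  # a match may start at any position in the text
        for i, p in enumerate(pattern, start=1):
            cur.append(min(prev[i - 1] + (p != ch),  # match or substitution
                           prev[i] + 1,              # extra base in the read
                           cur[i - 1] + 1))          # missing base in the read
        if cur[-1] <= max_edits and (best is None or cur[-1] < best[1]):
            best = (end, cur[-1])
        prev = cur
    return best


def extract_barcode(read, up=UP_ANCHOR, down=DOWN_ANCHOR, max_edits=2):
    """Return (strand, barcode), or None if the anchors cannot be located."""
    candidates = []
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        hit = fuzzy_find(up, seq, max_edits)
        if hit is not None:
            candidates.append((hit[1], strand, seq, hit[0]))
    if not candidates:
        return None
    # Keep the orientation whose upstream anchor needs the fewest edits.
    _, strand, seq, bc_start = min(candidates, key=lambda c: c[0])
    # Searching the reversed strings yields the START of the downstream anchor.
    hit = fuzzy_find(down[::-1], seq[::-1], max_edits)
    if hit is None:
        return None
    bc_end = len(seq) - hit[0]
    if bc_end <= bc_start:
        return None
    return strand, seq[bc_start:bc_end]


# Toy read: padding + upstream anchor + barcode + downstream anchor + padding.
example_read = "CCCC" + UP_ANCHOR + "GATCCATG" + DOWN_ANCHOR + "GGGG"
print(extract_barcode(example_read))                      # ('+', 'GATCCATG')
print(extract_barcode(reverse_complement(example_read)))  # ('-', 'GATCCATG')
```

Requiring both anchors also acts as a crude check that the read spans the barcode region. Detecting the gene architecture would repeat the same search with one pattern per candidate intronic or exonic sequence.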
