Dataset Information

A long-read RNA-seq approach to identify novel transcripts of very large genes

ABSTRACT: RNA-seq is widely used for studying gene expression, but commonly used sequencing platforms produce short reads that span at most two exon junctions per read. This makes it difficult to accurately determine the composition and phasing of exons within transcripts. Although long-read sequencing mitigates this issue, it is not amenable to precise quantitation, which limits its utility for differential expression studies. We used long-read isoform sequencing combined with a novel analysis approach to compare alternative splicing of large, repetitive structural genes in muscles. Analysis of muscle structural genes that produce medium (Nrap, 5 kb), large (Nebulin, 22 kb) and very large (Titin, 106 kb) transcripts in cardiac muscle and in fast and slow skeletal muscles identified unannotated exons for each of these ubiquitous muscle genes. The analysis also identified differential exon usage and phasing for these genes between the different muscle types. By mapping the in-phase transcript structures to known annotations, we also identified and quantified previously unannotated transcripts. Results were confirmed by endpoint PCR and Sanger sequencing, which revealed muscle-type-specific differential expression of these novel transcripts. The improved transcript identification and quantification demonstrated by our approach removes previous impediments to studies aimed at quantitative differential expression of ultra-long transcripts.

ORGANISM(S): Mus musculus

PROVIDER: GSE138362 | GEO | 2020/05/17

REPOSITORIES: GEO

Similar Datasets

Using long-read CAGE sequencing to profile cryptic-promoter derived transcripts and their contribution to the immunopeptidome

Project description: Recent studies have demonstrated that the non-coding genome can produce unannotated proteins as antigens that induce an immune response. One major source of this activity is the aberrant epigenetic reactivation of transposable elements (TEs). In tumors, TEs often provide cryptic or alternate promoters, which can generate transcripts that encode tumor-specific unannotated proteins. Thus, TE-derived transcripts have the potential to produce tumor-specific, but recurrent, antigens shared among many tumors. Identification of TE-derived tumor antigens holds the promise to improve cancer immunotherapy approaches; however, current genomics and computational tools are not optimized for their detection. Here we combined CAGE technology with full-length long-read transcriptome sequencing (Long-Read CAGE, or LRCAGE) and developed a suite of computational tools to significantly improve immunopeptidome detection by incorporating TE-derived and other tumor transcripts into the proteome database. By applying our methods to human lung cancer cell line H1299 data, we demonstrated that long-read technology significantly improves mapping of promoters with low mappability scores and LRCAGE guarantees accurate construction of uncharacterized 5’ transcript structure. Unannotated peptides predicted from newly characterized transcripts were readily detectable in whole cell lysate mass-spectrometry data. Incorporating unannotated peptides into the proteome database enabled us to detect non-canonical antigens in HLA-pulldown LC-MS/MS data. Finally, we showed that epigenetic treatment increased the number of non-canonical antigens, particularly those encoded by TE-derived transcripts, which might expand the pool of targetable antigens for cancers with low mutational burden.

2023-09-23 | PXD040265 | Pride

Single-molecule sequencing of maize and sorghum developmental tissues

Project description: In this study, we compared the transcriptome map of maize and sorghum using PacBio single-molecule long-read sequencing from multiple matched tissues in each species. Maize and sorghum are both important crops with similar overall plant architectures, but they have key differences, especially in regard to their inflorescences. To better understand these two organisms at the molecular level, we compared the transcriptional profiles of both protein-coding and non-coding transcripts in matched tissues using large-scale single-molecule sequencing from 130 RSII cells and 5 Sequel cells, as well as deep short-read RNA sequencing. The use of multiple size-fractionated libraries (<1 kb, 1-2 kb, 2-3 kb, 3-5 kb, and >5 kb) enhanced our capture of non-redundant transcripts in these tissues.

2018-04-05 | E-MTAB-5957 | biostudies-arrayexpress

Disease-associated genetic variants can cause missense effects in tissue-specific protein isoforms

Project description:Genetic variants can cause protein-coding mutations that result in disease. Variants are typically interpreted using the reference transcript for a gene. However, most human multi-exon genes encode alternative isoforms. Here, we show that coding exons in alternative isoforms harbour more population variants than exons of reference isoforms, consistent with their reduced evolutionary constraint, and that these variants are more likely to cause nonsynonymous coding mutations. Common and rare disease-associated variants mapping to alternative transcripts can lead to amino acid substitutions predicted to be structurally damaging in the corresponding protein isoform. The alternative transcripts to which disease-associated variants map demonstrate high tissue-specific expression, with many unannotated in reference human genomes, revealed only by long-read RNA-sequencing. As an example, we report an unannotated alternative transcript of the inflammasome regulator DPP9 that is lung epithelium-specific and which harbours a common genetic variant associated with severe COVID-19 and lung fibrosis. The variant causes a p.Leu8Pro missense mutation in an alternative first exon, predicted to disrupt the encoded alpha helix. These findings highlight the importance of considering alternative isoforms, their tissue-specific expression, and full-length transcripts in variant interpretation, with implications for uncovering underappreciated mechanisms of both common and rare disease.

2026-04-29 | GSE303335 | GEO

CRYPTID-exon: streamlined detection of cryptic exons from RNA-seq data

Project description: Cryptic splicing has emerged as a pervasive feature of mammalian gene expression, with recent studies discovering thousands of previously unannotated splice sites. Despite its prevalence, the functional consequences of this hidden layer of splicing remain largely unknown due to challenges in identifying the exact exonic regions introduced into mRNA transcripts. Here, we introduce a novel computational approach, CRYPTID-exon, that accurately predicts exon boundaries by modeling RNA-seq read coverage anchored on empirically derived splice sites. We use CRYPTID-exon to identify and characterize thousands of cryptic exons in nascent and mature RNA from human cells. Additionally, we demonstrate that CRYPTID-exon is well powered to identify exons that are sensitive to translation-mediated degradation processes. Finally, given the growing interest in leveraging cryptic exons to modulate gene expression levels, we use our approach to identify cryptic exons in disease-relevant genes. We see that targeting these exons with splice-switching antisense oligonucleotides (ASOs) can alter gene expression and splicing patterns of the parent genes. Our study provides a framework to systematically identify and characterize cryptic exons, which will enable downstream insights into their impact on mRNA stability and translation.

2026-02-26 | GSE312830 | GEO

Project description:KIF1A long read phasing

| PRJNA1244324 | ENA

Deep sequencing of the Caenorhabditis elegans transcriptome using RNA isolated from various developmental stages under various experimental conditions RW0001

Project description:The goal of this study, started as a part of the modENCODE project, is to detect and characterize previously unannotated transcripts of the C. elegans genome. This dataset has been imported from the Sequence Read Archive and curated by the WormBase and ArrayExpress teams.

2010-02-26 | E-MTAB-2683 | biostudies-arrayexpress

56708f83-d762-4a04-a3d4-ef17e96ef1fe - samples

Project description: We generated a large transcriptome atlas of human skeletal muscles by collecting biopsies from 6 different muscles to determine molecular signatures that may be distinct between leg muscles. The biopsies were collected from gracilis (GR), semitendinosus (ST), vastus lateralis (VL), vastus medialis (VM), rectus femoris (RF), and gastrocnemius lateralis (GL) muscles. We also investigated molecular differences within the muscle by including two biopsies from the middle and distal sides of the semitendinosus muscle (STM and STD, respectively). In total, 128 samples from 20 individuals (aged 25 ± 3.6 yr) were analyzed.

| EGAD00001008657 | EGA

A transcriptome atlas of human skeletal muscles

| EGAS00001005904 | EGA

Human introns contain conserved tissue-specific cryptic poison exons

Project description:Eukaryotic cells express a large number of transcripts from a single gene due to alternative splicing. Despite hundreds of thousands of splice isoforms being annotated in databases, it has been reported that the current exon catalogs remain incomplete. At the same time, introns of human protein-coding genes contain a large number of evolutionarily conserved elements with unknown function. Here, we explore the possibility that some of them represent cryptic exons that are expressed in rare conditions. We identified a group of cryptic exons that are similar to the annotated exons in terms of evolutionary conservation and RNA-seq read coverage in the GTEx dataset. Most of them were poison, i.e. generated an NMD isoform upon inclusion, and many showed signs of tissue-specific and cancer-specific expression and regulation. We performed RNA-seq in A549 cell line treated with cycloheximide to inactivate NMD, and confirmed using qPCR that seven of eight exons tested are, indeed, expressed. This study shows that introns of human protein-coding genes contain cryptic poison exons, which reside in conserved intronic regions and remain not fully annotated due to insufficient representation in RNA-seq libraries.

2024-11-20 | GSE270310 | GEO

Dynamic transcriptomes during neural differentiation of human embryonic stem cells

Project description: Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. In order to examine the fundamental mechanisms governing neural differentiation, we analyzed the transcriptome changes that occur during the differentiation of human embryonic stem cells (hESCs) into the neural lineage. Undifferentiated hESCs as well as cells at three stages of early neural differentiation, N1 (early initiation), N2 (neural progenitor), and N3 (early glial-like), were analyzed using a combination of single read, paired-end read, and long read RNA sequencing. The results revealed enormous complexity in gene transcription and splicing dynamics during neural cell differentiation. We found previously unannotated transcripts and spliced isoforms specific for each stage of differentiation. Interestingly, splicing isoform diversity is highest in undifferentiated hESCs and decreases upon differentiation, a phenomenon we call “isoform specialization.” During neural differentiation, we observed differential expression of many types of genes, including those involved in key signaling pathways, and a large number of extracellular receptors exhibit stage-specific regulation. These results provide a valuable resource for studying neural differentiation and reveal insights into the mechanisms underlying in vitro neural differentiation of hESCs, such as neural fate specification, NPC identity maintenance and the transition from a predominantly neuronal state into one with increased gliogenic potential.

2010-03-04 | GSE20301 | GEO
