Project description: Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq cannot resolve full-length transcript isoforms, despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each has its own distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. The TALON pipeline for technology-agnostic long-read transcriptome discovery and quantification tracks both known and novel transcript models, as well as their expression levels, across datasets. It serves both simple studies and larger projects such as ENCODE that seek to decode transcriptional regulation in the human and mouse genomes and to estimate gene and transcript expression levels more accurately than is possible with short reads alone.
Project description: We applied the long-read RNA sequencing technique ONT-cappable-seq to RNA samples of T. thermophilus collected 5 minutes after infection with phage P23-45. Using this approach, we captured the primary transcriptome at the early infection stage and sequenced it in full length. From these data, we identified viral transcription start sites and termination sites and uncovered distinct promoter motifs.
Project description: Zea mays is a leading model for elucidating transcriptional networks in plants, aided by increasingly refined transcriptome atlases spanning spatio-temporal, developmental, and environmental dimensions. This progress is limited by uncertainty about the complete structure of mRNA transcripts, particularly alternatively spliced isoforms. Although second-generation RNA-seq provides a quantitative assay for transcriptional and posttranscriptional events, accurately reconstructing full-length mRNA isoforms is challenging with short-read technologies. By producing much longer reads, third-generation sequencing promises to solve the assembly problem, but it can suffer from lower read accuracy and throughput. Here, we combine these complementary technologies to define and quantify high-confidence transcript isoforms in maize. Six tissues (root, pollen, embryo, endosperm, immature ear, and immature tassel) of the B73 inbred line were used for mRNA sequencing on the Illumina HiSeq2000 PE101 platform to comprehensively quantify gene and isoform expression. In parallel, intact cDNAs from the same samples were sequenced on the PacBio RS II platform using six size-fractionated libraries (<1 kb, 1-2 kb, 2-3 kb, 3-5 kb, 4-6 kb, >5 kb), generating more than 2 million full-length reads. Preliminary findings suggest that alternative splicing mechanisms are employed differently across tissues. These data also show promise to dramatically improve maize genome annotation by detecting previously unidentified transcript isoforms and uncovering previously unrecognized genes. This submission contains the Illumina HiSeq2000 PE101 reads.
Project description: Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq cannot resolve full-length transcript isoforms, despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each displays distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. These data show that TALON is a technology-agnostic long-read transcriptome discovery and quantification pipeline capable of tracking both known and novel transcript models, as well as their expression levels, across datasets in both simple studies and larger projects. These properties will enable TALON users to move beyond the limitations of short-read data and perform isoform discovery and quantification uniformly on existing and future long-read platforms.
Project description: We profiled the transcriptome of E. coli K12 under different culture conditions and growth phases using a highly efficient, strand-specific RNA-seq method combined with TruHMM, an accurate and robust algorithm and tool we developed for assembling full-length transcriptomes. We showed that the dynamic transcriptome structure appears to depend on culture condition and growth phase.
Project description: Bacteriophages (phages) are widespread in Streptococcus pneumoniae, with most strains carrying phage genomes integrated into the chromosome. We used RNA sequencing to explore whether phage gene expression could be detected. The pneumococcal reference strain PMEN3 (Spain9V-3), which contains two full-length phages and one partial phage, was grown in broth culture, and mitomycin C was added to induce the phages. PMEN3 culture samples were taken at sequential time points, and RNA was extracted and sequenced.
Project description: We report isoCirc, a strategy that combines rolling circle amplification (RCA) and long-read sequencing with an integrated computational pipeline to characterize full-length circular RNA (circRNA) isoforms. Applying isoCirc to 12 human tissues, we determined the full-length structures of circRNA isoforms in human transcriptomes and examined their tissue specificity.
Project description: RNA-seq data generated for de novo assembly and reconstruction of full-length mouse transcripts. RNA was sequenced from unstimulated dendritic cells (DCs).