Project description:Arabidopsis thaliana transcriptomes have been extensively studied and characterized under different conditions. However, most of the current 'RNA-sequencing' technologies produce a relatively short read length and demand a reverse-transcription step, preventing effective characterization of transcriptome complexity. Here, we performed Direct RNA Sequencing (DRS) using the latest Oxford Nanopore Technology (ONT) with exceptional read length. We demonstrate that the complexity of the A. thaliana transcriptomes has been substantially under-estimated. The ONT direct RNA sequencing identified novel transcript isoforms at both the vegetative (14-day old seedlings, stage 1.04) and reproductive stages (stage 6.00-6.10) of development. Using in-house software called TrackCluster, we determined alternative transcription initiation (ATI), alternative polyadenylation (APA), alternative splicing (AS), and fusion transcripts. More than 38 500 novel transcript isoforms were identified, including six categories of fusion-transcripts that may result from differential RNA processing mechanisms. Aided by the Tombo algorithm, we found an enrichment of m5C modifications in the mobile mRNAs, consistent with a recent finding that m5C modification in mRNAs is crucial for their long-distance movement. In summary, ONT DRS offers an advantage in the identification and functional characterization of novel RNA isoforms and RNA base modifications, significantly improving annotation of the A. thaliana genome.
Project description:BACKGROUND: The current epidemic of obesity has caused a surge of interest in the study of adipose tissue formation. While major progress has been made in defining the molecular networks that control adipocyte terminal differentiation, the early steps of adipocyte development and the embryonic origin of this lineage remain largely unknown. RESULTS: Here we performed genome-wide analysis of gene expression during adipogenesis of mouse embryonic stem cells (ESCs). We then pursued comprehensive bioinformatic analyses, including de novo functional annotation and curation of the generated data within the context of biological pathways, to uncover novel biological functions associated with the early steps of adipocyte development. By combining in-depth gene regulation studies and in silico analysis of transcription factor binding site enrichment, we also provide insights into the transcriptional networks that might govern these early steps. CONCLUSIONS: This study supports several biological findings: firstly, adipocyte development in mouse ESCs is coupled to blood vessel morphogenesis and neural development, just as it is during mouse development. Secondly, the early steps of adipocyte formation involve major changes in signaling and transcriptional networks. A large proportion of the transcription factors that we uncovered in mouse ESCs are also expressed in the mouse embryonic mesenchyme and in adipose tissues, demonstrating the power of our approach to probe for genes associated with early developmental processes on a genome-wide scale. Finally, we reveal a plethora of novel candidate genes for adipocyte development and present a unique resource that can be further explored in functional assays.
Project description:BACKGROUND: Organellar transcriptomes are relatively under-studied systems, with data related to full-length transcripts and posttranscriptional modifications remaining sparse. Direct RNA sequencing presents the possibility of accessing a previously unavailable layer of information pertaining to transcriptomic data, as well as circumventing the biases introduced by second-generation RNA-seq platforms. Direct long-read ONT sequencing allows for the isoform analysis of full-length transcripts and the detection of posttranscriptional modifications. However, there are still relatively few projects employing this method specifically for studying organellar transcriptomes. RESULTS: Candida albicans is a promising model for investigating nucleo-mitochondrial interactions. This work comprises ONT sequencing of the Candida albicans mitochondrial transcriptome along with the development of a dedicated data analysis pipeline. This approach allowed for the detection of complete transcript isoforms and posttranscriptional RNA modifications, as well as an analysis of C. albicans deletion mutants in genes coding for the 5' and 3' mitochondrial RNA exonucleases CaPET127 and CaDSS1. It also enabled corrections to previous studies in terms of 3' and 5' transcript ends. A number of intermediate splicing isoforms were also discovered, along with mature and unspliced transcripts and changes in their abundances resulting from disruption of both 5' and 3' exonucleolytic processing. Multiple putative posttranscriptional modification sites have also been detected. CONCLUSIONS: This preliminary work demonstrates the suitability of direct RNA sequencing for studying yeast mitochondrial transcriptomes in general and provides new insights into the workings of the C. albicans mitochondrial transcriptome in particular. It also provides a general roadmap for analyzing mitochondrial transcriptomic data from other organisms.
Project description:The transcriptome profiles of the model plant Arabidopsis thaliana have been extensively studied and characterised under different developmental and physiological conditions. However, most of these “RNA-sequencing” datasets have been generated using the sequencing of reverse-transcribed cDNAs from mRNAs that have a relatively short read length. Here, we performed direct RNA sequencing using the latest Oxford Nanopore Technology (ONT) with unusual read length. We demonstrate that the complexity of the A. thaliana transcriptomes has been under-estimated. The ONT direct RNA sequencing technology identified transcript isoforms at a vegetative (14 day old seedlings, stage 1.04) and a reproductive stage (stage 6.00-6) when 10% of the flowers had opened. In-house software called TrackCluster was used to determine alternative transcription initiation (ATI), possible alternative polyadenylation (APA), poly(A) length, alternative splicing (AS), and fusion transcripts. Tombo software was used to detect RNA base modifications. More than 38,500 novel transcript isoforms were identified, including six categories of fusion-transcripts which may result from differential RNA processing mechanisms. Fusion-transcripts are prone to mis-assembly when sequenced with short reads using next-generation sequencing (NGS). These new transcript isoforms provide important additions to the annotated Arabidopsis genome. The power of ONT in detecting RNA modifications was demonstrated by characterisation of the modifications between mobile mRNAs and total mRNAs. The mobile mRNAs were enriched in m5C modifications, which is consistent with a recent finding that m5C modification in mRNAs is crucial for their long-distance movement. In summary, ONT direct RNA sequencing greatly enhances the identification of novel RNA transcript isoforms and RNA base modifications.
Project description:Conventional bacterial genome annotation provides information about coding sequences but ignores untranslated regions and operons. However, untranslated regions contain important regulatory elements as well as targets for many regulatory factors, such as small RNAs. Operon maps are also essential for functional gene analysis. In the last decade, considerable progress has been made in the study of bacterial transcriptomes through transcriptome sequencing (RNA-seq). Given the compact nature of bacterial genomes, many challenges still cannot be resolved through short reads generated using classical RNA-seq because of fragmentation and loss of the full-length information. Direct RNA sequencing is a technology that sequences the native RNA directly without information loss or bias. Here, we employed direct RNA sequencing to annotate the Vibrio parahaemolyticus transcriptome with its full features, including transcription start sites (TSSs), transcription termination sites, and operon maps. A total of 4,103 TSSs were identified. In comparison to short-read sequencing, full-length information provided a deeper view of TSS classification, showing that most internal and antisense TSSs were actually a result of gene overlap. Sequencing the transcriptome of V. parahaemolyticus grown with bile allowed us to study the landscape of pathogenicity island Vp-PAI. Some genes in this region were reannotated, providing more accurate annotation to increase precision in their characterization. Quantitative detection of operons in V. parahaemolyticus showed high complexity in some operons, shedding light on a greater extent of regulation within the same operon. Our study using direct RNA sequencing provides a quantitative and high-resolution landscape of the V. parahaemolyticus transcriptome. IMPORTANCE Vibrio parahaemolyticus is a halophilic bacterium found in the marine environment. 
Outbreaks of gastroenteritis resulting from seafood poisoning by this pathogen have risen over the past 2 decades. Upon ingestion by humans, often through the consumption of raw or undercooked seafood, V. parahaemolyticus senses the host environment and expresses numerous genes, the products of which synergize to synthesize and secrete toxins that can cause acute gastroenteritis. To understand the regulation of this adaptive response, mRNA transcripts must be mapped accurately. However, due to the limitations of common sequencing methods, not all features of bacterial transcriptomes are always reported. We applied direct RNA sequencing to analyze the V. parahaemolyticus transcriptome. Mapping the full features of the transcriptome is anticipated to enhance our understanding of gene regulation in this bacterium and to provide a data set for future work. Additionally, this study reveals a deeper view of a complicated transcriptome landscape, demonstrating the importance of applying such methods to other bacterial models.
Project description:Bacterial gene expression is a complex process involving extensive regulatory mechanisms. Along with growing interest in this field, Nanopore Direct RNA Sequencing (DRS) provides a promising platform for rapid and comprehensive characterization of bacterial RNA biology. However, the DRS of bacterial RNA is currently deficient in the yield of mRNA-mapping reads and has yet to be exploited for transcriptome-wide RNA modification mapping. Here, we showed that pre-processing of bacterial total RNA (size selection followed by ribosomal RNA depletion and polyadenylation) guaranteed high sequencing throughput and considerably increased the number of mRNA reads. In this way, we reconstructed complex transcriptome architectures for Escherichia coli and Staphylococcus aureus and extended the boundaries of 225 known E. coli operons and 89 defined S. aureus operons. Utilizing unmodified in vitro-transcribed (IVT) RNA libraries as a negative control, several Nanopore-based computational tools globally detected putative modification sites in the E. coli and S. aureus transcriptomes. Combined with Next-Generation Sequencing-based N6-methyladenosine (m6A) detection methods, 75 high-confidence m6A candidates were identified in the E. coli protein-coding transcripts, while none were detected in S. aureus. Altogether, we demonstrated the potential of Nanopore DRS in systematic and convenient transcriptome and epitranscriptome analysis.
Project description:In eukaryotes, genes produce a variety of distinct RNA isoforms, each with potentially unique protein products, coding potential or regulatory signals such as poly(A) tail and nucleotide modifications. Assessing the kinetics of RNA isoform metabolism, such as transcription and decay rates, is essential for unraveling gene regulation. However, it is currently impeded by lack of methods that can differentiate between individual isoforms. Here, we introduce RNAkinet, a deep convolutional and recurrent neural network, to detect nascent RNA molecules following metabolic labeling with the nucleoside analog 5-ethynyl uridine and long-read, direct RNA sequencing with nanopores. RNAkinet processes electrical signals from nanopore sequencing directly and distinguishes nascent from pre-existing RNA molecules. Our results show that RNAkinet prediction performance generalizes in various cell types and organisms and can be used to quantify RNA isoform half-lives. RNAkinet is expected to enable the identification of the kinetic parameters of RNA isoforms and to facilitate studies of RNA metabolism and the regulatory elements that influence it.
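As a rough illustration of how a nascent-read fraction could be turned into a half-life, the sketch below assumes steady-state expression and first-order decay, so that the labeled fraction after a pulse of length t is f = 1 - exp(-k t). This is a common simplification in metabolic-labeling studies and is not taken from the RNAkinet description above; the function name and its parameters are hypothetical.

```python
import math


def half_life_from_nascent_fraction(nascent_reads, total_reads, labeling_hours):
    """Estimate an isoform half-life (in hours) from labeled read counts.

    Assumption (not from the source): steady state with first-order decay,
    so the nascent fraction after labeling time t is f = 1 - exp(-k * t).
    """
    if total_reads <= 0 or labeling_hours <= 0:
        raise ValueError("total_reads and labeling_hours must be positive")
    fraction = nascent_reads / total_reads
    if not 0.0 < fraction < 1.0:
        raise ValueError("nascent fraction must lie strictly between 0 and 1")
    # Invert f = 1 - exp(-k * t) to recover the decay rate k.
    decay_rate = -math.log(1.0 - fraction) / labeling_hours
    # Half-life of a first-order process.
    return math.log(2.0) / decay_rate
```

Under these assumptions, an isoform for which half of the reads are classified as nascent after a 2-hour pulse would have an estimated half-life of 2 hours.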
Project description:Quantification of the dynamics of RNA metabolism is essential for understanding gene regulation in health and disease. Existing methods rely on metabolic labeling of nascent RNAs and physical separation or inference of labeling through PCR-generated mutations, followed by short-read sequencing. However, these methods are limited in their ability to identify transient decay intermediates or co-analyze RNA decay with cis-regulatory elements of RNA stability such as poly(A) tail length and modification status, at single molecule resolution. Here we use 5-ethynyl uridine (5EU) to label nascent RNA followed by direct RNA sequencing with nanopores. We developed RNAkinet, a deep convolutional and recurrent neural network that processes the electrical signal produced by nanopore sequencing to identify 5EU-labeled nascent RNA molecules. RNAkinet demonstrates generalizability to distinct cell types and organisms and reproducibly quantifies RNA kinetic parameters allowing the combined interrogation of RNA metabolism and cis-acting RNA regulatory elements.
Project description:Historically seen as a benign disease, it is now becoming clear that Plasmodium vivax can cause significant morbidity. Effective control strategies targeting P. vivax malaria are hindered by our limited understanding of vivax biology. Here we established the P. vivax transcriptome of the Intraerythrocytic Developmental Cycle (IDC) of two clinical isolates at high resolution using the Illumina HiSeq platform. The detailed map of the transcriptome generates new insights into regulatory mechanisms of individual genes and reveals their intimate relationship with specific biological functions. A transcriptional hotspot of vir genes observed on chromosome 2 suggests a potential active site modulating immune evasion of the Plasmodium parasite across patients. Compared to other eukaryotes, P. vivax genes tend to have unusually long 5' untranslated regions and also present multiple transcription start sites. In contrast, alternative splicing is rare in P. vivax, but its association with the late schizont stage suggests that it has some significance for gene function. The newly identified transcripts, including up to 179 vir-like genes and 3018 noncoding RNAs, suggest an important role of these gene/transcript classes in strain-specific transcriptional regulation.
Project description:RNA sequencing has been widely used as an essential tool to probe gene expression. While standard practices have been established to analyze RNA-seq data, it is still challenging to interpret and remove artifactual signals. Several biological and technical factors such as sex, age, batches, and sequencing technology have been found to bias these estimates. Probabilistic estimation of expression residuals (PEER), which infers broad variance components in gene expression measurements, has been used to account for some systematic effects, but it has remained challenging to interpret these PEER factors. Here we show that transcriptome diversity, a simple metric based on Shannon entropy, explains a large portion of variability in gene expression and is the strongest known factor encoded in PEER factors. We then show that transcriptome diversity has significant associations with multiple technical and biological variables across diverse organisms and datasets. In sum, transcriptome diversity provides a simple explanation for a major source of variation in both gene expression estimates and PEER covariates.
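A minimal sketch of a Shannon-entropy-based diversity score for one sample is given below. The description above only states that the metric is based on Shannon entropy; the use of per-gene read counts as input, the natural logarithm, and the optional normalization by the log of the number of genes are assumptions made here for illustration, and the function name is hypothetical.

```python
import math


def transcriptome_diversity(gene_counts, normalize=True):
    """Shannon entropy of the per-gene expression distribution of one sample.

    gene_counts: non-negative expression values, one per gene.
    normalize:   if True, divide by log(number of genes) so the score lies
                 in [0, 1] (an illustrative choice, not from the source).
    """
    total = sum(gene_counts)
    if total <= 0:
        raise ValueError("at least one gene must have a positive count")
    # Convert counts to proportions; genes with zero counts contribute nothing.
    proportions = [c / total for c in gene_counts if c > 0]
    entropy = sum(-p * math.log(p) for p in proportions)
    n_genes = len(gene_counts)
    if normalize and n_genes > 1:
        entropy /= math.log(n_genes)
    return entropy
```

With normalization, a sample whose reads are spread evenly over all genes scores 1, while a sample dominated by a single gene scores close to 0, which is the intuition behind treating the score as a per-sample covariate.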