Project description:The intergenic space of plant genomes encodes many functionally important yet unexplored RNAs. The genomic loci encoding these RNAs are often considered "junk" DNA, as they are frequently associated with repeat-rich regions of the genome. The latter makes the annotation of these loci and the assembly of the corresponding transcripts using short RNAseq reads particularly challenging. Here, using long-read Nanopore direct RNA sequencing, we aimed to identify these "junk" RNA molecules, including long non-coding RNAs (lncRNAs) and transposon-derived transcripts expressed during early stages (10 days post anthesis) of seed development of triticale (AABBRR, 2<i>n</i> = 6<i>x</i> = 42), an interspecific hybrid between wheat and rye. Altogether, we found 796 lncRNAs and 20 LTR retrotransposon-related transcripts (RTE-RNAs) expressed at this stage, with most of them being previously unannotated and located in intergenic as well as intronic regions. Sequence analysis of the lncRNAs provides evidence for the frequent exonization of class I (retrotransposons) and class II (DNA transposons) transposon sequences and suggests a direct influence of "junk" DNA on the structure and origin of lncRNAs. We show that the expression patterns of lncRNAs and RTE-related transcripts have high stage specificity. In turn, almost half of the lncRNAs located in Genomes A and B have the highest expression levels at 10-30 days post anthesis in wheat. Detailed analysis of the protein-coding potential of the RTE-RNAs showed that 75% of them carry open reading frames (ORFs) for a diverse set of GAG proteins, the main component of virus-like particles of LTR retrotransposons. We further experimentally demonstrated that some RTE-RNAs originate from autonomous LTR retrotransposons with ongoing transposition activity during early stages of triticale seed development.
Overall, our results provide a framework for further exploration of the newly discovered lncRNAs and RTE-RNAs in functional and genome-wide association studies in triticale and wheat. Our study also demonstrates that Nanopore direct RNA sequencing is an indispensable tool for the elucidation of lncRNA and retrotransposon transcripts.
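The ORF analysis of the RTE-RNAs mentioned above is not described in detail in this record. As a rough illustration only, a minimal forward-strand ORF scan might look like the sketch below. The function name `find_orfs`, the `min_codons` threshold, and the restriction to ATG start codons and the three forward frames are all assumptions made for this example, not the authors' pipeline.

```python
# Hypothetical sketch: scan the three forward reading frames of a transcript
# for ATG...stop open reading frames. Not the method used in the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start/end are 0-based, end-exclusive, and include the stop codon.
    min_codons counts the codons before the stop codon (start included).
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


toy_transcript = "CCATGAAACCCGGGTAACC"
orfs = find_orfs(toy_transcript)
print(orfs)
```

For real transcripts, candidate ORFs would still need to be compared against known GAG protein sequences or domain models before being called GAG-encoding.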
Project description:MicroRNAs (miRNAs) are a class of short non-coding RNAs that function in RNA silencing and post-transcriptional gene regulation. However, direct characterization of miRNA is challenging due to its unique properties such as its low abundance, sequence similarities, and short length. Although urgently needed, single molecule sequencing of miRNA has never been demonstrated, to the best of our knowledge. Nanopore-induced phase-shift sequencing (NIPSS), which is a variant form of nanopore sequencing, could directly sequence any short analytes including miRNA. In practice, NIPSS clearly discriminates between different identities, isoforms, and epigenetic variants of model miRNA sequences. This work thus demonstrates direct sequencing of miRNA, which serves as a complement to existing miRNA sensing routines by the introduction of the single molecule resolution. Future engineering of this technique may assist miRNA-based early stage diagnosis or inspire novel cancer therapeutics.
Project description:BACKGROUND:Compared with second-generation sequencing technologies, third-generation single-molecule RNA sequencing has unprecedented advantages; the long reads it generates facilitate isoform-level transcript characterization. In particular, the Oxford Nanopore Technology sequencing platforms have become more popular in recent years owing to their relatively high affordability and portability compared with other third-generation sequencing technologies. To aid the development of analytical tools that leverage the power of this technology, simulated data provide a cost-effective solution with ground truth. However, a nanopore sequence simulator targeting transcriptomic data is not available yet. FINDINGS:We introduce Trans-NanoSim, a tool that simulates reads with technical and transcriptome-specific features learnt from nanopore RNA-sequencing data. We comprehensively benchmarked Trans-NanoSim on direct RNA and complementary DNA datasets describing human and mouse transcriptomes. Through comparison against other nanopore read simulators, we show the unique advantage and robustness of Trans-NanoSim in capturing the characteristics of nanopore complementary DNA and direct RNA reads. CONCLUSIONS:As a cost-effective alternative to sequencing real transcriptomes, Trans-NanoSim will facilitate the rapid development of analytical tools for nanopore RNA-sequencing data. Trans-NanoSim and its pre-trained models are freely accessible at https://github.com/bcgsc/NanoSim.
Project description:We report the genome-wide small RNA profiles of the soybean early-maturation seed coat parenchyma compartment and of early-maturation whole seeds using Illumina high-throughput sequencing technology. Illumina sequencing of small RNA from the early-maturation seed coat parenchyma compartment and early-maturation stage whole seeds.
Project description:We report the genome-wide transcriptome of soybean seeds across several stages of seed development and the entire life cycle using Illumina high-throughput sequencing technology. Specifically, we profiled whole seeds containing globular-stage, heart-stage, cotyledon-stage, and early maturation-stage embryos. We also profiled dry soybean seeds, and vegetative and reproductive tissues including leaves, roots, stems, seedlings, and floral buds. Illumina sequencing of transcripts from whole seeds at five stages of seed development (globular, heart, cotyledon, early-maturation, dry), and vegetative (leaves, roots, stems, seedlings) and reproductive (floral buds) tissues.
Project description:Seeds that contain large amounts of oil, starch, fibers and phenols are the most difficult tissues for RNA extraction. Currently, there are some reports of virus detection in seeds using commercial kits for RNA extraction. However, individual seeds were used, which may not always be suitable for analyses that deal with large amounts of seeds. Sangha described a simple, quick and efficient protocol for RNA extraction and downstream applications in a group of seeds of jatropha (Jatropha curcas), mustard (Brassica sp.) and rice (Oryza sativa). We tested this protocol on soybean (Glycine max), maize (Zea mays), wheat (Triticum aestivum) and triticale (×Triticosecale) seeds, followed by reverse transcription PCR (RT-PCR)/quantitative real-time PCR (qPCR), in order to obtain a faster and more practical method for virus detection from seeds than the traditional scheme of seed planting and subsequent ELISA/RT-PCR from leaves. The essential points in the method are:
• Some modifications to the protocol were made in order to increase performance: wheat and triticale seeds are incubated with water prior to maceration. An amount of 1.2 g of dry soybean seeds is used for maceration.
• RT-PCR is used for detection of Wheat streak mosaic virus from wheat seeds and RT-qPCR for detection of Soybean mosaic virus from soybean seeds.
• The method may be tested for other viruses; however, pre-validation will be needed.
Project description:Nanopore sequencing enables direct measurement of RNA molecules without conversion to cDNA, thus opening the gates to a new era for RNA biology. However, the lack of molecular barcoding of direct RNA nanopore sequencing data sets severely affects the applicability of this technology to biological samples, where RNA availability is often limited. Here, we provide the first experimental protocol and associated algorithm to barcode and demultiplex direct RNA nanopore sequencing data sets. Specifically, we present a novel and robust approach to accurately classify raw nanopore signal data by transforming current intensities into images or arrays of pixels, followed by classification using a deep learning algorithm. We demonstrate the power of this strategy by developing the first experimental protocol for barcoding and demultiplexing direct RNA sequencing libraries. Our method, DeePlexiCon, can classify 93% of reads with 95.1% accuracy or 60% of reads with 99.9% accuracy. The availability of an efficient and simple multiplexing strategy for native RNA sequencing will improve the cost-effectiveness of this technology, as well as facilitate the analysis of lower-input biological samples. Overall, our work exemplifies the power, simplicity, and robustness of signal-to-image conversion for nanopore data analysis using deep learning.
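To make the idea of signal-to-image conversion concrete, the sketch below turns a short series of current intensities into a square array of pixel values. This record does not name the specific transform used, so the choice of a Gramian angular summation field here is an assumption made for illustration, as are the function names `rescale` and `gasf` and the toy signal values.

```python
import math


def rescale(signal):
    """Min-max scale a signal to the interval [-1, 1]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0] * len(signal)  # flat signal: map everything to 0
    scaled = (2 * (x - lo) / (hi - lo) - 1 for x in signal)
    # Clamp to guard against tiny floating-point overshoot before acos().
    return [max(-1.0, min(1.0, v)) for v in scaled]


def gasf(signal):
    """Gramian angular summation field (illustrative transform only).

    Each scaled value x is encoded as an angle phi = arccos(x); pixel (i, j)
    is cos(phi_i + phi_j), giving an n x n image for a signal of length n.
    """
    phi = [math.acos(x) for x in rescale(signal)]
    return [[math.cos(a + b) for b in phi] for a in phi]


toy_current = [80.0, 95.0, 110.0]  # made-up current values in pA
image = gasf(toy_current)
for row in image:
    print([round(v, 3) for v in row])
```

An image of this kind can then be passed to a standard image classifier, which is the general strategy the abstract describes; raw signals would typically be downsampled or segmented first so that every image has the same size.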
Project description:Arabidopsis thaliana transcriptomes have been extensively studied and characterized under different conditions. However, most of the current 'RNA-sequencing' technologies produce a relatively short read length and demand a reverse-transcription step, preventing effective characterization of transcriptome complexity. Here, we performed Direct RNA Sequencing (DRS) using the latest Oxford Nanopore Technology (ONT) with exceptional read length. We demonstrate that the complexity of the A. thaliana transcriptomes has been substantially under-estimated. The ONT direct RNA sequencing identified novel transcript isoforms at both the vegetative (14-day old seedlings, stage 1.04) and reproductive stages (stage 6.00-6.10) of development. Using in-house software called TrackCluster, we determined alternative transcription initiation (ATI), alternative polyadenylation (APA), alternative splicing (AS), and fusion transcripts. More than 38 500 novel transcript isoforms were identified, including six categories of fusion-transcripts that may result from differential RNA processing mechanisms. Aided by the Tombo algorithm, we found an enrichment of m5C modifications in the mobile mRNAs, consistent with a recent finding that m5C modification in mRNAs is crucial for their long-distance movement. In summary, ONT DRS offers an advantage in the identification and functional characterization of novel RNA isoforms and RNA base modifications, significantly improving annotation of the A. thaliana genome.
Project description:Facioscapulohumeral muscular dystrophy (FSHD) is an inherited muscle disease caused by misexpression of the DUX4 gene in skeletal muscle. DUX4 is a transcription factor, which is normally expressed in the cleavage-stage embryo and regulates gene expression involved in early embryonic development. Recent studies revealed that DUX4 also activates the transcription of repetitive elements such as endogenous retroviruses (ERVs), mammalian apparent long terminal repeat (LTR)-retrotransposons and pericentromeric satellite repeats (Human Satellite II). DUX4-bound ERV sequences also create alternative promoters for genes or long non-coding RNAs, producing fusion transcripts. To further understand transcriptional regulation by DUX4, we performed nanopore long-read direct RNA sequencing (dRNA-seq) of human muscle cells induced by DUX4, because long reads show whole isoforms with greater confidence. We successfully detected differential expression of known DUX4-induced genes and discovered 61 differentially expressed repeat loci, which are near DUX4-ChIP peaks. We also identified 247 gene-ERV fusion transcripts, of which 216 were not reported previously. In addition, long-read dRNA-seq clearly shows that RNA splicing is a common event in DUX4-activated ERV transcripts. Long-read analysis showed non-LTR transposons including Alu elements are also transcribed from LTRs. Our findings revealed further complexity of DUX4-induced ERV transcripts. This catalogue of DUX4-activated repetitive elements may provide useful information to elucidate the pathology of FSHD. Also, our results indicate that nanopore dRNA-seq has complementary strengths to conventional short-read complementary DNA sequencing.
Project description:In this study, based on Nanopore direct RNA-seq where native RNAs are sequenced directly as near full-length transcripts in the 3' to 5' direction, transcription units of the phytopathogen Dickeya dadantii 3937 were validated and transcriptional termination sites were determined. Briefly, D. dadantii cultures were grown in M63 medium supplemented with 0.2% glucose and 0.2% PGA, until the early exponential phase (A600nm = 0.2, condition 1), or the early stationary phase (A600nm = 1.8, condition 2). RNAs were extracted using a frozen acid-phenol method, as previously described (Hommais et al. 2008), and treated successively with Roche and Biolabs DNases. Two samples were prepared: 50 µg of RNAs from each condition were pooled into one sample (sample 1), whereas the other one contained 100 µg of RNAs from condition 2 (sample 2). Both samples were then supplied to Vertis Biotechnologie AG for Nanopore native RNA-seq: total RNA preparations were first examined by capillary electrophoresis. For sample 1, ribosomal RNA molecules were depleted using an in-house developed protocol (recovery rate = 84%), whereas no ribodepletion was performed for sample 2. 3' ends of RNA were then poly(A)-tailed using poly(A) polymerase, and the Direct RNA sequencing kit (SQK-RNA002) was used to prepare the library for 1D sequencing on the Oxford Nanopore sequencing device. The direct RNA libraries were sequenced on a MinION device (MIN-101B) using standard settings. Basecalling of the fast5 files was performed using Guppy (version 3.6.1) with the following settings: --flowcell FLO-MIN106 --kit SQK-RNA002 --cpu_threads_per_caller 12 --compress_fastq --reverse_sequence true --trim_strategy rna. Reads shorter than 50 nt were removed. 466 393 and 556 850 reads were generated for samples 1 and 2, respectively.
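The read-length filter described above (reads shorter than 50 nt removed) can be sketched as follows. The record does not say which tool was used for this step, so the helper names `read_fastq` and `filter_by_length` and the two toy reads are assumptions for illustration; the sketch also assumes the common four-lines-per-record FASTQ layout.

```python
import io

MIN_LEN = 50  # threshold stated in the text: reads shorter than 50 nt are dropped


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def filter_by_length(records, min_len=MIN_LEN):
    """Keep only records whose sequence is at least min_len nt long."""
    return [rec for rec in records if len(rec[1]) >= min_len]


# Two toy reads: one of 49 nt (removed) and one of 50 nt (kept).
example = io.StringIO(
    "@read1\n" + "A" * 49 + "\n+\n" + "I" * 49 + "\n"
    "@read2\n" + "C" * 50 + "\n+\n" + "I" * 50 + "\n"
)
kept = filter_by_length(read_fastq(example))
print([header for header, _, _ in kept])
```

Because the basecalling settings include --compress_fastq, real output files would be gzip-compressed; opening them with `gzip.open(path, "rt")` instead of `io.StringIO` would give the same line-based interface.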