Multiplexed Spliced-Leader Sequencing: A high-throughput, selective method for RNA-seq in Trypanosomatids.
ABSTRACT: High-throughput sequencing techniques are poorly adapted for in vivo studies of parasites, which require prior in vitro culturing and purification. Trypanosomatids, a group of kinetoplastid protozoans, possess a distinctive feature in their transcriptional mechanism whereby a specific Spliced Leader (SL) sequence is added to the 5' end of each mRNA by trans-splicing. This makes it possible to discriminate trypanosomatid RNA from mammalian RNA and forms the basis of our new multiplexed protocol for high-throughput, selective RNA-sequencing called SL-seq. We provided a proof-of-concept of SL-seq in Leishmania donovani, the main causative agent of visceral leishmaniasis in humans, and successfully applied the method to sequence Leishmania mRNA directly from infected macrophages and from highly diluted mixes with human RNA. mRNA profiles obtained with SL-seq corresponded largely to those obtained from conventional poly-A tail purification methods, indicating that both enumerate the same mRNA pool. However, SL-seq offers additional advantages, including lower sequencing depth requirements, fast and simple library prep and high-resolution splice site detection. SL-seq is therefore ideal for fast and massively parallel sequencing of parasite transcriptomes directly from host tissues. Since SLs are also present in Nematodes, Cnidaria and primitive chordates, this method could also have high potential for transcriptomics studies in other organisms.
Project description:Background: The spliceosomal transfer of a short spliced leader (SL) RNA to an independent pre-mRNA molecule is called SL trans-splicing and is widespread in the nematode Caenorhabditis elegans. While RNA-sequencing (RNA-seq) data contain information on such events, properly documented methods to extract them are lacking. Findings: To address this, we developed SL-quant, a fast and flexible pipeline that adapts to paired-end and single-end RNA-seq data and accurately quantifies SL trans-splicing events. It is designed to work downstream of read mapping and uses the reads left unmapped as primary input. Briefly, the SL sequences are identified with high specificity and are trimmed from the input reads, which are then remapped on the reference genome and quantified at the nucleotide position level (SL trans-splice sites) or at the gene level. Conclusions: SL-quant completes within 10 minutes on a basic desktop computer for typical C. elegans RNA-seq datasets and can be applied to other species as well. Validating the method, the SL trans-splice sites identified display the expected consensus sequence, and the results of the gene-level quantification are predictive of the gene position within operons. We also compared SL-quant to a recently published SL-containing read identification strategy that was found to be more sensitive but less specific than SL-quant. Both methods are implemented as a bash script available under the MIT license. Full instructions for its installation, usage, and adaptation to other organisms are provided.
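The identify-and-trim step described above can be illustrated with a minimal Python sketch. This is not the SL-quant implementation, which is a bash pipeline; it only assumes that an SL-containing read begins with some 3' portion of the SL, and it uses exact string matching for simplicity. The function name `trim_spliced_leader`, the `min_overlap` parameter, and the toy sequence `SL_TOY` are all hypothetical and chosen for illustration; `SL_TOY` is not a real spliced leader.

```python
# Hypothetical toy spliced-leader sequence (NOT a real SL); replace with the
# SL of the organism under study.
SL_TOY = "ACGTACGGTTCAAGTCTGAG"


def trim_spliced_leader(read, sl, min_overlap=8):
    """Trim a 5' spliced-leader remnant from a read.

    Returns (trimmed_read, overlap_length) if the read starts with a suffix
    of `sl` that is at least `min_overlap` nt long, otherwise None.
    Exact matching only; mismatches are not tolerated in this sketch.
    """
    longest = min(len(sl), len(read))
    # Try the longest possible SL suffix first, then shorter ones.
    for k in range(longest, min_overlap - 1, -1):
        if read.startswith(sl[-k:]):
            return read[k:], k
    return None


if __name__ == "__main__":
    reads = ["CAAGTCTGAGATGGCTAAAC", "ATGGCTAAACCCGGG"]
    for r in reads:
        print(r, trim_spliced_leader(r, SL_TOY))
```

In a full workflow the trimmed reads would then be remapped to the reference genome, so that the first mapped base marks the candidate trans-splice site. The minimum overlap trades sensitivity against specificity: shorter overlaps recover more reads but match by chance more often.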
Project description:In trypanosomatids, all mRNAs are processed via trans-splicing, although cis-splicing also occurs. In trans-splicing, a common small exon, the spliced leader (SL), which is derived from a small SL RNA species, is added to all mRNAs. Sm and Lsm proteins are core proteins that bind to U snRNAs and are essential for both these splicing processes. In this study, SmD3- and Lsm3-associated complexes were purified to homogeneity from Leishmania tarentolae. The purified complexes were analyzed by mass spectrometry, and 54 and 39 proteins were purified from SmD3 and Lsm complexes, respectively. Interestingly, among the proteins purified from Lsm3, no mRNA degradation factors were detected, as in Lsm complexes from other eukaryotes. The U1A complex was purified and mass spectrometry analysis identified, in addition to U1 small nuclear ribonucleoprotein (snRNP) proteins, additional co-purified proteins, including the polyadenylation factor CPSF73. Defects observed in cells silenced for U1 snRNP proteins suggest that the U1 snRNP functions exclusively in cis-splicing, although U1A also participates in polyadenylation and affects trans-splicing. The study characterized several trypanosome-specific nuclear factors involved in snRNP biogenesis, whose function was elucidated in Trypanosoma brucei. Conserved factors, such as PRP19, which functions at the heart of every cis-spliceosome, also affect SL RNA modification; GEMIN2, a protein associated with SMN (survival of motor neurons) and implicated in selective association of U snRNA with core Sm proteins in trypanosomes, is a master regulator of snRNP assembly. This study demonstrates the existence of trypanosomatid-specific splicing factors but also that conserved snRNP proteins possess trypanosome-specific functions.
Project description:BACKGROUND: In eco-epidemiological studies, Leishmania detection in vectors and reservoirs is frequently accomplished by high-throughput and sensitive molecular methods that target minicircle kinetoplast DNA (kDNA). A pan-Leishmania SYBR green quantitative PCR (qPCR) assay which detects the conserved spliced-leader RNA (SL RNA) sequence was developed recently. This study assessed the SL RNA assay performance combined with a crude extraction method for the detection of Leishmania in field-collected and laboratory-reared sand flies and in tissue samples from hyraxes as reservoir hosts. METHODS: Field-collected and laboratory-infected sand fly and hyrax extracts were subjected to three different qPCR approaches to assess the suitability of the SL RNA target for Leishmania detection. Nucleic acids of experimentally infected sand flies were isolated with a crude extraction buffer with ethanol precipitation and a commercial kit and tested for downstream DNA and RNA detection. Promastigotes were isolated from culture and sand fly midguts to assess whether there was a difference in SL RNA and kDNA copy numbers. Naive sand flies were spiked with a serial dilution of promastigotes to make a standard curve. RESULTS: The qPCR targeting SL RNA performed well on infected sand fly samples, despite preservation and extraction under presumed unfavorable conditions for downstream RNA detection. Nucleic acid extraction by a crude extraction buffer combined with a precipitation step was highly compatible with downstream SL RNA and kDNA detection. Copy numbers of kDNA were found to be identical in culture-derived parasites and promastigotes isolated from sand fly midguts. SL RNA levels were slightly lower in sand fly promastigotes (ΔCq 1.7). The theoretical limit of detection and quantification of the SL RNA qPCR respectively reached down to 10^-3 and 10 parasite equivalents. SL RNA detection in stored hyrax samples was less efficient with some false-negative assay results, most likely due to the long-term tissue storage in absence of RNA stabilizing reagents. CONCLUSIONS: This study shows that a crude extraction method in combination with the SL RNA qPCR assay is suitable for the detection and quantification of Leishmania in sand flies. The assay is inexpensive, sensitive and pan-Leishmania specific, and accordingly an excellent assay for high-throughput screening in entomological research.
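To put the reported Cq difference of 1.7 into perspective, the short Python sketch below converts a Cq difference into an approximate fold difference in template amount. It assumes the standard exponential amplification model, in which template increases by a factor of (1 + efficiency) per cycle; the abstract does not report the assay's amplification efficiency, so the default of 1.0 (perfect doubling) is an assumption, and the function name `fold_change_from_delta_cq` is hypothetical.

```python
def fold_change_from_delta_cq(delta_cq, efficiency=1.0):
    """Approximate fold difference in template implied by a Cq difference.

    `efficiency` is the per-cycle amplification efficiency (1.0 means the
    template doubles every cycle). This value is an assumption here; it is
    not reported in the abstract.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return (1.0 + efficiency) ** delta_cq


if __name__ == "__main__":
    # Cq difference between sand fly and culture promastigotes, as reported.
    print(round(fold_change_from_delta_cq(1.7), 2))
```

Under the perfect-doubling assumption, a Cq difference of 1.7 corresponds to roughly a 3.2-fold difference in SL RNA level; a lower efficiency would imply a smaller fold difference.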
Project description:BACKGROUND: Although the genome sequence of the protozoan parasite Leishmania major was determined several years ago, the knowledge of its transcriptome was incomplete, both regarding the real number of genes and their primary structure. RESULTS: Here, we describe the first comprehensive transcriptome analysis of a parasite from the genus Leishmania. Using high-throughput RNA sequencing (RNA-seq), a total of 10285 transcripts were identified, of which 1884 were considered novel, as they did not match previously annotated genes. In addition, our data indicate that current annotations should be modified for many of the genes. The detailed analysis of the transcript processing sites revealed extensive heterogeneity in the spliced leader (SL) and polyadenylation addition sites. As a result, around 50% of the genes presented multiple transcripts differing in the length of the UTRs, sometimes in the order of hundreds of nucleotides. This transcript heterogeneity could provide an additional source for regulation as the different sizes of UTRs could modify RNA stability and/or influence the efficiency of RNA translation. In addition, for the first time for the Leishmania major promastigote stage, we are providing relative expression transcript levels. CONCLUSIONS: This study provides a concise view of the global transcriptome of the L. major promastigote stage, providing the basis for future comparative analysis with other development stages or other Leishmania species.
Project description:This review focuses on the spliced leader (SL) RNA and uridylic acid-rich small nuclear RNAs (U-snRNAs) involved in pre-mRNA processing in trypanosomatid protozoa, with particular emphasis on the mechanism of transcription and cap formation. The SL RNA plays a central role in mRNA biogenesis by providing the unique cap 4 structure to the 5' end of all mRNAs by trans-splicing. The trimethylguanosine capped U-snRNAs, on the other hand, represent an unusual example among eukaryotic snRNAs in that they are transcribed by RNA polymerase III. This implies the existence of a distinctive mechanism for capping enzyme selection by the transcriptional machinery. Furthermore, the transcription units of U-snRNA genes offer yet another example of the variety of choices that have been established during eukaryotic evolution, namely that an upstream tRNA gene or tRNA-like gene provides extragenic promoter elements for a downstream small RNA gene.
Project description:With the introduction of cost effective, rapid, and superior quality next generation sequencing techniques, gene expression analysis has become viable for labs conducting small projects as well as large-scale gene expression analysis experiments. However, the available protocols for construction of RNA-sequencing (RNA-Seq) libraries are expensive and/or difficult to scale for high-throughput applications. Also, most protocols require isolated total RNA as a starting point. We provide a cost-effective RNA-Seq library synthesis protocol that is fast, starts with tissue, and is high-throughput from tissue to synthesized library. We have also designed and report a set of 96 unique barcodes for library adapters that are amenable to high-throughput sequencing by a large combination of multiplexing strategies. Our developed protocol has more power to detect differentially expressed genes when compared to the standard Illumina protocol, probably owing to less technical variation amongst replicates. We also address the problem of gene-length biases affecting differential gene expression calls and demonstrate that such biases can be efficiently minimized during mRNA isolation for library preparation.
Project description:Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time.
Project description:Spliced leader trans-splicing (SLTS) plays a part in the maturation of pre-mRNAs in select species across multiple phyla but is particularly prevalent in Nematoda. The role of spliced leaders (SL) within the cell is unclear and an accurate assessment of SL occurrence within an organism is possible only after extensive sequencing data are available, which is not currently the case for many nematode species. SL discovery is further complicated by an absence of SL sequences from high-throughput sequencing results due to incomplete sequencing of the 5'-ends of transcripts during RNA-seq library preparation, known as 5'-bias. Existing datasets and novel methodology were used to identify both conserved SLs and unique hypervariable SLs within Heterodera glycines, the soybean cyst nematode. In H. glycines, twenty-one distinct SL sequences were found on 2,532 unique H. glycines transcripts. The SL sequences identified on the H. glycines transcripts demonstrated a high level of promiscuity, meaning that some transcripts produced as many as nine different individual SL-transcript combinations. Most uniquely, transcriptome analysis revealed that H. glycines is the first nematode to demonstrate a higher SL trans-splicing rate using a species-specific SL over well-conserved Caenorhabditis elegans SL-like sequences.
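The notion of SL promiscuity described above, where one transcript is observed with several different SL sequences, can be made concrete with a small Python sketch. It assumes the input is a list of (transcript, SL) observations, for example one per SL-containing read; the function name `sl_variants_per_transcript` and the identifiers in the example are hypothetical and do not come from the study.

```python
from collections import defaultdict


def sl_variants_per_transcript(pairs):
    """Count distinct SL sequences observed on each transcript.

    `pairs` is an iterable of (transcript_id, sl_id) observations.
    Repeated observations of the same combination are counted once.
    """
    variants = defaultdict(set)
    for transcript_id, sl_id in pairs:
        variants[transcript_id].add(sl_id)
    return {transcript: len(sls) for transcript, sls in variants.items()}


if __name__ == "__main__":
    # Hypothetical observations for illustration only.
    observations = [
        ("transcript_A", "SL_x"),
        ("transcript_A", "SL_y"),
        ("transcript_A", "SL_x"),
        ("transcript_B", "SL_x"),
    ]
    print(sl_variants_per_transcript(observations))
```

A transcript with a count above one carries more than one SL-transcript combination; the maximum over all transcripts corresponds to the kind of figure quoted in the abstract.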
Project description:The kinetoplastid protozoan spliced leader (SL) RNA is the common substrate pre-mRNA utilized in all trans-splicing reactions. Here we show by fluorescence in situ hybridization that the SL RNA is present in the cytoplasm of Leishmania tarentolae and Trypanosoma brucei. Treatment with the karyopherin-specific inhibitor leptomycin B was toxic to T. brucei and eliminated the cytoplasmic SL RNA, suggesting that cytoplasmic SL RNA was dependent on the nuclear exporter exportin 1 (XPO1). Ectopic expression of xpo1 with a C506S mutation in T. brucei conferred resistance to leptomycin B. A reduction in SL RNA 3' extension removal and 5' methylation of nucleotide U(4) was observed in wild-type T. brucei treated with leptomycin B, suggesting that the cytoplasmic stage is necessary for SL RNA biogenesis. This study demonstrates spatial and mechanistic similarities between the posttranscriptional trafficking of the kinetoplastid protozoan SL RNA and the metazoan cis-spliceosomal small nuclear RNAs.
Project description:In the unicellular human parasites Trypanosoma brucei, Trypanosoma cruzi, and Leishmania spp., the spliced-leader (SL) RNA is a key molecule in gene expression donating its 5'-terminal region in SL addition trans splicing of nuclear pre-mRNA. While there is no evidence that this process exists in mammals, it is obligatory in mRNA maturation of trypanosomatid parasites. Hence, throughout their life cycle, these organisms crucially depend on high levels of SL RNA synthesis. As putative SL RNA gene transcription factors, a partially characterized small nuclear RNA-activating protein complex (SNAP(c)) and the TATA-binding protein related factor 4 (TRF4) have been identified thus far. Here, by tagging TRF4 with a novel epitope combination termed PTP, we tandem affinity purified from crude T. brucei extracts a stable and transcriptionally active complex of six proteins. Besides TRF4 these were identified as extremely divergent subunits of SNAP(c) and of transcription factor IIA (TFIIA). The latter finding was unexpected since genome databases of trypanosomatid parasites appeared to lack general class II transcription factors. As we demonstrate, the TRF4/SNAP(c)/TFIIA complex binds specifically to the SL RNA gene promoter upstream sequence element and is absolutely essential for SL RNA gene transcription in vitro.