The impact of RNA sequence library construction protocols on transcriptomic profiling of leukemia.
ABSTRACT: RNA sequencing (RNA-seq) has become an indispensable tool to identify disease-associated transcriptional profiles and determine the molecular underpinnings of diseases. However, the broad adoption of the methodology in the clinic is still hampered by inconsistent results from different RNA-seq protocols and requires further evaluation of its analytical reliability using patient samples. Here, we applied two commonly used RNA-seq library preparation protocols to samples from acute leukemia patients to understand how poly-A-tailed mRNA selection (PA) and ribo-depletion (RD) based RNA-seq library preparation protocols affect gene fusion detection, variant calling, and gene expression profiling. Overall, the protocols produced similar results with consistent outcomes. Nevertheless, the PA protocol was more efficient in quantifying expression of leukemia marker genes and showed better performance in the expression-based classification of leukemia. Independent qRT-PCR experiments verified that the PA protocol better represented total RNA compared to the RD protocol. In contrast, the RD protocol detected a higher number of non-coding RNA features and had better alignment efficiency. The RD protocol also recovered more known fusion-gene events, although variability was seen in fusion gene predictions. The overall findings provide a framework for the use of RNA-seq in a precision medicine setting with a limited number of samples and suggest that selection of the library preparation protocol should be based on the objectives of the analysis.
Project description:Advantages of RNA-Seq over array-based platforms are quantitative gene expression and discovery of expressed single nucleotide variants (eSNVs) and fusion transcripts from a single platform, but the sensitivity for each of these characteristics is unknown. We measured gene expression in a set of manually degraded RNAs and in nine pairs of matched fresh-frozen and FFPE RNA isolated from breast tumor, using the hybridization-based NanoString nCounter (226-gene panel) and whole transcriptome RNA-Seq with RiboZeroGold ScriptSeq V2 library preparation kits. We performed correlation analyses of gene expression between samples and across platforms. We then specifically assessed whole transcriptome expression of lincRNA and discovery of eSNVs and fusion transcripts in the FFPE RNA-Seq data. For gene expression in the manually degraded samples, we observed Pearson correlations of >0.94 and >0.80 with NanoString and ScriptSeq protocols, respectively. Gene expression data for matched fresh-frozen and FFPE samples yielded mean Pearson correlations of 0.874 and 0.783 for NanoString (226 genes) and ScriptSeq whole transcriptome protocols, respectively. Specifically for lincRNAs, we observed superb Pearson correlation (0.988) between matched fresh-frozen and FFPE pairs. FFPE samples across NanoString and RNA-Seq platforms gave a mean Pearson correlation of 0.838. In FFPE libraries, we detected 53.4% of high confidence SNVs and 24% of high confidence fusion transcripts. Sensitivity of fusion transcript detection was not overcome by an increase in depth of sequencing up to 3-fold (increase from ~56 to ~159 million reads). Both NanoString and ScriptSeq RNA-Seq technologies yield reliable gene expression data for degraded and FFPE material. The high degree of correlation between NanoString and RNA-Seq platforms suggests discovery-based whole transcriptome studies from FFPE material will produce reliable expression data.
The RiboZeroGold ScriptSeq protocol performed particularly well for lincRNA expression from FFPE libraries but detection of eSNV and fusion transcripts was less sensitive. We performed RNASeq on RNA from nine matched pairs of fresh-frozen and FFPE tissues from breast cancer patients. The goal was to test the RiboZeroGold ScriptSeq complete low input library preparation kit for degraded RNA samples.
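The matched-pair comparison described above rests on Pearson correlation of expression values. A minimal sketch of that calculation follows, assuming log2-transformed counts with a pseudocount of 1; the transform, the function names, and the toy counts are illustrative assumptions and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def log2_expr(counts, pseudocount=1.0):
    """Log2-transform counts; the pseudocount (an assumption) avoids log(0)."""
    return [math.log2(c + pseudocount) for c in counts]


# Hypothetical per-gene counts for one matched fresh-frozen / FFPE pair.
fresh_frozen = [120, 15, 980, 0, 47, 310]
ffpe = [95, 22, 760, 3, 30, 280]

r = pearson(log2_expr(fresh_frozen), log2_expr(ffpe))
print(f"Pearson r = {r:.3f}")
```

In a study with several matched pairs, one such coefficient would be computed per pair and the values averaged to give a mean correlation.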
Project description:A significant challenge to the effective application of RNA-seq to the complete transcript analysis of low quantity and/or degraded samples is the amplification of minimal input RNA to enable sequencing library construction. Several strategies have been commercialized in order to facilitate this goal. However, each strategy has its own specific protocols and methodology, and each may introduce unique bias and in some cases show specific preference for a collection of sequences. Our wider investigation of human spermatozoal RNAs was able to reveal their complexity despite being generally characterized by low quantity and high fragmentation. In this study, the following four commercially available RNA-seq amplification and library protocols for the preparation of low quantity/highly fragmented samples, SMARTer™ Ultra Low RNA (SU) for Illumina® Sequencing, SeqPlex RNA Amplification (SP), Ovation® RNA-Seq System V2 (OR), and Ovation® RNA-Seq Formalin Fixed Paraffin Embedded System (FFPES) were assessed using human sperm RNAs. Further investigation analyzed the effects on the end results of two different library preparation methods, Encore NGS Multiplex System I (Enc) and Ovation Ultralow Library Systems (UL), that appeared best suited to this type of RNA, along with other potential confounding factors such as FFPE preservation. Our results indicate that for each library preparation protocol, the differences in the initial amount of input RNA and choice of RNA purification step do not generate marked differences in terms of RNA profiling. However, substantial disparity is introduced by individual amplification methods prior to library construction. These significant differences may be caused by the different priming methods or amplification strategies used in each of the four different protocols examined. 
The observation of intra-sample variation introduced by the choice of protocol highlights the role that external factors play in planning and subsequent reliable interpretation of results of any RNA-seq experiment.
Project description:Experimental procedures for preparing RNA-seq and single-cell (sc) RNA-seq libraries are based on assumptions regarding their underlying enzymatic reactions. Here, we show that the fairness of these assumptions varies within libraries: coverage by sequencing reads along and between transcripts exhibits characteristic, protocol-dependent biases. To understand the mechanistic basis of this bias, we present an integrated modeling framework that infers the relationship between enzyme reactions during library preparation and the characteristic coverage patterns observed for different protocols. Analysis of new and existing (sc)RNA-seq data from six different library preparation protocols reveals that polymerase processivity is the mechanistic origin of coverage biases. We apply our framework to demonstrate that lowering incubation temperature increases processivity, yield, and (sc)RNA-seq sensitivity in all protocols. We also provide correction factors based on our model for increasing accuracy of transcript quantification in existing samples prepared at standard temperatures. In total, our findings improve our ability to accurately reflect in vivo transcript abundances in (sc)RNA-seq libraries.
Project description:Advantages of RNA-Seq over array-based platforms are quantitative gene expression and discovery of expressed single nucleotide variants (eSNVs) and fusion transcripts from a single platform, but the sensitivity for each of these characteristics is unknown. We measured gene expression in a set of manually degraded RNAs and in nine pairs of matched fresh-frozen and FFPE RNA isolated from breast tumor, using the hybridization-based NanoString nCounter (226-gene panel) and whole transcriptome RNA-Seq with RiboZeroGold ScriptSeq V2 library preparation kits. We performed correlation analyses of gene expression between samples and across platforms. We then specifically assessed whole transcriptome expression of lincRNA and discovery of eSNVs and fusion transcripts in the FFPE RNA-Seq data. For gene expression in the manually degraded samples, we observed Pearson correlations of >0.94 and >0.80 with NanoString and ScriptSeq protocols, respectively. Gene expression data for matched fresh-frozen and FFPE samples yielded mean Pearson correlations of 0.874 and 0.783 for NanoString (226 genes) and ScriptSeq whole transcriptome protocols, respectively (p < 2 × 10^-16). Specifically for lincRNAs, we observed superb Pearson correlation (0.988) between matched fresh-frozen and FFPE pairs. FFPE samples across NanoString and RNA-Seq platforms gave a mean Pearson correlation of 0.838. In FFPE libraries, we detected 53.4% of high confidence SNVs and 24% of high confidence fusion transcripts. Sensitivity of fusion transcript detection was not overcome by an increase in depth of sequencing up to 3-fold (increase from ~56 to ~159 million reads). Both NanoString and ScriptSeq RNA-Seq technologies yield reliable gene expression data for degraded and FFPE material. The high degree of correlation between NanoString and RNA-Seq platforms suggests discovery-based whole transcriptome studies from FFPE material will produce reliable expression data.
The RiboZeroGold ScriptSeq protocol performed particularly well for lincRNA expression from FFPE libraries, but detection of eSNV and fusion transcripts was less sensitive.
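The detection percentages quoted above are sensitivities: the fraction of a reference call set that is recovered in the FFPE library. A minimal sketch of that arithmetic follows. It assumes the reference set is the high-confidence calls from the matched fresh-frozen sample, which the abstract does not state explicitly; the variant tuples and the function name are hypothetical.

```python
def detection_sensitivity(reference_calls, test_calls):
    """Fraction of reference calls that also appear in the test call set."""
    reference = set(reference_calls)
    if not reference:
        raise ValueError("reference call set is empty")
    return len(reference & set(test_calls)) / len(reference)


# Hypothetical variants keyed as (chromosome, position, ref allele, alt allele).
fresh_frozen_snvs = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 2500, "C", "T"),
    ("chr7", 4200, "G", "A"),
    ("chr17", 7300, "T", "C"),
}
ffpe_snvs = {
    ("chr1", 1000, "A", "G"),
    ("chr7", 4200, "G", "A"),
    ("chr9", 8800, "C", "A"),  # FFPE-only call; does not affect sensitivity
}

sens = detection_sensitivity(fresh_frozen_snvs, ffpe_snvs)
print(f"Sensitivity: {sens:.1%}")
```

Calls seen only in the test set do not enter this measure; they would matter for precision, which the abstract does not report.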
Project description:With the introduction of cost effective, rapid, and superior quality next generation sequencing techniques, gene expression analysis has become viable for labs conducting small projects as well as large-scale gene expression analysis experiments. However, the available protocols for construction of RNA-sequencing (RNA-Seq) libraries are expensive and/or difficult to scale for high-throughput applications. Also, most protocols require isolated total RNA as a starting point. We provide a cost-effective RNA-Seq library synthesis protocol that is fast, starts with tissue, and is high-throughput from tissue to synthesized library. We have also designed and report a set of 96 unique barcodes for library adapters that are amenable to high-throughput sequencing by a large combination of multiplexing strategies. Our developed protocol has more power to detect differentially expressed genes when compared to the standard Illumina protocol, probably owing to less technical variation amongst replicates. We also address the problem of gene-length biases affecting differential gene expression calls and demonstrate that such biases can be efficiently minimized during mRNA isolation for library preparation.
Project description:RNA-seq has become the standard tool for collecting genome-wide expression data in diverse fields, from quantitative genetics and medical genomics to ecology and developmental biology. However, RNA-seq library preparation is still prohibitively expensive for many laboratories. Recently, the field of single-cell transcriptomics has reduced costs and increased throughput by adopting early barcoding and pooling of individual samples, producing a single final library containing all samples. In contrast, RNA-seq protocols where each sample is processed individually are significantly more expensive and lower throughput than single-cell approaches. Yet, many projects depend on individual library generation to preserve important samples or for follow-up re-sequencing experiments. Improving on currently available RNA-seq methods, we have developed TM3'seq, a 3'-enriched library preparation protocol that uses Tn5 transposase and preserves sample identity at each step. TM3'seq is designed for high-throughput processing of individual samples (96 samples in 6h, with only 3h hands-on time) at a fraction of the cost of commercial kits ($1.5 per sample). The protocol was tested in a range of human and Drosophila melanogaster RNA samples, recovering transcriptomes of the same quality and reliability as the commercial NEBNext kit. We expect that the cost- and time-efficient features of TM3'seq will make large-scale RNA-seq experiments more accessible to the entire scientific community.
Project description:OBJECTIVES: Sequencing with PolyA selection is the most common approach for library preparation. For limited amounts of RNA or degraded RNA, alternative protocols such as the NuGEN have been developed. However, it is not yet clear how the different library preparations affect the downstream analyses of the broad applications of RNA sequencing. METHODS AND MATERIALS: Eight human mammary epithelial cell (HMEC) lines with high quality RNA were sequenced by Illumina's mRNA-Seq PolyA selection and NuGEN ENCORE library preparation. The following analyses and comparisons were conducted: 1) the numbers of genes captured by each protocol; 2) the impact of protocols on differentially expressed gene detection between biological replicates; 3) expressed single nucleotide variant (SNV) detection; 4) non-coding RNAs, particularly lincRNA detection; and 5) intragenic gene expression. RESULTS: Sequences from the NuGEN protocol had a lower alignment rate (75%) than those from the PolyA protocol (over 90%). The NuGEN protocol detected fewer genes (12-20% fewer), with a significant portion of reads mapped to non-coding regions. A large number of genes were differentially detected between the two protocols. About 17-20% of the differentially expressed genes between biological replicates were commonly detected between the two protocols. Significantly higher numbers of SNVs (5-6 times) were detected in the NuGEN samples, which were largely from intragenic and intergenic regions. The NuGEN captured fewer exons (25% fewer) and had higher base level coverage variance. While 6.3% of reads were mapped to intragenic regions in the PolyA samples, the percentages were much higher (20-25%) for the NuGEN samples. The NuGEN protocol did not detect more known non-coding RNAs such as lincRNAs, but targeted small and "novel" lincRNAs. CONCLUSION: Different library preparations can have significant impacts on downstream analysis and interpretation of RNA-seq data.
The NuGEN provides an alternative for limited or degraded RNA but it has limitations for some RNA-seq applications.
Project description:BACKGROUND:Circulating microRNAs (miRNAs) are attractive non-invasive biomarkers for a variety of conditions due to their stability and altered pathophysiological expression levels. Reliable detection of global expression profiles is required to maximise miRNA biomarker discovery. Although developments in small RNA-Seq technology have improved detection of plasma-based miRNAs, the low RNA content and sequencing bias introduced during library preparation remain challenging. In this study we compare commercially available RNA extraction methods using MagnaZol (Bioo Scientific) or miRNeasy (QIAGEN) and three library preparation methods - CleanTag (TriLink), NEXTflex (Bioo Scientific) and QIAseq (QIAGEN) - which aim to address one or both of these issues. RESULTS:Different RNA extractions and library preparation protocols result in differential detection of miRNAs. A greater proportion of reads mapped to miRNAs in libraries prepared with MagnaZol RNA than with miRNeasy RNA. Libraries prepared using QIAseq demonstrated the greatest miRNA diversity with many more very low abundance miRNAs detected (~ 2-3 fold more with < 10 reads), whilst CleanTag detected the fewest individual miRNAs and considerably over-represented miR-486-5p. Libraries prepared with QIAseq had the strongest correlation with RT-qPCR quantification. Analysis of unique molecular indices (UMIs) incorporated in the QIAseq protocol indicate that little PCR bias is introduced during small RNA library preparation. CONCLUSIONS:Small RNAs were consistently detected using all RNA extraction and library preparation protocols tested, but with some miRNAs at significantly different levels. Choice of the most suitable protocol should be informed by the relative importance of minimising the total sequencing required, detection of rare miRNAs or absolute quantification.
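The UMI analysis mentioned above can be illustrated with a short sketch: reads that share both a miRNA assignment and a UMI are treated as PCR copies of a single molecule, so comparing read counts with unique-molecule counts gives a rough view of PCR duplication. The exact-match collapsing rule, the function names, and the toy reads below are assumptions for illustration; the abstract does not describe how the UMIs were actually processed, and production tools typically also tolerate sequencing errors within the UMI.

```python
from collections import defaultdict


def umi_collapse(reads):
    """Summarise reads per miRNA as (read_count, unique_molecule_count).

    `reads` is an iterable of (mirna_id, umi) pairs. Reads with an identical
    miRNA and UMI are counted as one original molecule (exact matching only).
    """
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for mirna, umi in reads:
        read_counts[mirna] += 1
        umis[mirna].add(umi)
    return {m: (read_counts[m], len(umis[m])) for m in read_counts}


def duplication_rate(read_count, molecule_count):
    """Fraction of reads that are duplicates of an already-seen molecule."""
    return 1.0 - molecule_count / read_count


# Hypothetical aligned reads with 12-nt UMIs.
reads = [
    ("miR-486-5p", "ACGTACGTACGT"),
    ("miR-486-5p", "ACGTACGTACGT"),  # same UMI: counted as a PCR duplicate
    ("miR-486-5p", "TTGACCATGGCA"),
    ("miR-21-5p", "GGATCCTTAGCA"),
]

summary = umi_collapse(reads)
for mirna, (n_reads, n_molecules) in summary.items():
    rate = duplication_rate(n_reads, n_molecules)
    print(f"{mirna}: {n_reads} reads, {n_molecules} molecules, duplication {rate:.2f}")
```

If read counts and molecule counts stay close to proportional across miRNAs, that is consistent with little PCR bias having been introduced.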
Project description:BACKGROUND:RNA-Seq is currently the most widely used tool to analyze whole-transcriptome profiles. There are numerous commercial kits available to facilitate preparing RNA-Seq libraries; however, it is still not clear how some of these kits perform in terms of: 1) ribosomal RNA removal; 2) read coverage or recovery of exonic vs. intronic sequences; 3) identification of differentially expressed genes (DEGs); and 4) detection of long non-coding RNA (lncRNA). In RNA-Seq analysis, understanding the strengths and limitations of commonly used RNA-Seq library preparation protocols is important, as this technology remains costly and time-consuming. RESULTS:In this study, we present a comprehensive evaluation of four RNA-Seq kits. We used three standard input protocols: Illumina TruSeq Stranded Total RNA and mRNA kits, a modified NuGEN Ovation v2 kit, and the TaKaRa SMARTer Ultra Low RNA Kit v3. Our evaluation of these kits included quality control measures such as overall reproducibility, 5' and 3' end-bias, and the identification of DEGs, lncRNAs, and alternatively spliced transcripts. Overall, we found that the two Illumina kits were most similar in terms of recovering DEGs, and the Illumina, modified NuGEN, and TaKaRa kits allowed identification of a similar set of DEGs. However, we also discovered that the Illumina, NuGEN and TaKaRa kits each enriched for different sets of genes. CONCLUSIONS:At the manufacturers' recommended input RNA levels, all the RNA-Seq library preparation protocols evaluated were suitable for distinguishing between experimental groups, and the TruSeq Stranded mRNA kit was universally applicable to studies focusing on protein-coding gene profiles. The TruSeq protocols tended to capture genes with higher expression and GC content, whereas the modified NuGEN protocol tended to capture longer genes. 
The SMARTer Ultra Low RNA Kit may be a good choice at the low RNA input level, although it was inferior to the TruSeq mRNA kit at standard input level in terms of rRNA removal, exonic mapping rates and recovered DEGs. Therefore, the choice of RNA-Seq library preparation kit can profoundly affect data outcomes. Consequently, it is a pivotal parameter to consider when designing an RNA-Seq experiment.
Project description:RNA-sequencing (RNA-seq) has become the standard method for unbiased analysis of gene expression but also provides access to more complex transcriptome features, including alternative RNA splicing, RNA editing, and even detection of fusion transcripts formed through chromosomal translocations. However, differences in library methods can adversely affect the ability to recover these different types of transcriptome data. For example, some methods have bias for one end of transcripts or rely on low-efficiency steps that limit the complexity of the resulting library, making detection of rare transcripts less likely. We tested several commonly used methods of RNA-seq library preparation and found vast differences in the detection of advanced transcriptome features, such as alternatively spliced isoforms and RNA editing sites. By comparing several different protocols available for the Ion Proton sequencer and by utilizing detailed bioinformatics analysis tools, we were able to develop an optimized random primer based RNA-seq technique that is reliable at uncovering rare transcript isoforms and RNA editing features, as well as fusion reads from oncogenic chromosome rearrangements. The combination of optimized libraries and rapid Ion Proton sequencing provides a powerful platform for the transcriptome analysis of research and clinical samples.