Project description: RNA sequencing (RNA-seq) is a widely used high-throughput method for characterizing spatiotemporal transcriptomic dynamics. However, typical RNA-seq analysis pipelines depend on either a sequenced genome or reference transcripts. This constraint makes RNA-seq difficult to apply to species that lack both a sequenced genome and reference transcripts. To address this problem, we developed CRSP, an RNA-seq pipeline that integrates a multiple comparative species strategy and does not depend on a specific sequenced genome or reference transcripts. Benchmarking suggests that CRSP quantifies gene expression levels with high accuracy.
Project description: The use of reference DNA standards, generated from cancer cell lines sequenced in the Cancer Genome Project, to establish the sensitivity, specificity, accuracy, and reproducibility of the WTSI GCLP sequencing pipeline.
Project description: Transposable elements (TEs) are ubiquitous in genomes. Many TEs remain active and make up an important fraction of transcriptomes, with potential effects on their host genomes. The functional impact of TEs is well known in model organisms; however, in transcriptome analyses of non-model organisms this information is ignored because TEs are difficult to identify and quantify. Here we present ExplorATE, a pipeline that identifies and quantifies active TEs in non-model organisms and can be easily run within the R environment. Using simulated data, we show that our pipeline accurately identifies and quantifies TEs, outperforming tools commonly used for model organisms. We demonstrate ExplorATE on real RNA-seq samples from different tissues (liver, ovary, and brain) of Liolaemus parthenos, the only parthenogenetic lizard known to date in the entire clade Iguanidae (Pleurodonta). Our results show that a significant fraction of the transcriptome contains repeats, although many of these are co-expressed with genes. Applying the pipeline to real data allowed us to identify the most abundant transposon families in each tissue; the ERV2, CR1, and SINE3 families were particularly abundant in the liver. A test data set is provided in the ExplorATE package.