Project description:Transcriptome assays are increasingly being performed by high-throughput RNA sequencing (RNA-seq). For organisms whose genomes have not been sequenced and annotated, transcriptomes must be assembled de novo from the RNA-seq data. Here, we present novel algorithms, specific to bacterial gene structures and transcriptomes, for analysis of bacterial RNA-seq data and de novo transcriptome assembly. The algorithms are implemented in an open source software system called Rockhopper 2. We find that Rockhopper 2 outperforms other de novo transcriptome assemblers and offers accurate and efficient analysis of bacterial RNA-seq data. Rockhopper 2 is available at http://cs.wellesley.edu/~btjaden/Rockhopper .
Project description:Inbred mice are a useful tool for studying the in vivo functions of platelets. Nonetheless, the mRNA signature of mouse platelets is not known. Here, we use paired-end next-generation RNA sequencing (RNA-seq) to characterize the polyadenylated transcriptomes of human and mouse platelets. We report that RNA-seq provides unprecedented resolution of mRNAs that are expressed across the entire human and mouse genomes. Transcript expression and abundance are often conserved between the two species. Several mRNAs, however, are differentially expressed in human and mouse platelets. Moreover, previously described functional disparities between mouse and human platelets are reflected in differences at the transcript level, including protease activated receptor-1, protease activated receptor-3, platelet activating factor receptor, and factor V. This suggests that RNA-seq is a useful tool for predicting differences in platelet function between mice and humans. Our next-generation sequencing analysis provides new insights into the human and murine platelet transcriptomes. The sequencing dataset will be useful in the design of mouse models of hemostasis and as a catalyst for the discovery of new functions of platelets. Information on accessing the dataset is provided in the "Introduction."
Project description:BACKGROUND: Plant gametophytes play central roles in sexual reproduction. A hallmark of the plant life cycle is that gene expression is required in the haploid gametophytes. Consequently, many mutant phenotypes are expressed in this phase. RESULTS: We perform a quantitative RNA-seq analysis of embryo sacs, comparator ovules with the embryo sacs removed, mature pollen, and seedlings to assist in identifying gametophyte functions in maize. Expression levels were determined for annotated genes in both gametophytes, and novel transcripts were identified from de novo assembly of RNA-seq reads. Transposon-related transcripts are present at high levels in both gametophytes, suggesting a connection between gamete production and transposon expression in maize that had not previously been identified in any female gametophyte. Two classes of small signaling proteins and several transcription factor gene families are enriched in gametophyte transcriptomes. Expression patterns of maize genes with duplicates in subgenome 1 and subgenome 2 indicate that pollen-expressed genes in subgenome 2 are retained at a higher rate than subgenome 2 genes with other expression patterns. Analysis of available insertion mutant collections shows a statistically significant deficit of insertions in gametophyte-expressed genes. CONCLUSIONS: This analysis, the first RNA-seq study to compare both gametophytes in a monocot, identifies maize gametophyte functions, gametophyte expression of transposon-related sequences, and unannotated, novel transcripts. Reduced recovery of mutations in gametophyte-expressed genes provides supporting evidence for their function in the gametophytes. Expression patterns of extant, duplicated maize genes reveal that selective pressures based on male gametophytic function have likely had a disproportionate effect on plant genomes.
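The abstract reports a statistically significant deficit of insertions in gametophyte-expressed genes without naming the test used. One common way to test for such a depletion is a one-sided hypergeometric test; the sketch below assumes that test, and the gene counts in the usage lines are hypothetical placeholders, not values from the study.

```python
from math import comb


def depletion_p_value(all_genes: int, genes_with_insertion: int,
                      gametophyte_genes: int, gametophyte_with_insertion: int) -> float:
    """One-sided hypergeometric P(X <= observed).

    X counts how many gametophyte-expressed genes would carry an insertion if
    insertions fell on genes at random (sampling without replacement).
    A small value indicates fewer insertions than expected by chance.
    """
    n_without = all_genes - genes_with_insertion
    lowest = max(0, gametophyte_genes - n_without)  # smallest feasible value of X
    numerator = sum(
        comb(genes_with_insertion, i) * comb(n_without, gametophyte_genes - i)
        for i in range(lowest, gametophyte_with_insertion + 1)
    )
    return numerator / comb(all_genes, gametophyte_genes)


# Hypothetical counts: 40 insertions expected among 100 gametophyte genes, 25 observed.
p = depletion_p_value(all_genes=1000, genes_with_insertion=400,
                      gametophyte_genes=100, gametophyte_with_insertion=25)
print(f"P(X <= 25) = {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids the precision loss of multiplying many small probabilities; for genome-scale counts a library routine such as a hypergeometric CDF would be the usual choice.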
Project description:Background: High-throughput sequencing of cDNA libraries (RNA-Seq) has proven to be a highly effective approach for studying bacterial transcriptomes. A central challenge in designing RNA-Seq-based experiments is estimating a priori the number of reads per sample needed to detect and quantify thousands of individual transcripts with a large dynamic range of abundance. Results: We have conducted a systematic examination of how changes in the number of RNA-Seq reads per sample influence both the profiling of a single bacterial transcriptome and the comparison of gene expression among samples. Our findings suggest that the number of reads typically produced in a single lane of the Illumina HiSeq sequencer far exceeds the number needed to saturate the annotated transcriptomes of diverse bacteria growing in monoculture. Moreover, as sequencing depth increases, so too does the detection of cDNAs that likely correspond to spurious transcripts or genomic DNA contamination. Finally, even when dozens of barcoded individual cDNA libraries are sequenced in a single lane, the vast majority of transcripts in each sample can be detected and numerous genes differentially expressed between samples can be identified. Conclusions: Our analysis provides a guide for the many researchers seeking to determine the appropriate sequencing depth for RNA-Seq-based studies of diverse bacterial species.
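The saturation analysis can be approximated with a rarefaction curve: subsample reads to smaller depths and count the genes that still reach a minimum read count. The sketch below assumes each read has already been assigned to a gene (the list `read_genes`), and the `min_reads` detection threshold is a hypothetical parameter; the study's own detection criteria may differ.

```python
import random
from collections import Counter


def genes_detected(read_genes, depth, min_reads=1, seed=0):
    """Count genes with at least `min_reads` reads in a random subsample of `depth` reads."""
    rng = random.Random(seed)
    subsample = rng.sample(read_genes, depth)  # sampling without replacement
    counts = Counter(subsample)
    return sum(1 for n in counts.values() if n >= min_reads)


def saturation_curve(read_genes, depths, min_reads=1, seed=0):
    """Return (depth, genes detected) pairs; a plateau suggests saturation."""
    return [(d, genes_detected(read_genes, d, min_reads, seed)) for d in depths]


# Toy library: three genes with very different abundances.
reads = ["geneA"] * 5000 + ["geneB"] * 500 + ["geneC"] * 5
for depth, detected in saturation_curve(reads, [10, 100, 1000, len(reads)]):
    print(depth, detected)
```

A fixed `seed` keeps each curve reproducible; averaging over several seeds would smooth out sampling noise at low depths.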
Project description:Online sequence repositories are teeming with RNA sequencing (RNA-Seq) data from a wide range of eukaryotes. Although most of these data sets contain large numbers of organelle-derived reads, researchers tend to ignore these data, focusing instead on the nuclear-derived transcripts. Consequently, GenBank contains massive amounts of organelle RNA-Seq data that are just waiting to be downloaded and analyzed. Recently, a team of scientists designed an open-source bioinformatics program called ChloroSeq, which systematically analyzes an organelle transcriptome using RNA-Seq. The ChloroSeq pipeline uses RNA-Seq alignment data to deliver detailed analyses of organelle transcriptomes, which can be fed into statistical software for further analysis and for generating graphical representations of the data. In addition to providing data on expression levels via coverage statistics, ChloroSeq can examine splicing efficiency and RNA editing profiles. Ultimately, ChloroSeq provides a much-needed avenue for researchers of all stripes to start exploring organelle transcription and could be a key step toward a more thorough understanding of organelle gene expression.
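The abstract says ChloroSeq reports splicing efficiency and RNA editing profiles but does not give the formulas. The sketch below uses two common read-count ratios as assumptions, not as ChloroSeq's documented definitions: splicing efficiency as spliced reads over all reads informative about an intron, and editing extent as the fraction of reads at a site carrying the edited base (for example, T at a genomic C for C-to-U editing).

```python
def splicing_efficiency(spliced_reads: int, unspliced_reads: int) -> float:
    """Fraction of intron-informative reads that are spliced.

    spliced_reads: reads spanning the exon-exon junction.
    unspliced_reads: reads spanning an exon-intron boundary.
    """
    total = spliced_reads + unspliced_reads
    if total == 0:
        raise ValueError("no reads inform on this intron")
    return spliced_reads / total


def editing_extent(base_counts: dict, reference_base: str, edited_base: str) -> float:
    """Fraction of reads at one site showing the edited base rather than the reference base."""
    ref = base_counts.get(reference_base, 0)
    edited = base_counts.get(edited_base, 0)
    if ref + edited == 0:
        raise ValueError("site has no reads with the reference or edited base")
    return edited / (ref + edited)


# Hypothetical counts; at a C-to-U editing site, U is read as T in cDNA.
print(splicing_efficiency(spliced_reads=300, unspliced_reads=100))  # 0.75
print(editing_extent({"C": 20, "T": 80, "A": 1}, "C", "T"))  # 0.8
```

Bases other than the reference and edited ones (the single A above) are ignored here as likely sequencing errors; a real pipeline would also apply base- and mapping-quality filters.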
Project description:Although prokaryotic gene transcription has been studied for decades, many aspects of the process remain poorly understood. In particular, recent studies have revealed that transcriptomes in many prokaryotes are far more complex than previously thought. Genes in an operon are often alternatively and dynamically transcribed under different conditions, and a large portion of genes and intergenic regions have antisense RNA (asRNA) and non-coding RNA (ncRNA) transcripts, respectively. Ironically, similar studies have not been conducted in the model bacterium E. coli K12, so it is unknown whether the bacterium possesses similarly complex transcriptomes. Furthermore, although RNA-seq has become the major method for analyzing the complexity of prokaryotic transcriptomes, it remains challenging to accurately assemble full-length transcripts from short RNA-seq reads. To fill these gaps, we profiled the transcriptomes of E. coli K12 under different culture conditions and growth phases using a highly specific directional RNA-seq technique that can capture various types of transcripts in bacterial cells, combined with a highly accurate and robust algorithm and tool, TruHMM (http://bioinfolab.uncc.edu/TruHmm_package/), for assembling full-length transcripts. We found that 46.9% to 63.4% of expressed operons were utilized in their putative alternative forms, 72.23% to 89.54% of genes had putative asRNA transcripts, and 51.37% to 72.74% of intergenic regions had putative ncRNA transcripts under different culture conditions and growth phases. As has been demonstrated in many other prokaryotes, E. coli K12 also has highly complex and dynamic transcriptomes under different culture conditions and growth phases. Such complex and dynamic transcriptomes might play important roles in the physiology of the bacterium. TruHMM is a highly accurate and robust algorithm for assembling full-length transcripts in prokaryotes using directional RNA-seq short reads.
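The abstract does not describe TruHMM's model, so the following is only a generic illustration of hidden Markov model segmentation, not TruHMM itself. It labels strand-specific coverage bins as transcribed or untranscribed with a two-state HMM using Poisson emissions and Viterbi decoding; the emission means `lam_off` and `lam_on` and the self-transition probability `p_stay` are hypothetical values chosen for the toy example.

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k reads in a bin with Poisson mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, lam_off=0.5, lam_on=20.0, p_stay=0.99):
    """Most likely state path (0 = untranscribed, 1 = transcribed) for a non-empty list of bin counts."""
    lams = (lam_off, lam_on)
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    # Equal initial probabilities for the two states.
    score = [math.log(0.5) + poisson_logpmf(counts[0], lam) for lam in lams]
    backpointers = []
    for k in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            from_same = score[s] + log_stay
            from_other = score[1 - s] + log_switch
            if from_same >= from_other:
                best, prev = from_same, s
            else:
                best, prev = from_other, 1 - s
            new_score.append(best + poisson_logpmf(k, lams[s]))
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def transcribed_segments(path):
    """Half-open (start, end) bin ranges of consecutive transcribed bins."""
    segments, start = [], None
    for i, state in enumerate(path):
        if state == 1 and start is None:
            start = i
        elif state == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments


coverage = [0, 1, 0, 0, 25, 30, 18, 22, 0, 0, 1, 0]
path = viterbi_two_state(coverage)
print(path)
print(transcribed_segments(path))
```

Because the data are directional, such a model would be run separately on each strand, so that a segment opposite an annotated gene could be read as a candidate antisense transcript.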
Project description:To understand how the interaction between an intracellular bacterium and the host immune system contributes to outcome at the site of infection, we studied leprosy, a disease that forms a clinical spectrum in which progressive infection by the intracellular bacterium Mycobacterium leprae is characterized by the production of type I IFNs and antibodies. Dual RNA-seq on patient lesions identifies two independent molecular measures of M. leprae, each of which correlates with distinct aspects of the host immune response. First, the fraction of bacterial transcripts, reflecting bacterial burden, correlates with a host type I IFN gene signature; type I IFNs are known to inhibit antimicrobial responses. Second, the bacterial mRNA:rRNA ratio, reflecting bacterial viability, links bacterial heat shock proteins with the BAFF-BCMA host antibody response pathway. Our findings provide a platform for the interrogation of host and pathogen transcriptomes at the site of infection, allowing insight into mechanisms of inflammation in human disease.
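Both bacterial measures are ratios of read counts. The sketch below assumes reads have already been assigned to the host, to bacterial mRNA, or to bacterial rRNA; the category labels are hypothetical, and the study's exact normalization may differ.

```python
from collections import Counter


def bacterial_measures(read_labels):
    """Return (fraction of bacterial reads, bacterial mRNA:rRNA ratio).

    read_labels: iterable of "host", "bact_mRNA", or "bact_rRNA" (hypothetical labels).
    The ratio is None when no bacterial rRNA reads are present.
    """
    counts = Counter(read_labels)
    bacterial = counts["bact_mRNA"] + counts["bact_rRNA"]
    total = bacterial + counts["host"]
    fraction = bacterial / total if total else 0.0
    ratio = counts["bact_mRNA"] / counts["bact_rRNA"] if counts["bact_rRNA"] else None
    return fraction, ratio


labels = ["host"] * 900 + ["bact_mRNA"] * 20 + ["bact_rRNA"] * 80
fraction, ratio = bacterial_measures(labels)
print(fraction, ratio)  # 0.1 0.25
```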
Project description:The alignment of sequencing reads to a transcriptome is a common and important step in many RNA-seq analysis tasks. When aligning RNA-seq reads directly to a transcriptome (as is common in the de novo setting or when a trusted reference annotation is available), care must be taken to report the potentially large number of multi-mapping locations per read. This can pose a substantial computational burden for existing aligners and can considerably slow downstream analysis. We introduce a novel concept, quasi-mapping, and an efficient algorithm implementing this approach for mapping sequencing reads to a transcriptome. By attempting only to report the potential loci of origin of a sequencing read, and not the base-to-base alignment by which it derives from the reference, RapMap, our tool implementing quasi-mapping, is capable of mapping sequencing reads to a target transcriptome substantially faster than existing alignment tools. The algorithm we use to implement quasi-mapping uses several efficient data structures and takes advantage of the special structure of shared sequence prevalent in transcriptomes to rapidly provide highly accurate mapping information. We demonstrate how quasi-mapping can be successfully applied to the problems of transcript-level quantification from RNA-seq reads and the clustering of contigs from de novo assembled transcriptomes into biologically meaningful groups. RapMap is implemented in C++11 and is available as open-source software, under GPL v3, at https://github.com/COMBINE-lab/RapMap. Supplementary data are available at Bioinformatics online.
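The core idea of quasi-mapping, reporting which transcripts a read could have come from without computing a base-to-base alignment, can be illustrated with a plain k-mer index. This is a simplified stand-in, not RapMap's implementation (the abstract only says it relies on several efficient data structures); it ignores reverse complements and positions, and it skips read k-mers that are absent from the index.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcript names containing it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def quasi_map(read, index, k):
    """Return the transcripts compatible with every indexed k-mer of the read.

    K-mers missing from the index (e.g. from sequencing errors) are skipped;
    an empty set means no transcript is consistent with the read.
    """
    hits = None
    for i in range(len(read) - k + 1):
        tset = index.get(read[i:i + k])  # .get does not insert into the defaultdict
        if tset is None:
            continue
        hits = set(tset) if hits is None else hits & tset
        if not hits:
            break
    return hits or set()


transcripts = {"tx1": "ACGTACGTTTGCA", "tx2": "ACGTACGTCCGGA"}
index = build_kmer_index(transcripts, k=4)
print(quasi_map("ACGTACGT", index, 4))  # shared prefix: both transcripts
print(quasi_map("ACGTTTGC", index, 4))  # only tx1 contains CGTT
```

The set of compatible transcripts is what downstream quantification needs to resolve multi-mapping reads; the expensive alignment step is never performed.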
Project description:The interaction of eukaryotic host and prokaryotic pathogen cells is linked to specific changes in the cellular proteome, and consequently to infection-related gene expression patterns of the involved cells. To simultaneously assess the transcriptomes of both organisms during their interaction, we developed dual 3'Seq, a tag-based sequencing protocol that allows for exact quantification of differentially expressed transcripts in interacting pro- and eukaryotic cells without prior fixation or physical disruption of the interaction. Human epithelial cells were infected with Salmonella enterica Typhimurium as a model system for invasion of the intestinal epithelium, and the transcriptional response of the infected host cells, together with the differential expression of invading and intracellular pathogen cells, was determined by dual 3'Seq coupled with the next-generation sequencing-based transcriptome profiling technique deepSuperSAGE (deep Serial Analysis of Gene Expression). Annotation to reference transcriptomes that include the operon structure of the employed S. enterica Typhimurium strain allowed for in silico separation of the interacting cells, including quantification of polycistronic RNAs. Eighty-nine percent of the known loci are found to be transcribed in prokaryotic cells before or after infection of the host, while 75% of all protein-coding loci are represented in the polyadenylated transcriptomes of human host cells. Dual 3'Seq was alternatively coupled to MACE (Massive Analysis of cDNA ends) to assess the advantages and drawbacks of a library preparation procedure that allows for sequencing of longer fragments. Additionally, the identified expression patterns of both organisms were validated by qRT-PCR using three independent biological replicates, which confirmed that RELB, NFKB1, and NFKB2 are involved in the initial immune response of epithelial cells after infection with S. enterica Typhimurium.
Project description:Background: RNA sequencing (RNA-seq) analyses can benefit from performing both genome-guided and de novo assembly, in particular for species where the reference genome or the annotation is incomplete. However, tools for integrating an assembled transcriptome with reference annotation are lacking. Findings: Necklace is a software pipeline that runs genome-guided and de novo assembly and combines the resulting transcriptomes with reference genome annotations. Necklace constructs a compact but comprehensive superTranscriptome out of the assembled and reference data. Reads are subsequently aligned and counted in preparation for differential expression testing. Conclusions: Necklace allows a transcriptome to be built from a combination of assembled and annotated transcripts, which results in a more comprehensive transcriptome for the majority of organisms. In addition, RNA-seq data are mapped back to this newly created superTranscript reference to enable differential expression testing with standard methods.
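A superTranscript contains each exonic sequence of a gene once. The abstract does not detail how Necklace builds superTranscripts from assembled and reference sequences, so the sketch below is a simplified coordinate-based illustration only. It assumes all transcripts of one gene are given as half-open exon intervals on the plus strand of the same genomic sequence, merges those intervals, and concatenates the resulting blocks.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def super_transcript(genome, transcripts):
    """Build a superTranscript sequence and its genomic blocks.

    transcripts: list of exon lists, each exon a half-open (start, end) on `genome`.
    """
    blocks = merge_intervals(exon for tx in transcripts for exon in tx)
    sequence = "".join(genome[s:e] for s, e in blocks)
    return sequence, blocks


genome = "AAAACCCCGGGGTTTT"
isoforms = [[(0, 4), (8, 12)], [(2, 6), (12, 16)]]
seq, blocks = super_transcript(genome, isoforms)
print(blocks)  # [(0, 6), (8, 16)]
print(seq)     # AAAACCGGGGTTTT
```

Every read from either isoform maps contiguously or with a single split onto this sequence, which is what lets standard counting and differential expression tools run against a superTranscript reference.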