Transcription start site associated RNAs (TSSaRNAs) are ubiquitous in all domains of life.
ABSTRACT: A plethora of non-coding RNAs has been discovered using high-resolution transcriptomics tools, indicating that transcriptional and post-transcriptional regulation is much more complex than previously appreciated. Small RNAs associated with transcription start sites of annotated coding regions (TSSaRNAs) are pervasive in both eukaryotes and bacteria. Here, we provide evidence for the existence of TSSaRNAs in several archaeal transcriptomes, including Halobacterium salinarum, Pyrococcus furiosus, Methanococcus maripaludis, and Sulfolobus solfataricus. We validated TSSaRNAs from the model archaeon Halobacterium salinarum NRC-1 by deep sequencing two independent small-RNA enriched (RNA-seq) and a primary-transcript enriched (dRNA-seq) strand-specific libraries. We identified 652 transcripts, of which 179 were shown to be primary transcripts (~7% of the annotated genome). Distinct growth-associated expression patterns between TSSaRNAs and their cognate genes were observed, indicating a possible role in environmental responses that may result from RNA polymerase with varying pausing rhythms. This work shows that TSSaRNAs are ubiquitous across all domains of life.
Project description:Prokaryotic genomes show a high level of information compaction, often with different molecules transcribed from the same locus. Although antisense RNAs have been relatively well studied, RNAs in the same strand, internal RNAs (intraRNAs), are still poorly understood. How common the translation of overlapping reading frames is remains an open question. We address this question in the model archaeon Halobacterium salinarum. In the present work we used differential RNA-seq (dRNA-seq) in H. salinarum NRC-1 to locate intraRNA signals in subsets of internal transcription start sites (iTSS) and establish the open reading frames associated with them (intraORFs). Using C-terminally flagged proteins, we experimentally observed isoforms accurately predicted by intraRNA translation for the kef1, acs3 and orc4 genes. We also recovered from the literature and mass spectrometry databases several instances of protein isoforms consistent with intraRNA translation, such as the gas vesicle protein gene gvpC1. We found evidence for intraRNAs in horizontally transferred genes such as the chaperone dnaK and the aerobic respiration related cydA in both H. salinarum and Escherichia coli. Also, intraRNA translation evidence in H. salinarum, E. coli and yeast of a universal elongation factor (aEF-2, fusA and eEF-2) suggests that this is an ancient phenomenon present in all domains of life.
Project description:Antisense RNAs (asRNAs) are present in diverse organisms and play important roles in gene regulation. In this work, we mapped the primary antisense transcriptome in the halophilic archaeon Halobacterium salinarum NRC-1. By reanalyzing publicly available data, we mapped antisense transcription start sites (aTSSs) and inferred the probable 3' ends of these transcripts. We analyzed the resulting asRNAs according to size, location, function of genes on the opposite strand, expression levels and conservation. We show that at least 21% of the genes contain asRNAs in H. salinarum. Most of these asRNAs are expressed at low levels. They are located antisense to genes related to distinctive characteristics of H. salinarum, such as bacteriorhodopsin, gas vesicles and transposases, and to other important biological processes such as translation. We provide evidence to support asRNAs in type II toxin-antitoxin systems in archaea. We also analyzed public ribosome profiling (Ribo-seq) data and found that ~10% of the asRNAs are ribosome-associated non-coding RNAs (rancRNAs), with asRNAs from transposases overrepresented. Using a comparative transcriptomics approach, we found that ~19% of the asRNAs annotated in H. salinarum belong to genes with an ortholog in Haloferax volcanii, in which an aTSS could be identified with positional equivalence. This shows that most asRNAs are not conserved between these halophilic archaea.
Project description:The existence of sense overlapping transcripts that share regulatory and coding information in the same genomic sequence shows an additional level of prokaryotic gene expression complexity. Here we report the discovery of ncRNAs associated with IS1341-type transposase (tnpB) genes, at the 3'-end of such elements, with examples in archaea and bacteria. Focusing on the model haloarchaeon Halobacterium salinarum NRC-1, we show the existence of sense overlapping transcripts (sotRNAs) for all its IS1341-type transposases. A publicly available transcriptome compendium shows condition-dependent differential regulation between sotRNAs and their cognate genes. These sotRNAs allowed us to find a UUCA tetraloop motif that is present in other archaea (ncRNA family HgcC) and in an H. salinarum intergenic ncRNA derived from a palindrome-associated transposable element (PATE). Overexpression of one sotRNA and the PATE-derived RNA harboring the tetraloop motif improved H. salinarum growth, indicating that these ncRNAs are functional.
Project description:This SuperSeries is composed of the following subset Series: GSE12923: Halobacterium salinarum NRC-1 growth curve, tiling arrays; GSE12977: Halobacterium salinarum NRC-1 growth curve; GSE13108: Halobacterium salinarum NRC-1 conditional ChIP-chip for transcription initiation factor IIB 4 (TFBd); GSE7045: ChIP-Chip of General Transcription factors in Halobacterium NRC-1; GSE15786: Halobacterium sp. NRC-1 ChIP-chip for TFBa, TFBd and TFBf, high resolution array; GSE15788: Halobacterium salinarum NRC-1 total RNA hybridization of TFBd overexpression versus Reference sample. Despite knowledge of complex prokaryotic transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have played a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of ~64% of all genes, including putative non-coding RNAs, in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction datasets revealed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non-functional. With experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements. Refer to individual Series.
Project description:Despite the knowledge of complex prokaryotic transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of approximately 64% of all genes, including putative non-coding RNAs, in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.
Project description:Haloferax volcanii is a well-established model species for haloarchaea. Small scale RNomics and bioinformatics predictions were used to identify small non-coding RNAs (sRNAs), and deletion mutants revealed that sRNAs have important regulatory functions. A recent dRNA-Seq study was used to characterize the primary transcriptome. Unexpectedly, it was revealed that, under optimal conditions, H. volcanii contains more non-coding sRNAs than protein-encoding mRNAs. However, the dRNA-Seq approach did not contain any length information. Therefore, a mixed RNA-Seq approach was used to determine transcript length and to identify additional transcripts, which are not present under optimal conditions. In total, 50 million paired end reads of 150 nt length were obtained. 1861 protein-coding RNAs (cdRNAs) were detected, which encoded 3092 proteins. This nearly doubled the coverage of cdRNAs, compared to the previous dRNA-Seq study. About 2/3 of the cdRNAs were monocistronic, and 1/3 covered more than one gene. In addition, 1635 non-coding sRNAs were identified. The highest fraction of non-coding RNAs were cis antisense RNAs (asRNAs). Analysis of the length distribution revealed that sRNAs have a median length of about 150 nt. Based on the RNA-Seq and dRNA-Seq results, genes were chosen to exemplify characteristics of the H. volcanii transcriptome by Northern blot analyses, e.g. 1) the transcript patterns of gene clusters can be straightforward, but also very complex, 2) many transcripts differ in expression level under the four analyzed conditions, 3) some genes are transcribed into RNA isoforms of different length, which can be differentially regulated, 4) transcripts with very long 5'-UTRs and with very long 3'-UTRs exist, and 5) about 30% of all cdRNAs have overlapping 3'-ends, which indicates, together with the asRNAs, that H. volcanii makes ample use of sense-antisense interactions. 
In combination with the previous dRNA-Seq study, this RNA-Seq study provided an unprecedented view of the H. volcanii transcriptome.
Project description:Deciphering the structure of gene regulatory networks across the tree of life remains one of the major challenges in postgenomic biology. We present a novel ChIP-seq workflow for the archaea using the model organism Halobacterium salinarum sp. NRC-1 and demonstrate its application for mapping the genome-wide binding sites of natively expressed transcription factors. This end-to-end pipeline is the first protocol for ChIP-seq in archaea, with methods and tools for each stage from gene tagging to data analysis and biological discovery. Genome-wide binding sites for transcription factors with many binding sites (TfbD) are identified with sensitivity, while retaining specificity in the identification of smaller regulons (bacteriorhodopsin-activator protein). Chromosomal tagging of target proteins with a compact epitope facilitates a standardized and cost-effective workflow that is compatible with high-throughput immunoprecipitation of natively expressed transcription factors. The Pique package, an open-source bioinformatics method, is presented for identification of binding events. Relative to ChIP-Chip and qPCR, this workflow offers a robust catalog of protein-DNA binding events with improved spatial resolution and significantly decreased cost. While this study focuses on the application of ChIP-seq in H. salinarum sp. NRC-1, our workflow can also be adapted for use in other archaea and bacteria with basic genetic tools.
Project description:While the model organism Escherichia coli has been the subject of intense study for decades, the full complement of its RNAs is only now being examined. Here we describe a survey of the E. coli transcriptome carried out using a differential RNA sequencing (dRNA-seq) approach, which can distinguish between primary and processed transcripts, and an automated prediction algorithm for transcriptional start sites (TSS). With the criterion of expression under at least one of three growth conditions examined, we predicted 14,868 TSS candidates, including 5,574 internal to annotated genes (iTSS) and 5,495 TSS corresponding to potential antisense RNAs (asRNAs). We examined expression of 14 candidate asRNAs by Northern analysis using RNA from wild-type E. coli and from strains defective for RNases III and E, two RNases reported to be involved in asRNA processing. Interestingly, nine asRNAs detected as distinct bands by Northern analysis were differentially affected by the rnc and rne mutations. We also compared our asRNA candidates with previously published asRNA annotations from RNA-seq data and discuss the challenges associated with these cross-comparisons. Our global transcriptional start site map represents a valuable resource for identification of transcription start sites, promoters, and novel transcripts in E. coli and is easily accessible, together with the cDNA coverage plots, in an online genome browser.
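The internal (iTSS) and antisense categories used in the E. coli survey above can be illustrated with a small sketch. This is not the prediction algorithm of that study, only a simplified positional rule under stated assumptions: a TSS counts as "internal" if it lies within an annotated gene on the same strand, "antisense" if it lies within a gene on the opposite strand, and "other" if it falls outside every annotated gene. The `Gene` class, the `classify_tss` function, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record; coordinates are 1-based and inclusive."""
    name: str
    start: int   # leftmost coordinate (start <= end, regardless of strand)
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def classify_tss(pos: int, strand: str, genes: Iterable[Gene]) -> str:
    """Label a TSS by position alone (a simplifying assumption).

    "internal"  : inside a gene annotated on the same strand
    "antisense" : inside a gene annotated on the opposite strand only
    "other"     : outside every annotated gene
    """
    labels = set()
    for gene in genes:
        if gene.start <= pos <= gene.end:
            labels.add("internal" if gene.strand == strand else "antisense")
    if "internal" in labels:
        return "internal"  # same-strand overlap takes priority in this sketch
    if "antisense" in labels:
        return "antisense"
    return "other"


# Made-up annotation used only for illustration.
example_genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 600, 900, "-"),
]
```

A real classification would also need to handle primary and secondary TSS upstream of genes, leaderless transcripts whose TSS coincides with the start codon, and a distance window around gene boundaries; none of these are modeled here.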