Deep sequencing reveals a novel class of bidirectional promoters associated with neuronal genes.
ABSTRACT: BACKGROUND: Comprehensive annotation of transcripts expressed in a given tissue is a critical step towards the understanding of regulatory and functional pathways that shape the transcriptome. RESULTS: Here, we reconstructed a cumulative transcriptome of the human prefrontal cortex (PFC) based on approximately 300 million strand-specific RNA sequence (RNA-seq) reads collected at different stages of postnatal development. We find that more than 50% of reconstructed transcripts represent novel transcriptome elements, including 8,343 novel exons and exon extensions of annotated coding genes, 11,217 novel antisense transcripts and 29,541 novel intergenic transcripts or their fragments showing canonical features of long non-coding RNAs (lncRNAs). Our analysis further led to a surprising discovery of a novel class of bidirectional promoters (NBiPs) driving divergent transcription of mRNA and novel lncRNA pairs and displaying a distinct set of sequence and epigenetic features. In contrast to known bidirectional and unidirectional promoters, NBiPs are strongly associated with genes involved in neuronal functions and regulated by neuron-associated transcription factors. CONCLUSIONS: Taken together, our results demonstrate that large portions of the human transcriptome remain uncharacterized. The distinct sequence and epigenetic features of NBiPs, as well as their specific association with neuronal genes, further suggest existence of regulatory pathways specific to the human brain.
Project description:BACKGROUND: Divergent transcription is a wide-spread phenomenon in mammals. For instance, short bidirectional transcripts are a hallmark of active promoters, while longer transcripts can be detected antisense from active genes in conditions where the RNA degradation machinery is inhibited. Moreover, many described long non-coding RNAs (lncRNAs) are transcribed antisense from coding gene promoters. However, the general significance of divergent lncRNA/mRNA gene pair transcription is still poorly understood. Here, we used strand-specific RNA-seq with high sequencing depth to thoroughly identify antisense transcripts from coding gene promoters in primary mouse tissues. RESULTS: We found that a substantial fraction of coding-gene promoters sustain divergent transcription of long non-coding RNA (lncRNA)/mRNA gene pairs. Strikingly, upstream antisense transcription is significantly associated with genes related to transcriptional regulation and development. Their promoters share several characteristics with those of transcriptional developmental genes, including very large CpG islands, high degree of conservation and epigenetic regulation in ES cells. In-depth analysis revealed a unique GC skew profile at these promoter regions, while the associated coding genes were found to have large first exons, two genomic features that might enforce bidirectional transcription. Finally, genes associated with antisense transcription harbor specific H3K79me2 epigenetic marking and RNA polymerase II enrichment profiles linked to an intensified rate of early transcriptional elongation. CONCLUSIONS: We concluded that promoters of a class of transcription regulators are characterized by a specialized transcriptional control mechanism, which is directly coupled to relaxed bidirectional transcription.
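The GC skew profile mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common definition of skew, (G - C) / (G + C), computed in sliding windows along one strand; the function name, window size and step are illustrative choices, not parameters taken from the study.

```python
def gc_skew(seq, window=100, step=50):
    """Return a list of (window_start, skew) tuples.

    Skew is (G - C) / (G + C) within each window; windows that contain
    neither G nor C are assigned 0.0. Window and step sizes are
    illustrative assumptions, not values from the abstract.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        profile.append((start, skew))
    return profile


# Toy sequence: a G-rich half followed by a C-rich half.
print(gc_skew("GGGGCCCC", window=4, step=4))
```

A positive value marks a window where G outnumbers C on the chosen strand, and a negative value marks the reverse; plotting the profile around a transcription start site is one way such a skew pattern could be inspected.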
Project description:The diversification of gene functions has been largely attributed to the process of gene duplication. Novel examples of genes originating from previously untranscribed regions have been recently described without regard to a unifying functional mechanism for their emergence. Here we propose a model mechanism that could generate a large number of lineage-specific novel transcripts in vertebrates through the activation of bidirectional transcription from unidirectional promoters. We examined this model in silico using human transcriptomic and genomic data and identified evidence consistent with the emergence of more than 1,000 primate-specific transcripts. These are transcripts with low coding potential and virtually no functional annotation. They initiate at less than 1 kb upstream of an oppositely transcribed conserved protein coding gene, in agreement with the generally accepted definition of bidirectional promoters. We found that the genomic regions upstream of ancestral promoters, where the novel transcripts in our dataset reside, are characterized by preferential accumulation of transposable elements. This enhances the sequence diversity of regions located upstream of ancestral promoters, further highlighting their evolutionary importance for the emergence of transcriptional novelties. By applying a newly developed test for positive selection to transposable element-derived fragments in our set of novel transcripts, we found evidence of adaptive evolution in the human lineage in nearly 3% of the novel transcripts in our dataset. These findings indicate that at least some novel transcripts could become functionally relevant, and thus highlight the evolutionary importance of promoters, through their capacity for bidirectional transcription, for the emergence of novel genes.
Project description:A "bidirectional gene pair" comprises two adjacent genes whose transcription start sites are neighboring and directed away from each other. The intervening regulatory region is called a "bidirectional promoter." These promoters are often associated with genes that function in DNA repair, with the potential to participate in the development of cancer. No connection between these gene pairs and cancer has been previously investigated. Using the database of spliced-expressed sequence tags (ESTs), we identified the most complete collection of human transcripts under the control of bidirectional promoters. A rigorous screen of the spliced EST data identified new bidirectional promoters, many of which functioned as alternative promoters or regulated novel transcripts. Additionally, we show a highly significant enrichment of bidirectional promoters in genes implicated in somatic cancer, including a substantial number of genes implicated in breast and ovarian cancers. The repeated use of this promoter structure in the human genome suggests it could regulate co-expression patterns among groups of genes. Using microarray expression data from 79 human tissues, we verify regulatory networks among genes controlled by bidirectional promoters. Subsets of these promoters contain similar combinations of transcription factor binding sites, including evolutionarily conserved ETS factor binding sites in ERBB2, FANCD2, and BRCA2. Interpreting the regulation of genes involved in co-expression networks, especially those involved in cancer, will be an important step toward defining molecular events that may contribute to disease.
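The definition of a bidirectional gene pair given here (adjacent genes with neighboring transcription start sites directed away from each other) can be sketched as a simple coordinate check. This is only an illustration: the `Gene` record, the function name and the toy coordinates are hypothetical, and the 1 kb cutoff is an assumption borrowed from the definition quoted in another abstract in this collection, not from the EST-based screen described above.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int     # transcription start site coordinate
    strand: str  # "+" or "-"


def bidirectional_pairs(genes, max_distance=1000):
    """Return (minus_gene, plus_gene, distance) for head-to-head pairs.

    Two genes with neighboring TSSs on the same chromosome are reported
    when the left one is on the minus strand and the right one on the
    plus strand (so they are transcribed away from each other) and the
    TSSs are less than max_distance bp apart.
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        ordered = sorted(chrom_genes, key=lambda g: g.tss)
        for left, right in zip(ordered, ordered[1:]):
            distance = right.tss - left.tss
            divergent = left.strand == "-" and right.strand == "+"
            if divergent and distance < max_distance:
                pairs.append((left.name, right.name, distance))
    return pairs


# Hypothetical toy annotation, for illustration only.
toy = [
    Gene("A", "chr1", 1000, "-"),
    Gene("B", "chr1", 1500, "+"),
    Gene("C", "chr1", 5000, "+"),
    Gene("D", "chr2", 200, "+"),
    Gene("E", "chr2", 300, "-"),
]
print(bidirectional_pairs(toy))
```

In the toy data only A and B qualify: D and E are close together but face each other (a convergent arrangement), so they are not reported.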
Project description:Next-generation sequencing studies have revealed that a variety of transcripts are present in the prokaryotic transcriptome and a significant fraction of them are functional, being involved in various regulatory activities apart from coding for proteins. Identification of promoters associated with different transcripts is necessary for characterization of the transcriptome. Promoter regions have been shown to have unique structural features as compared with their flanking region, in organisms covering all domains of life. Here we report an in silico analysis of DNA sequence dependent structural properties like stability, bendability and curvature in the promoter region of six different prokaryotic transcriptomes. Using these structural features, we predicted promoters associated with different categories of transcripts (mRNA, internal, antisense and non-coding), which constitute the transcriptome. Promoter annotation using structural features is fairly accurate and reliable with about 50% of the primary promoters being characterized by all three structural properties while at least one property identifies 95%. We also studied the relative differences of these structural features in terms of gene expression and found that the features, viz. lower stability, lesser bendability and higher curvature are more prominent in the promoter regions which are associated with high gene expression as compared with low expression genes. Hence, promoters, which are associated with higher gene expression, get annotated well using DNA structural features as compared with those, which are linked to lower gene expression.
Project description:Machine learning approaches are emerging as a way to discriminate various classes of functional elements. Previous attempts to create Regulatory Potential (RP) scores to discriminate functional DNA from nonfunctional DNA included using Markov models trained to identify sequences from promoters and enhancers from ancestral repeats. We proposed that knowledge gleaned from those methods could be further refined using a multiple class predictor to separate classes of promoter elements from enhancers or nonfunctional DNA. We extended our previous work, which identified over 5,000 candidate bidirectional promoters in the human genome, to map the orthologous promoter regions in the mouse genome. Our algorithm measured the robustness of evidence provided by the spliced EST annotations and incorporated evidence from annotations of UCSC Known Genes and GenBank mRNA. In preparation for de novo prediction of this promoter type, we examined characteristic features of the dataset as a whole. For instance, bidirectional promoters score very highly among all functional elements for Regulatory Potential Scores. This result was unexpected due to the limited sequence conservation found in these noncoding regions. We demonstrate that bidirectional promoters can be classified apart from other genomic features including non-bidirectional promoters, i.e. those promoters having no nearby upstream genes. Furthermore, bidirectional promoters consistently score at the level of very highly conserved functional elements in the genome: developmental enhancers. The high scores are due to sequence-based characteristics within the promoters, not the surrounding exons. These results indicate that high-scoring RP regions can be deconvoluted into various functional classes of genomic elements. Using a multiple class predictor we are able to discriminate bidirectional promoters from enhancers, non-bidirectional promoters, and non-promoter regions on the basis of RP scores and CpG islands. We examine orthology at bidirectional promoters, use discriminatory machine learning approaches to differentiate multiple types of promoters from other functional and nonfunctional features in the genome and begin the process of deconvoluting classes of functional regions that score well with RP scores. These types of approaches precede supervised learning techniques to discover unannotated promoter regions.
Project description:BACKGROUND: Many mammalian genes are arranged in a bidirectional manner, sharing a common promoter and regulatory elements. This is especially true for promoters containing a CpG island, usually unmethylated and associated with an 'open' or accessible chromatin structure. In evolutionary terms, a primary function of genomic methylation is postulated to entail protection of the host genome from the disruption associated with activity of parasitic or transposable elements. These are usually epigenetically silenced following insertion into mammalian genomes, becoming sequence degenerate over time. Despite this, it is clear that many transposable element-derived DNAs have evaded host-mediated epigenetic silencing to remain expressed (domesticated) in mammalian genomes, several of which have demonstrated essential roles during mammalian development. RESULTS: The current study provides evidence that many CpG island-associated promoters associated with single genes exhibit inherent bidirectionality, facilitating "hijack" by transposable elements to create novel antisense 'head-to-head' bidirectional gene pairs in the genome that facilitates escape from host-mediated epigenetic silencing. This is often associated with an increase in CpG island length and transcriptional activity in the antisense direction. From a list of over 60 predicted protein-coding genes derived from transposable elements in the human genome and 40 in the mouse, we have found that a significant proportion are orientated in a bidirectional manner with CpG associated regulatory regions. CONCLUSION: These data strongly suggest that the selective force that shields endogenous CpG-containing promoter from epigenetic silencing can extend to exogenous foreign DNA elements inserted in close proximity in the antisense orientation, with resulting transcription and maintenance of sequence integrity of such elements in the host genome. Over time, this may result in "domestication" of such elements to provide novel cellular and developmental functions.
Project description:Transcriptional interference denotes negative cis effects between promoters. Here, we show that promoters can also interact positively. Bidirectional RNA polymerase II (Pol II) elongation over the silent human endogenous retrovirus (HERV)-K 18 promoter (representative of 2.5 × 10^3 similar promoters genomewide) activates transcription. In tandem constructs, an upstream promoter activates HERV-K 18 transcription. This is abolished by inversion of the upstream promoter, or by insertion of a poly(A) signal between the promoters; transcription is restored by poly(A) signal mutants. TATA-box mutants in the upstream promoter reduce HERV-K 18 transcription. Experiments with the same promoters in a convergent orientation produce similar effects. A small promoter deletion partially restores HERV-K 18 activity, consistent with activation resulting from repressor repulsion by the elongating Pol II. Transcriptional elongation over this class of intragenic promoters will generate co-regulated sense-antisense transcripts, or, alternatively initiating transcripts, thus expanding the diversity and complexity of the human transcriptome.
Project description:BACKGROUND: In contrast to unidirectional promoters wherein antisense transcription results in short transcripts which are rapidly degraded, bidirectional promoters produce mature transcripts in both sense and antisense orientation. To understand the molecular mechanism of how productive bidirectional transcription is regulated, we focused on delineating the chromatin signature of bidirectional promoters. RESULTS: We report generation and utility of a reporter system that enables simultaneous scoring of transcriptional activity in opposite directions. Testing of putative bidirectional promoters in this system demonstrates no measurable bias towards any one direction of transcription. We analyzed the NUP26L-PIH1D3 bidirectional gene pair during retinoic acid-mediated differentiation of embryonic carcinoma cells. In their native context, we observed that the chromatin landscape at and around the transcription regulatory region between the pair of bidirectional genes is modulated in concordance with transcriptional activity of each gene in the pair. We then extended this analysis to 974 bidirectional gene pairs in two different cell lines, H1 human embryonic stem cells and CD4 positive T cells, using publicly available ChIP-Seq and RNA-Seq data. Bidirectional gene pairs were classified based on the intergenic distance separating the two TSS of the transcripts analyzed as well as the relative expression of each transcript in a bidirectional gene pair. We report that for the entire range of intergenic distance separating bidirectional genes, the expression profile of such genes (symmetric or asymmetric) matches the histone modification profile of marks associated with active transcription initiation and elongation. CONCLUSIONS: We demonstrate unique distribution of histone modification marks that correlate robustly with the transcription status of genes regulated by bidirectional promoters. These findings strongly imply that occurrence of these marks might signal the transcription machinery to drive maturation of antisense transcription from the bidirectional promoters.
Project description:We analyzed the transcriptome of Escherichia coli K-12 by strand-specific RNA sequencing at single-nucleotide resolution during steady-state (logarithmic-phase) growth and upon entry into stationary phase in glucose minimal medium. To generate high-resolution transcriptome maps, we developed an organizational schema which showed that in practice only three features are required to define operon architecture: the promoter, terminator, and deep RNA sequence read coverage. We precisely annotated 2,122 promoters and 1,774 terminators, defining 1,510 operons with an average of 1.98 genes per operon. Our analyses revealed an unprecedented view of E. coli operon architecture. A large proportion (36%) of operons are complex with internal promoters or terminators that generate multiple transcription units. For 43% of operons, we observed differential expression of polycistronic genes, despite being in the same operons, indicating that E. coli operon architecture allows fine-tuning of gene expression. We found that 276 of 370 convergent operons terminate inefficiently, generating complementary 3' transcript ends which overlap on average by 286 nucleotides, and 136 of 388 divergent operons have promoters arranged such that their 5' ends overlap on average by 168 nucleotides. We found 89 antisense transcripts of 397-nucleotide average length, 7 unannotated transcripts within intergenic regions, and 18 sense transcripts that completely overlap operons on the opposite strand. Of 519 overlapping transcripts, 75% correspond to sequences that are highly conserved in E. coli (>50 genomes). Our data extend recent studies showing unexpected transcriptome complexity in several bacteria and suggest that antisense RNA regulation is widespread. Importance: We precisely mapped the 5' and 3' ends of RNA transcripts across the E. coli K-12 genome by using a single-nucleotide analytical approach. Our resulting high-resolution transcriptome maps show that ca. one-third of E. coli operons are complex, with internal promoters and terminators generating multiple transcription units and allowing differential gene expression within these operons. We discovered extensive antisense transcription that results from more than 500 operons, which fully overlap or extensively overlap adjacent divergent or convergent operons. The genomic regions corresponding to these antisense transcripts are highly conserved in E. coli (including Shigella species), although it remains to be proven whether or not they are functional. Our observations of features unearthed by single-nucleotide transcriptome mapping suggest that deeper layers of transcriptional regulation in bacteria are likely to be revealed in the future.
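The overlap lengths quoted for convergent and divergent operons come down to interval arithmetic on mapped transcript ends. The sketch below shows one way such an overlap could be computed; it assumes inclusive start and end coordinates, and the function names and example coordinates are hypothetical rather than taken from the study.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of positions shared by two intervals with inclusive ends."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def antisense_overlap(t1, t2):
    """Overlap in nucleotides between two transcripts on opposite strands.

    Each transcript is a (start, end, strand) tuple with start <= end.
    Transcripts on the same strand return 0, because only opposite-strand
    overlap produces complementary (antisense) RNA.
    """
    if t1[2] == t2[2]:
        return 0
    return overlap_length(t1[0], t1[1], t2[0], t2[1])


# Hypothetical convergent pair whose 3' ends run past each other.
forward = (100, 500, "+")
reverse = (400, 900, "-")
print(antisense_overlap(forward, reverse))
```

Averaging this value over all convergent or divergent operon pairs in an annotation would give summary figures of the kind reported in the abstract.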
Project description:Purpose: Crystallin gene expression during lens fiber cell differentiation is tightly spatially and temporally regulated. A significant fraction of mammalian genes is transcribed from adjacent promoters in opposite directions ("bidirectional" promoters). It is not known whether two proximal genes located on the same allele are simultaneously transcribed. Methods: Mouse lens transcriptome was analyzed for paired genes whose transcriptional start sites are separated by less than 5 kbp to identify coexpressed bidirectional promoter gene pairs. To probe these transcriptional mechanisms, nascent transcription of Cryba4, Crybb1, and Crybb3 genes from a gene-rich part of chromosome 5 was visualized by RNA fluorescent in situ hybridizations (RNA FISH) in individual lens fiber cell nuclei. Results: Genome-wide lens transcriptome analysis by RNA-seq revealed that the Cryba4-Crybb1 pair has the highest Pearson correlation coefficient between their steady-state mRNA levels. Analysis of Cryba4 and Crybb1 nascent transcription revealed frequent simultaneous expression of both genes from the same allele. Nascent Crybb3 transcript visualization in "early" but not "late" differentiating lens fibers shows nuclear accumulation of the spliced Crybb3 transcripts that was not affected in abnormal lens fiber cell nuclei depleted of chromatin remodeling enzyme Snf2h (Smarca5). Conclusions: The current study shows for the first time that two highly expressed lens crystallin genes, Cryba4 and Crybb1, can be simultaneously transcribed from adjacent bidirectional promoters and do not show nuclear accumulation. In contrast, spliced Crybb3 mRNAs transiently accumulate in early lens fiber cell nuclei. The gene pairs coexpressed during lens development showed significant enrichment in human "cataract" phenotype.
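Ranking candidate gene pairs by the Pearson correlation of their steady-state mRNA levels, as described in the Results, can be sketched as follows. This is a minimal illustration: the function names, gene labels and expression values are hypothetical, and the study's actual normalization and filtering steps are not specified in the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rank_pairs_by_coexpression(expression, pairs):
    """Sort gene pairs from highest to lowest correlation.

    expression maps a gene name to its expression values across samples;
    pairs is a list of (gene_a, gene_b) tuples, for example genes whose
    transcription start sites lie within a chosen distance of each other.
    """
    scored = [(a, b, pearson(expression[a], expression[b])) for a, b in pairs]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical expression values across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
ranked = rank_pairs_by_coexpression(
    expression, [("geneA", "geneC"), ("geneA", "geneB")]
)
print(ranked[0][:2])
```

The pair at the top of the ranking is the most strongly coexpressed one, which is the sense in which a single pair can be said to have the highest coefficient among all candidates.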