Rapid turnover of long noncoding RNAs and the evolution of gene expression.
ABSTRACT: A large proportion of functional sequence within mammalian genomes falls outside protein-coding exons and can be transcribed into long RNAs. However, the roles in mammalian biology of long noncoding RNA (lncRNA) are not well understood. Few lncRNAs have experimentally determined roles, with some of these being lineage-specific. Determining the extent by which transcription of lncRNA loci is retained or lost across multiple evolutionary lineages is essential if we are to understand their contribution to mammalian biology and to lineage-specific traits. Here, we experimentally investigated the conservation of lncRNA expression among closely related rodent species, allowing the evolution of DNA sequence to be uncoupled from evolution of transcript expression. We generated total RNA (RNAseq) and H3K4me3-bound (ChIPseq) DNA data, and combined both to construct catalogues of transcripts expressed in the adult liver of Mus musculus domesticus (C57BL/6J), Mus musculus castaneus, and Rattus norvegicus. We estimated the rate of transcriptional turnover of lncRNAs and investigated the effects of their lineage-specific birth or death. LncRNA transcription showed considerably greater gain and loss during rodent evolution, compared with protein-coding genes. Nucleotide substitution rates were found to mirror the in vivo transcriptional conservation of intergenic lncRNAs between rodents: only the sequences of noncoding loci with conserved transcription were constrained. Finally, we found that lineage-specific intergenic lncRNAs appear to be associated with modestly elevated expression of genomically neighbouring protein-coding genes. Our findings show that nearly half of intergenic lncRNA loci have been gained or lost since the last common ancestor of mouse and rat, and they predict that such rapid transcriptional turnover contributes to the evolution of tissue- and lineage-specific gene expression.
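As an illustration of what "gained or lost since the last common ancestor" means operationally, the following minimal sketch tallies presence/absence calls of transcription per locus and reports the fraction of loci transcribed in only one of two species. The function name, the locus identifiers, and the toy calls are hypothetical; this is a simple tabulation under stated assumptions, not the rate estimation procedure used in the study.

```python
def turnover_fraction(calls, species_a, species_b):
    """Fraction of loci transcribed in exactly one of two species,
    among loci transcribed in at least one of them.

    calls: dict mapping a locus id to the set of species in which
           transcription was detected (hypothetical input format).
    """
    in_either = [
        locus for locus, species in calls.items()
        if species_a in species or species_b in species
    ]
    if not in_either:
        return 0.0
    in_one_only = [
        locus for locus in in_either
        if (species_a in calls[locus]) != (species_b in calls[locus])
    ]
    return len(in_one_only) / len(in_either)


# Toy, made-up presence/absence calls (not data from the study).
toy_calls = {
    "locus1": {"mouse", "rat"},
    "locus2": {"mouse"},
    "locus3": {"rat"},
    "locus4": {"mouse", "rat"},
    "locus5": set(),  # not transcribed in either species; ignored
}
fraction = turnover_fraction(toy_calls, "mouse", "rat")
print(f"lineage-specific fraction: {fraction:.2f}")
```

A raw tally like this does not distinguish gain from loss; doing so would need an outgroup or a model of ancestral states.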
Project description: BACKGROUND: Long non-coding RNAs (lncRNAs) are an important component of mammalian genomes, where they outnumber protein-coding genes. For example, human (Homo sapiens) (96,308 vs. 20,376) and mouse (Mus musculus) (87,774 vs. 22,630) have more lncRNA genes than protein-coding genes in the NONCODEv5 database. Recently, mammalian lncRNAs were reported to play critical roles in the immune response to influenza A virus infections. This observation inspired us to identify lncRNAs related to the immune response to influenza A virus in duck, which is the most important natural host of influenza A viruses. RESULTS: We explored features of 62,447 lncRNAs from human, mouse, chicken, zebrafish and C. elegans, and developed a pipeline to identify lncRNAs from transcriptomic data using these features. We then collected 151,970 assembled transcripts from RNA-Seq data of 21 individuals from three tissues and annotated 4094 duck lncRNAs. Compared with duck protein-coding transcripts, the 4094 lncRNAs had fewer exons (2.4 vs. 10.2) and longer transcripts (1903.0 bp vs. 1686.9 bp) on average. Among them, 3586 (87.6%) lncRNAs were located in intergenic regions and 619 lncRNAs showed differential expression in ducks infected with H5N1 virus when compared with control individuals. Fifty-eight lncRNAs were involved in two co-expression modules related to the anti-influenza A virus immune response. Moreover, we confirmed that eight lncRNAs showed markedly differential expression both in vivo (duck individuals) and in vitro (duck embryo fibroblast cells, DEF cells) after infection with H5N1 viruses, implying that they might play important roles in the response to influenza A virus infection. CONCLUSIONS: This study presents an example of annotating lncRNAs in a new species based on model species using transcriptome data. These data and analyses provide information on the function of duck lncRNAs in the immune response to influenza A virus.
Project description:Long noncoding RNAs (lncRNAs) are one of the most intensively studied groups of noncoding elements. Debate continues over what proportion of lncRNAs are functional or merely represent transcriptional noise. Although characterization of individual lncRNAs has identified approximately 200 functional loci across the Eukarya, general surveys have found only modest or no evidence of long-term evolutionary conservation. Although this lack of conservation suggests that most lncRNAs are nonfunctional, the possibility remains that some represent recent evolutionary innovations. We examine recent selection pressures acting on lncRNAs in mouse populations. We compare patterns of within-species nucleotide variation at approximately 10,000 lncRNA loci in a cohort of the wild house mouse, Mus musculus castaneus, with between-species nucleotide divergence from the rat (Rattus norvegicus). Loci under selective constraint are expected to show reduced nucleotide diversity and divergence. We find limited evidence of sequence conservation compared with putatively neutrally evolving ancestral repeats (ARs). Comparisons of sequence diversity and divergence between ARs, protein-coding (PC) exons and lncRNAs, and the associated flanking regions, show weak, but significantly lower levels of sequence diversity and divergence at lncRNAs compared with ARs. lncRNAs conserved deep in the vertebrate phylogeny show lower within-species sequence diversity than lncRNAs in general. A set of 74 functionally characterized lncRNAs show levels of diversity and divergence comparable to PC exons, suggesting that these lncRNAs are under substantial selective constraints. Our results suggest that, in mouse populations, most lncRNA loci evolve at rates similar to ARs, whereas older lncRNAs tend to show signals of selection similar to PC genes.
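The two quantities compared in this abstract, within-species diversity and between-species divergence, can be sketched as follows. This is a minimal illustration that assumes pre-aligned, equal-length sequences and uses uncorrected proportions of differing sites; the function names and toy sequences are hypothetical, and the study itself may have used different estimators and corrections.

```python
from itertools import combinations


def proportion_different(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites within a sample."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(proportion_different(a, b) for a, b in pairs) / len(pairs)


def divergence(seqs, outgroup):
    """Mean proportion of differing sites between each sample sequence and an outgroup."""
    if not seqs:
        return 0.0
    return sum(proportion_different(s, outgroup) for s in seqs) / len(seqs)


# Toy, made-up alignment (not real mouse or rat sequence).
sample = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
outgroup_seq = "ACGAACTT"
pi = nucleotide_diversity(sample)
d = divergence(sample, outgroup_seq)
print(f"diversity = {pi:.4f}, divergence = {d:.4f}")
```

Under selective constraint, both values at a locus would be expected to fall below those of a neutral reference class such as ancestral repeats.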
Project description: If sequencing was possible only for genomes, and not for RNAs or proteins, then functional protein-coding exons would be recognizable by their unusual patterns of nucleotide composition, specifically a high GC content across the body of exons, and an unusual nucleotide content near their edges. RNAs and proteins can, of course, be sequenced but the extent of functionality of intergenic long noncoding RNAs (lncRNAs) remains under question owing to their low nucleotide conservation. Inspired by the nucleotide composition patterns of protein-coding exons, we sought evidence for functionality across lncRNA loci from diverse species. We found that such patterns across multiexonic lncRNA loci mirror those of protein-coding genes, although to a lesser degree: Specifically, compared with introns, lncRNA exons are GC rich. Additionally we report evidence for the action of purifying selection to preserve exonic splicing enhancers within human multiexonic lncRNAs and nucleotide composition in fruit fly lncRNAs. Our findings provide evidence for selection for more efficient rates of transcription and splicing within lncRNA loci. Despite only a minor proportion of their RNA bases being constrained, multiexonic intergenic lncRNAs appear to require accurate splicing of their exons to transact their function.
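The exon-versus-intron GC comparison described above can be sketched in a few lines. This is a minimal illustration on made-up sequences; the function names are hypothetical, and ambiguous bases such as N are simply excluded from the denominator, which is an assumption rather than the authors' stated procedure.

```python
def gc_content(seq):
    """Fraction of G or C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def mean_gc(seqs):
    """Unweighted mean GC content over a collection of sequences."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    return sum(gc_content(s) for s in seqs) / len(seqs)


# Toy, made-up sequences (not real lncRNA exons or introns).
exons = ["GCGCATGC", "GGCCGCTA"]
introns = ["ATATTTAG", "TTAACTAA"]
exon_gc = mean_gc(exons)
intron_gc = mean_gc(introns)
print(f"exon GC = {exon_gc:.3f}, intron GC = {intron_gc:.3f}")
```

In a real analysis the comparison would be made per locus and tested statistically, since GC content also varies regionally along chromosomes.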
Project description: We employed whole-genome RNA-sequencing to profile mRNAs and both annotated and novel long noncoding RNAs (lncRNAs) in human naive, central memory, and effector memory CD4+ T cells. Loci transcribing both lineage-specific annotated and novel lncRNA are adjacent to lineage-specific protein-coding genes in the genome. Lineage-specific novel lncRNA loci are transcribed from lineage-specific typical- and supertranscriptional enhancers and are not multiexonic, thus are more similar to enhancer RNAs. Novel enhancer-associated lncRNAs transcribed from the IFNG locus bind the transcription factor NF-κB and enhance binding of NF-κB to the IFNG genomic locus. Depletion of the annotated lncRNA, IFNG-AS1, or one IFNG enhancer-associated lncRNA abrogates IFNG expression by memory T cells, indicating these lncRNAs have biologic function.
Project description:Long non-coding RNAs (lncRNAs), transcribed from the intergenic regions of animal genomes, play important roles in key biological processes. In mice, Zdbf2linc was recently identified as an lncRNA isoform of the paternally expressed imprinted Zdbf2 gene. The functional role of Zdbf2linc remains undefined, but it may control parent-of-origin-specific expression of protein-coding neighbors through epigenetic modification in cis, similar to imprinted Nespas, Kcnq1ot1 and Airn lncRNAs. Here, we identified a novel imprinted long-range non-coding RNA, termed GPR1AS, in the human GPR1-ZDBF2 intergenic region. Although GPR1AS contains no human ZDBF2 exons, this lncRNA is transcribed in the antisense orientation from the GPR1 intron to a secondary, differentially methylated region upstream of the ZDBF2 gene (ZDBF2 DMR), similar to mouse Zdbf2linc. Interestingly, GPR1AS/Zdbf2linc is exclusively expressed in human/mouse placenta with paternal-allele-specific expression and maternal-allele-specific promoter methylation (GPR1/Gpr1 DMR). The paternal-allele specific methylation of the secondary ZDBF2 DMR was established in human placentas as well as somatic lineage. Meanwhile, the ZDBF2 gene showed stochastic paternal-allele-specific expression, possibly methylation-independent, in placental tissues. Overall, we demonstrated that epigenetic regulation mechanisms in the imprinted GPR1-GPR1AS-ZDBF2 region were well-conserved between human and mouse genomes without the high sequence conservation of the intergenic lncRNAs. Our findings also suggest that lncRNAs with highly conserved epigenetic and transcriptional regulation across species arose by divergent evolution from a common ancestor, if they do not have identical exon structures.
Project description:Mammalian genomes are pervasively transcribed to produce thousands of long non-coding RNAs (lncRNAs). A few of these lncRNAs have been shown to recruit regulatory complexes through RNA-protein interactions to influence the expression of nearby genes, and it has been suggested that many other lncRNAs can also act as local regulators. Such local functions could explain the observation that lncRNA expression is often correlated with the expression of nearby genes. However, these correlations have been challenging to dissect and could alternatively result from processes that are not mediated by the lncRNA transcripts themselves. For example, some gene promoters have been proposed to have dual functions as enhancers, and the process of transcription itself may contribute to gene regulation by recruiting activating factors or remodelling nucleosomes. Here we use genetic manipulation in mouse cell lines to dissect 12 genomic loci that produce lncRNAs and find that 5 of these loci influence the expression of a neighbouring gene in cis. Notably, none of these effects requires the specific lncRNA transcripts themselves and instead involves general processes associated with their production, including enhancer-like activity of gene promoters, the process of transcription, and the splicing of the transcript. Furthermore, such effects are not limited to lncRNA loci: we find that four out of six protein-coding loci also influence the expression of a neighbour. These results demonstrate that cross-talk among neighbouring genes is a prevalent phenomenon that can involve multiple mechanisms and cis-regulatory signals, including a role for RNA splice sites. These mechanisms may explain the function and evolution of some genomic loci that produce lncRNAs and broadly contribute to the regulation of both coding and non-coding genes.
Project description: Long non-coding RNAs (lncRNAs) play an important role in many biological processes and have attracted widespread attention. Although the precise functions and mechanisms of most lncRNAs are still unknown, we are certain that lncRNAs usually perform their functions by interacting with the corresponding RNA-binding proteins. For example, lncRNA-protein interactions play an important role in post-transcriptional gene regulation, such as splicing, translation and signaling, and in the progression of complex diseases. However, experimental verification of predicted lncRNA-protein interactions is time-consuming and laborious. In this work, we propose a computational method, named IRWNRLPI, to find potential associations between lncRNAs and proteins. IRWNRLPI integrates two algorithms, random walk and neighborhood regularized logistic matrix factorization, and the combination can optimize more than either algorithm used alone. Moreover, the method is semi-supervised and does not require negative samples. Based on leave-one-out cross validation, we obtain an AUC of 0.9150 and an AUPR of 0.7138, demonstrating its reliable performance. In addition, in a case study on Mus musculus, many lncRNA-protein interactions predicted by our method can be successfully confirmed by experiments. This suggests that IRWNRLPI will be a useful bioinformatics resource in biomedical research.
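To make the reported AUC concrete: it is the probability that a randomly chosen true interaction receives a higher score than a randomly chosen non-interaction. The sketch below computes it directly from that definition on made-up labels and scores; the function name is hypothetical and nothing here reproduces IRWNRLPI or its cross-validation setup.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 1 (known interaction) or 0 (no known interaction)
    scores: predicted scores in the same order; ties count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy, made-up predictions (not results from the paper).
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {auc:.4f}")
```

The pairwise loop is quadratic in the number of examples; for large evaluation sets a rank-based formulation gives the same value more efficiently.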
Project description: Recent advances in transcriptome sequencing have enabled the discovery of thousands of long non-coding RNAs (lncRNAs) across many species. Though several lncRNAs have been shown to play important roles in diverse biological processes, the functions and mechanisms of most lncRNAs remain unknown. Two significant obstacles lie between transcriptome sequencing and functional characterization of lncRNAs: identifying truly non-coding genes from de novo reconstructed transcriptomes, and prioritizing the hundreds of resulting putative lncRNAs for downstream experimental interrogation. We present slncky, a lncRNA discovery tool that produces a high-quality set of lncRNAs from RNA-sequencing data and further uses evolutionary constraint to prioritize lncRNAs that are likely to be functionally important. Our automated filtering pipeline is comparable to manual curation efforts and more sensitive than previously published computational approaches. Furthermore, we developed a sensitive alignment pipeline for aligning lncRNA loci and propose new evolutionary metrics relevant for analyzing sequence and transcript evolution. Our analysis reveals that evolutionary selection acts in several distinct patterns, and uncovers two notable classes of intergenic lncRNAs: one showing strong purifying selection on RNA sequence and another where constraint is restricted to the regulation but not the sequence of the transcript. Our results highlight that lncRNAs are not a homogeneous class of molecules but rather a mixture of multiple functional classes with distinct biological mechanisms and/or roles. Our novel comparative methods for lncRNAs reveal 233 constrained lncRNAs out of tens of thousands of currently annotated transcripts, which we make available through the slncky Evolution Browser.
Project description: Thousands of long noncoding RNAs (lncRNAs) have been annotated in eukaryotic genomes, but comparative transcriptomic approaches are necessary to understand their biological impact and evolution. To facilitate such comparative studies in Drosophila, we identified and characterized lncRNAs in a second Drosophilid, the evolutionary model Drosophila pseudoobscura. Using RNA-Seq and computational filtering of protein-coding potential, we identified 1,589 intergenic lncRNA loci in D. pseudoobscura. We surveyed multiple sex-specific developmental stages and found, like in Drosophila melanogaster, increasingly prolific lncRNA expression through male development and an overrepresentation of lncRNAs in the testes. Other trends seen in D. melanogaster, like reduced pupal expression, were not observed. Nonrandom distributions of female-biased and non-testis-specific male-biased lncRNAs between the X chromosome and autosomes are consistent with selection-based models of gene trafficking to optimize genomic location of sex-biased genes. The numerous testis-specific lncRNAs, however, are randomly distributed between the X and autosomes, and we cannot reject the hypothesis that many of these are likely to be spurious transcripts. Finally, using annotated lncRNAs in both species, we identified 134 putative lncRNA homologs between D. pseudoobscura and D. melanogaster and find that many have conserved developmental expression dynamics, making them ideal candidates for future functional analyses.
Project description: The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNAs in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNAs, often via interaction with proteins, function at specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single-nucleotide polymorphisms (SNPs) located within lncRNAs may be functionally associated with an individual's susceptibility to cancer.