Evolutionary analysis across mammals reveals distinct classes of long noncoding RNAs
ABSTRACT: Recent advances in transcriptome sequencing have enabled the discovery of thousands of long non-coding RNAs (lncRNAs) across many species. Though several lncRNAs have been shown to play important roles in diverse biological processes, the functions and mechanisms of most lncRNAs remain unknown. Two significant obstacles lie between transcriptome sequencing and functional characterization of lncRNAs: identifying truly non-coding genes from de novo reconstructed transcriptomes, and prioritizing the hundreds of resulting putative lncRNAs for downstream experimental interrogation. We present slncky, a computational lncRNA discovery tool that produces a high-quality set of lncRNAs from RNA-sequencing data and further uses evolutionary constraint to prioritize lncRNAs that are likely to be functionally important. Our automated filtering pipeline is comparable to manual curation efforts and more sensitive than previously published computational approaches. Furthermore, we develop a sensitive alignment pipeline for aligning lncRNA loci and propose new evolutionary metrics relevant for analyzing sequence and transcript evolution. Our analysis reveals that evolutionary selection acts in several distinct patterns, and uncovers two notable classes of intergenic lncRNAs: one showing strong purifying selection on RNA sequence and another where constraint is restricted to the regulation but not the sequence of the transcript. To study a comprehensive and comparable set of lncRNAs expressed in the pluripotent state, we analyzed RNA-Seq data from pluripotent cells derived from several strains and species, and grown in two physiological conditions. First, we derived “naïve” ES cells (ESCs) from each of three different mouse strains: 129SvEv, NOD, and Mus musculus castaneus (cast), a wild mouse subspecies originally from Thailand, as well as naïve induced pluripotent stem (iPS) cells from rat and human.
Next, to facilitate comparisons across pluripotent cells from different species, we also cultured mouse and rat cells in “primed” epiblast stem cell (EpiSC) culture conditions, since human iPS cells in culture are molecularly and functionally much more similar to mouse primed EpiSCs than to ground-state naïve ESCs. We polyA-selected RNA from each cell type and sequenced it deeply on a HiSeq 2500.
Project description:Although the functional roles of long noncoding RNAs (lncRNAs) have been increasingly identified, few lncRNAs that control the naïve state of embryonic stem cells (ESCs) are known. Here, we report a naïve-state-associated lncRNA, LincU, which is intrinsically activated by Nanog in mESCs. LincU-deficient mESCs exhibit a primed-like pluripotent state and potentiate the transition from the naïve state to the primed state, whereas ectopic LincU expression maintains mESCs in the naïve state. Mechanistically, we demonstrate that LincU binds and stabilizes the DUSP9 protein, an ERK-specific phosphatase, and then constitutively inhibits the ERK1/2 signaling pathway, which critically contributes to maintenance of the naïve state. Importantly, we reveal the functional role of LincU to be evolutionarily conserved in human. Therefore, our findings unveil LincU as a conserved lncRNA that intrinsically restricts MAPK/ERK activity and maintains the naïve state of ESCs.
Project description:Only a minuscule fraction of long non-coding RNAs (lncRNAs) are well characterized. The evolutionary history of lncRNAs can provide insights into their functionality, but comparative analyses have been precluded by our ignorance of lncRNAs in non-model organisms. Here, we use RNA sequencing to identify lncRNAs in eleven tetrapod species and we present the first large-scale evolutionary study of lncRNA repertoires and expression patterns. We identify ~11,000 primate-specific lncRNA families, which show evidence for selective constraint during recent evolution, and ~2,400 highly conserved lncRNAs (including ~400 genes that likely originated more than 300 million years ago). We find that lncRNAs, in particular ancient ones, are generally actively regulated and may predominantly function in embryonic development. lncRNA X-inactivation patterns reveal an extremely female-biased monotreme-specific lncRNA, which may partially compensate X-dosage in this lineage. Most lncRNAs evolve rapidly in terms of sequence and expression levels, but global patterns like tissue specificities are often conserved. We compared expression patterns of homologous lncRNA and protein-coding families across tetrapods to reconstruct an evolutionarily conserved co-expression network. This network, which surprisingly contains many lncRNA hubs, suggests potential functions for lncRNAs in fundamental processes like spermatogenesis or synaptic transmission, but also in more specific mechanisms such as placenta growth suppression through miRNA production. [Batch 1 and 2] To broaden our understanding of lncRNA evolution, we used an extensive RNA-seq dataset to establish lncRNA repertoires and homologous gene families in 11 tetrapod species.
We analyzed the polyadenylated transcriptomes of 8 organs (cortex/whole brain without cerebellum, cerebellum, heart, kidney, liver, placenta, ovary and testis) and 11 species (human, chimpanzee, bonobo, gorilla, orangutan, macaque, mouse, opossum, platypus, chicken and the frog Xenopus tropicalis), which shared a common ancestor ~370 million years (MY) ago. Our dataset included 47 strand-specific samples, which allowed us to confirm the orientation of gene predictions and to address the evolution of sense-antisense transcripts. See also GSE43721 (Soumillon et al, Cell Reports, 2013) for three strand-specific samples for mouse brain, liver and testis.
Project description:Recent advances in transcriptome sequencing have enabled the discovery of thousands of long non-coding RNAs (lncRNAs) across many species. Though several lncRNAs have been shown to play important roles in diverse biological processes, the functions and mechanisms of most lncRNAs remain unknown. Two significant obstacles lie between transcriptome sequencing and functional characterization of lncRNAs: identifying truly non-coding genes from de novo reconstructed transcriptomes, and prioritizing the hundreds of resulting putative lncRNAs for downstream experimental interrogation. We present slncky, a lncRNA discovery tool that produces a high-quality set of lncRNAs from RNA-sequencing data and further uses evolutionary constraint to prioritize lncRNAs that are likely to be functionally important. Our automated filtering pipeline is comparable to manual curation efforts and more sensitive than previously published computational approaches. Furthermore, we developed a sensitive alignment pipeline for aligning lncRNA loci and propose new evolutionary metrics relevant for analyzing sequence and transcript evolution. Our analysis reveals that evolutionary selection acts in several distinct patterns, and uncovers two notable classes of intergenic lncRNAs: one showing strong purifying selection on RNA sequence and another where constraint is restricted to the regulation but not the sequence of the transcript. Our results highlight that lncRNAs are not a homogeneous class of molecules but rather a mixture of multiple functional classes with distinct biological mechanisms and/or roles. Our novel comparative methods for lncRNAs reveal 233 constrained lncRNAs out of tens of thousands of currently annotated transcripts, which we make available through the slncky Evolution Browser.
Project description:Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (~30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non-TE sequences, than their intronic TEs, or than random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ~30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ~35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.
Project description:BACKGROUND: Human naïve pluripotency state cells can be derived from direct isolation of inner cell mass or primed-to-naïve resetting of human embryonic stem cells (hESCs) through different combinations of transcription factors, small-molecule inhibitors, and growth factors. Long noncoding RNAs (lncRNAs) have been identified to be crucial in diverse biological processes, including the pluripotency regulatory circuit of mouse pluripotent stem cells (PSCs), but few are known to be involved in the regulation of pluripotency or the derivation of naïve pluripotency in human PSCs. This study initially set out to discover more lncRNAs that might play significant roles in the regulation of human PSC pluripotency, but accidentally identified a lncRNA whose knockdown in human PSCs induced naïve-like pluripotency conversion. METHODS: Candidate lncRNAs tightly correlated with human pluripotency were screened from 55 RNA-seq datasets containing human ESC, human induced pluripotent stem cell (iPSC), and somatic tissue samples. Then loss-of-function experiments in human PSCs were performed to investigate the function of these candidate lncRNAs. The naïve-like pluripotency conversion caused by CCDC144NL-AS1 knockdown (KD) was characterized by quantitative real-time PCR, immunofluorescence staining, western blotting, differentiation of hESCs in vitro and in vivo, RNA-seq, and chromatin immunoprecipitation. Finally, the signaling pathways in CCDC144NL-AS1-KD human PSCs were examined through western blotting and analysis of RNA-seq data. RESULTS: The results indicated that knockdown of CCDC144NL-AS1 induces naïve-like state conversion of human PSCs in the absence of additional transcription factors or small-molecule inhibitors.
CCDC144NL-AS1-KD human PSCs exhibit naïve-like pluripotency features, such as elevated expression of naïve pluripotency-associated genes, increased developmental capacity, transcriptional profiles analogous to those of human naïve PSCs, and global reduction of repressive chromatin modification marks. Furthermore, CCDC144NL-AS1-KD human PSCs display inhibition of MAPK (ERK), accumulation of active β-catenin, and upregulation of some LIF/STAT3 target genes, all of which are concordant with previously reported traits of human naïve PSCs. CONCLUSIONS: Our study unveils an unexpected role of a lncRNA, CCDC144NL-AS1, in the naïve-like state conversion of human PSCs, providing a new perspective on how conversion between early human pluripotency states is regulated. CCDC144NL-AS1 may therefore be valuable for future research on deriving higher-quality naïve-state human PSCs and promoting their therapeutic applications.
Project description:Long noncoding RNAs (lncRNAs) are one of the most intensively studied groups of noncoding elements. Debate continues over what proportion of lncRNAs are functional or merely represent transcriptional noise. Although characterization of individual lncRNAs has identified approximately 200 functional loci across the Eukarya, general surveys have found only modest or no evidence of long-term evolutionary conservation. Although this lack of conservation suggests that most lncRNAs are nonfunctional, the possibility remains that some represent recent evolutionary innovations. We examine recent selection pressures acting on lncRNAs in mouse populations. We compare patterns of within-species nucleotide variation at approximately 10,000 lncRNA loci in a cohort of the wild house mouse, Mus musculus castaneus, with between-species nucleotide divergence from the rat (Rattus norvegicus). Loci under selective constraint are expected to show reduced nucleotide diversity and divergence. We find limited evidence of sequence conservation compared with putatively neutrally evolving ancestral repeats (ARs). Comparisons of sequence diversity and divergence between ARs, protein-coding (PC) exons and lncRNAs, and the associated flanking regions, show weak, but significantly lower levels of sequence diversity and divergence at lncRNAs compared with ARs. lncRNAs conserved deep in the vertebrate phylogeny show lower within-species sequence diversity than lncRNAs in general. A set of 74 functionally characterized lncRNAs show levels of diversity and divergence comparable to PC exons, suggesting that these lncRNAs are under substantial selective constraints. Our results suggest that, in mouse populations, most lncRNA loci evolve at rates similar to ARs, whereas older lncRNAs tend to show signals of selection similar to PC genes.
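The two quantities compared in the abstract above, within-species nucleotide diversity and between-species divergence, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy aligned sequences, and the simple uncorrected per-site difference are all assumptions made for illustration, and a real analysis would work from variant calls and whole-genome alignments, typically with a substitution-model correction.

```python
from itertools import combinations

def pairwise_diff(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an 'N' are skipped.
    """
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0

def nucleotide_diversity(seqs):
    """Mean pairwise difference per site among within-species sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)

def divergence(seqs, outgroup: str) -> float:
    """Mean uncorrected per-site difference between each sequence and an outgroup."""
    if not seqs:
        raise ValueError("need at least one sequence")
    return sum(pairwise_diff(s, outgroup) for s in seqs) / len(seqs)

# Hypothetical toy alignment: four castaneus haplotypes and one rat sequence.
castaneus = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAC"]
rat = "ACGAACGTCC"

pi = nucleotide_diversity(castaneus)
d = divergence(castaneus, rat)
print(f"diversity = {pi:.3f}, divergence = {d:.3f}")
```

Under this sketch, a locus under selective constraint would be expected to show lower values of both quantities than a neutrally evolving reference such as ancestral repeats.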
Project description:The oocyte-to-embryo transition (OET) transforms a differentiated gamete into pluripotent blastomeres. The accompanying maternal-zygotic RNA exchange involves remodeling of the long non-coding RNA (lncRNA) pool. Here, we used next generation sequencing and de novo transcript assembly to define the core population of 1,600 lncRNAs expressed during the OET (OET lncRNAs). Relative to mRNAs, OET lncRNAs were less expressed and had shorter transcripts, mainly due to fewer exons and shorter 5' terminal exons. Approximately half of OET lncRNA promoters originated in retrotransposons, suggesting their recent emergence. Except for a small group of ubiquitous lncRNAs, maternal and zygotic lncRNAs formed two distinct populations. The bulk of maternal lncRNAs was degraded before zygotic genome activation. Interestingly, maternal lncRNAs seemed to undergo the cytoplasmic polyadenylation observed for dormant mRNAs. We also identified lncRNAs giving rise to trans-acting short interfering RNAs, which represent a novel lncRNA category. Altogether, we defined the core OET lncRNA transcriptome and characterized its remodeling during early development. Our results are consistent with the notion that rapidly evolving lncRNAs constitute signatures of cells-of-origin while a minority plays an active role in control of gene expression across OET. The data presented here provide an excellent source for further OET lncRNA studies.
Project description:Although long noncoding RNAs (lncRNAs) are proposed to play essential roles in mammalian neurodevelopment, we know little of their functions from their disruption in vivo. Combining evidence for evolutionary constraint and conserved expression data, we previously identified candidate lncRNAs that might play important and conserved roles in brain function. Here, we demonstrate that the sequence and neuronal transcription of lncRNAs transcribed from the previously uncharacterized Visc locus are conserved across diverse mammals. Consequently, one of these lncRNAs, Visc-2, was selected for targeted deletion in the mouse, and knockout animals were subjected to an extremely detailed anatomical and behavioral characterization. Despite a neurodevelopmental expression pattern of Visc-2 that is highly localized to the cortex and sites of neurogenesis, anomalies in neither cytoarchitecture nor neuroproliferation were identified in knockout mice. In addition, no abnormal motor, sensory, anxiety, or cognitive behavioral phenotypes were observed. These results are important because they contribute to a growing body of evidence that lncRNA loci contribute on average far less to brain and biological functions than protein-coding loci. A high-throughput knockout program focussing on lncRNAs, similar to that currently underway for protein-coding genes, will be required to establish the distribution of their organismal functions.
Project description:Maintenance of the pluripotent state or differentiation of the pluripotent state into any germ layer depends on the factors that orchestrate expression of thousands of genes through epigenetic, transcriptional, and post-transcriptional regulation. Long noncoding RNAs (lncRNAs) are implicated in the complex molecular circuitry of developmental processes. The ENCODE project has opened up new avenues for studying these lncRNA transcripts with the availability of new datasets for lncRNA annotation and regulation. Expression studies identified hundreds of long noncoding RNAs differentially expressed in the pluripotent state, and many of these lncRNAs are found to control pluripotency and stemness in embryonic and induced pluripotent stem cells or, conversely, to promote differentiation of pluripotent cells. They are generally transcriptionally activated or repressed by pluripotency-associated transcription factors and function as molecular mediators of gene expression that determine the pluripotent state of the cell. They can act as molecular scaffolds or guides for chromatin-modifying complexes, directing them to bind to specific genomic loci to impart a repressive or activating effect on gene expression, or they can transcriptionally or post-transcriptionally regulate gene expression by diverse molecular mechanisms. This review focuses on recent findings on the regulatory role of lncRNAs in two main aspects of pluripotency, namely, self-renewal and differentiation into any lineage, and elucidates the underlying molecular mechanisms that are now being uncovered.
Project description:The human genome produces thousands of long noncoding RNAs (lncRNAs), transcripts >200 nucleotides long that do not encode proteins. Although critical roles in normal biology and disease have been revealed for a subset of lncRNAs, the function of the vast majority remains untested. We developed a CRISPR interference (CRISPRi) platform targeting 16,401 lncRNA loci in seven diverse cell lines, including six transformed cell lines and human induced pluripotent stem cells (iPSCs). Large-scale screening identified 499 lncRNA loci required for robust cellular growth, of which 89% showed growth-modifying function exclusively in one cell type. We further found that lncRNA knockdown can perturb complex transcriptional networks in a cell type-specific manner. These data underscore the functional importance and cell type specificity of many lncRNAs.