Genome-wide identification and functional prediction of novel and drought-responsive lincRNAs in Populus trichocarpa.
ABSTRACT: Protein-coding genes are considered to be a dominant component of the eukaryotic transcriptome; however, many studies have shown that intergenic, non-coding transcripts also play an important role. Long intergenic non-coding RNAs (lincRNAs) have been found to play vital roles in humans and Arabidopsis. However, lincRNAs and their regulatory roles remain poorly characterized in woody plants, especially Populus trichocarpa (P. trichocarpa). A large set of high-depth Populus RNA-Seq data from control and drought conditions was examined, and a total of 2542 lincRNA candidates were identified. In total, 51 lincRNAs and 20 lincRNAs were identified as putative targets and target mimics of known Populus miRNAs, respectively. A total of 504 lincRNAs were found to be drought responsive, eight of which were confirmed by RT-qPCR. These findings provide a comprehensive view of Populus lincRNAs, which will enable in-depth functional analysis.
Project description:Populus trichocarpa is an important woody model organism whose entire genome has been sequenced. This resource has facilitated the annotation of microRNAs (miRNAs), which are short non-coding RNAs with critical regulatory functions. However, despite their developmental importance, P. trichocarpa miRNAs have yet to be annotated from numerous important tissues. Here we significantly expand the breadth of tissue sampling and sequencing depth for miRNA annotation in P. trichocarpa using high-throughput small RNA (sRNA) sequencing. miRNA annotation was performed using three individual next-generation sRNA sequencing runs from separate leaves, xylem, and mechanically treated xylem, as well as a fourth run using a pooled sample containing vegetative apices, male flowers, female flowers, female apical buds, and male apical and lateral buds. A total of 276 miRNAs were identified from these datasets, including 155 previously unannotated miRNAs, most of which are P. trichocarpa specific. Importantly, we identified several xylem-enriched miRNAs predicted to target genes known to be important in secondary growth, including the critical reaction wood enzyme xyloglucan endo-transglycosylase/hydrolase and vascular-related transcription factors. This study provides a thorough genome-wide annotation of miRNAs in P. trichocarpa through deep sRNA sequencing from diverse tissue sets. Our data significantly expand the P. trichocarpa miRNA repertoire, which will facilitate a broad range of research in this major model system.
Project description:The regulatory roles of long intergenic noncoding RNAs (lincRNAs) in humans have been revealed through the use of advanced sequencing technology. Recently, three possible scenarios of lincRNA origins have been proposed: de novo origination from intergenic regions, duplication from other long noncoding RNAs, and pseudogenization from protein-coding genes. The first two scenarios are largely studied and supported, yet few studies have focused on the evolution from pseudogenized protein-coding sequence to lincRNA. Because these three scenarios are not mutually exclusive and lincRNA origination needs systematic investigation, we conducted a comparative genomics study to investigate the evolution of human lincRNAs. Combining syntenic analysis with a stringent BLASTN e-value cutoff, we found that the majority of lincRNAs align to intergenic regions of other species. Interestingly, 193 human lincRNAs could have protein-coding orthologs in at least two of nine vertebrates. Transposable elements in these conserved regions of the human genome are far fewer than expected. Moreover, 19% of these lincRNAs overlap with or are close to pseudogenes in the human genome. We suggest that a notable portion of lincRNAs could be derived from pseudogenized protein-coding genes. Furthermore, based on our computational analysis, we hypothesize that a subset of these lincRNAs could have the potential to regulate their paralogs by functioning as competing endogenous RNAs. Our results provide evolutionary evidence of the relationship between human lincRNAs and protein-coding genes.
Project description:BACKGROUND: High-throughput re-sequencing is rapidly becoming the method of choice for studies of neutral and adaptive processes in natural populations across taxa. As re-sequencing the genome of large numbers of samples is still cost-prohibitive in many cases, methods for genome complexity reduction have been developed in attempts to capture most ecologically-relevant genetic variation. One of these approaches is sequence capture, in which oligonucleotide baits specific to genomic regions of interest are synthesized and used to retrieve and sequence those regions. RESULTS: We used sequence capture to re-sequence most predicted exons, their upstream regulatory regions, as well as numerous random genomic intervals in a panel of 48 genotypes of the angiosperm tree Populus trichocarpa (black cottonwood, or 'poplar'). A total of 20.76Mb (5%) of the poplar genome was targeted, corresponding to 173,040 baits. With 12 indexed samples run in each of four lanes on an Illumina HiSeq instrument (2x100 paired-end), 86.8% of the bait regions were on average sequenced at a depth ≥10X. Few off-target regions (>250bp away from any bait) were present in the data, but on average ~80bp on either side of the baits were captured and sequenced to an acceptable depth (≥10X) to call heterozygous SNPs. Nucleotide diversity estimates within and adjacent to protein-coding genes were similar to those previously reported in Populus spp., while intergenic regions had higher values consistent with a relaxation of selection. CONCLUSIONS: Our results illustrate the efficiency and utility of sequence capture for re-sequencing highly heterozygous tree genomes, and suggest design considerations to optimize the use of baits in future studies.
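The figures quoted in the sequence-capture abstract above can be cross-checked with a few lines of arithmetic. This is a rough sketch only: the variable names are illustrative, and the per-bait figure assumes the targeted bases are spread evenly over non-overlapping baits, which the abstract does not state.

```python
# Values taken from the abstract above.
targeted_bp = 20.76e6       # 20.76 Mb targeted
targeted_fraction = 0.05    # reported as 5% of the poplar genome
n_baits = 173_040
samples_per_lane = 12
lanes = 4

# Genome size implied by "20.76 Mb is 5% of the genome".
implied_genome_mb = targeted_bp / targeted_fraction / 1e6

# Average targeted bases per bait (assumes even, non-overlapping baits).
mean_bp_per_bait = targeted_bp / n_baits

# Multiplexing: 12 indexed samples in each of four lanes.
total_samples = samples_per_lane * lanes

print(f"implied genome size: ~{implied_genome_mb:.0f} Mb")
print(f"mean targeted bp per bait: ~{mean_bp_per_bait:.0f}")
print(f"samples sequenced: {total_samples}")
```

The sample count matches the 48-genotype panel described in the abstract; the other two numbers are derived estimates, not values reported by the authors.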
Project description:We report on the genome-wide distribution pattern of histone H3 lysine 9 acetylation (H3K9ac) and the pattern’s association with whole genome expression profiles in Populus trichocarpa subjected to soil-water depletion. We identified a set of drought responsive genes whose expression is directly regulated by differential modification of H3K9ac. Overall design: We subjected 3-month-old Populus trichocarpa plants to drought by withholding water. Plants reached a mild drought state after 5 days without watering (Day5) and a severe drought state after 7 days (Day7); fully irrigated plants served as the control (Day0). Differential H3K9ac enrichments under mild and severe drought treatments were examined in stem differentiating xylem (SDX) of the Populus trichocarpa plants.
Project description:Long intergenic noncoding RNAs (lincRNAs) are endogenous non-coding RNAs (ncRNAs) that are transcribed from 'intergenic' regions of the genome and may play critical roles in regulating gene expression through multiple RNA-mediated mechanisms. MicroRNAs (miRNAs) are single-stranded small ncRNAs of approximately 21-24 nucleotides (nt) that are involved in transcriptional and post-transcriptional gene regulation. While miRNAs functioning as mRNA repressors have been studied in detail, the influence of miRNAs on lincRNAs has seldom been investigated in plants. LincRNAs acting as miRNA targets or decoys were predicted via the GSTAr.pl script with a set of rules, and lincRNAs acting as miRNA targets were validated by degradome data. Conservation analysis of lincRNAs acting as miRNA targets or decoys was conducted using BLASTN and MAFFT. The function of lincRNAs acting as miRNA targets was predicted via a lincRNA-mRNA co-expression network, and the function of lincRNAs acting as miRNA decoys was predicted according to the competing endogenous RNA (ceRNA) hypothesis. In this work, we developed a computational method and systematically predicted 466 lincRNAs as targets of 165 miRNAs and 86 lincRNAs as decoys of 58 miRNAs in maize (Zea mays L.). Furthermore, 34 lincRNAs predicted as targets of 33 miRNAs were validated based on degradome data. We found that lincRNAs acting as miRNA targets or decoys are a common phenomenon, which indicates that the regulatory networks of miRNAs also involve lincRNAs. To elucidate the function of lincRNAs, we reconstructed a miRNA-regulated network involving 78 miRNAs, 117 lincRNAs and 8834 mRNAs.
Based on the lincRNA-mRNA co-expression network and the competing endogenous RNA hypothesis, we predicted that 34 lincRNAs that function as miRNA targets and 86 lincRNAs that function as miRNA decoys participate in cellular and metabolic processes, and play roles in catalytic activity and molecular binding functions. This work provides a comprehensive view of miRNA-regulated networks and indicates that lincRNAs can participate in a layer of regulatory interactions as miRNA targets or decoys in plants, which will enable in-depth functional analysis of lincRNAs.
Project description:Long intergenic non-coding RNAs (lincRNAs) are non-coding transcripts >200 nucleotides long that do not overlap protein-coding sequences. Importantly, such elements are known to be tissue-specifically expressed and to play a widespread role in gene regulation across thousands of genomic loci. However, very little is known of the mechanisms for the evolutionary biogenesis of these RNA elements, especially given their poor conservation across species. It has been proposed that lincRNAs might arise from pseudogenes. To test this systematically, we developed a novel method that searches for remnants of protein-coding sequences within lincRNA transcripts; the hypothesis is that we can trace back their biogenesis from protein-coding genes or posterior transposon/retrotransposon insertions. Applying this method, we found 203 human lincRNA genes with regions significantly similar to protein-coding sequences. Our method provides a visualization tool to trace the evolutionary biogenesis of lincRNAs with respect to protein-coding genes by sequence divergence. Subsequently, we show the expression correlation between lincRNAs and their identified parental protein-coding genes using public RNA-seq repositories, hinting at novel gene regulatory relationships. In summary, we developed a novel computational methodology to study non-coding gene sequences, which can be applied to identify the evolutionary biogenesis and function of lincRNAs.
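The working definition in the abstract above (non-coding transcripts longer than 200 nucleotides that do not overlap protein-coding sequences) can be sketched as a simple coordinate filter. This is a minimal illustration, not the pipeline used in any of the studies listed here: `Interval`, `overlaps` and `is_lincrna_candidate` are hypothetical names, each transcript is treated as a single unspliced interval, and coding potential is not assessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_lincrna_candidate(transcript, coding_regions, min_length=200):
    """Keep transcripts longer than min_length that overlap no coding region.

    Assumption: length is taken as the genomic span of a single interval,
    which ignores splicing.
    """
    if transcript.end - transcript.start <= min_length:
        return False
    return not any(overlaps(transcript, region) for region in coding_regions)


# Toy annotation with one protein-coding region (made-up coordinates).
coding = [Interval("chr1", 1000, 2000)]

print(is_lincrna_candidate(Interval("chr1", 3000, 3500), coding))  # intergenic, 500 nt
print(is_lincrna_candidate(Interval("chr1", 1900, 2500), coding))  # overlaps coding region
print(is_lincrna_candidate(Interval("chr1", 3000, 3150), coding))  # only 150 nt
```

A linear scan over coding regions is fine for a toy example; a genome-scale annotation would normally call for an interval tree or a sorted sweep.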
Project description:Cell signaling events triggered by androgen hormone in prostate cells are dependent on activation of the androgen receptor (AR) transcription factor. Androgen hormone binding to AR promotes its displacement from the cytoplasm to the nucleus and AR binding to DNA motifs, thus inducing activatory and inhibitory transcriptional programs through a complex regulatory mechanism not yet fully understood. In this work, we performed RNA-seq deep-sequencing of LNCaP prostate cancer cells and found over 7000 expressed long intergenic non-coding RNAs (lincRNAs), of which ~4000 are novel lincRNAs, and 258 lincRNAs have their expression activated by androgen. Immunoprecipitation of AR, followed by large-scale sequencing of co-immunoprecipitated RNAs (RIP-Seq), identified in the LNCaP cell line a total of 619 lincRNAs that were significantly enriched (FDR < 10%, DESeq2) in the anti-Androgen Receptor (antiAR) fraction relative to the control fraction (non-specific IgG), and we named them Androgen-Receptor-Associated lincRNAs (ARA-lincRNAs). A genome-wide analysis showed that protein-coding genes neighboring ARA-lincRNAs had a significantly higher androgen-induced change in expression than protein-coding genes neighboring lincRNAs not associated with AR. To find relevant epigenetic signatures enriched at the ARA-lincRNAs' transcription start sites (TSSs) we used a machine learning approach and identified that the ARA-lincRNA genomic loci in LNCaP cells are significantly enriched with epigenetic marks that are characteristic of in cis enhancer RNA regulators, and that the H3K27ac mark of active enhancers is conspicuously enriched at the TSS of ARA-lincRNAs adjacent to androgen-activated protein-coding genes. 
In addition, LNCaP topologically associating domains (TADs) that comprise chromatin regions with ARA-lincRNAs exhibit transcription factor contents, epigenetic marks and gene transcriptional activities that are significantly different from TADs not containing ARA-lincRNAs. This work highlights the possible involvement of hundreds of lincRNAs working in synergy with the AR on the genome-wide androgen-induced gene regulatory program in prostate cells.
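The "FDR < 10%" threshold in the abstract above refers to p-values adjusted for multiple testing. DESeq2 reports Benjamini-Hochberg adjusted p-values by default; whether the authors kept that default is not stated, so the following is only a sketch of the Benjamini-Hochberg procedure on made-up p-values, with `benjamini_hochberg` as an illustrative function name.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four hypothetical lincRNAs.
p = [0.01, 0.04, 0.03, 0.5]
padj = benjamini_hochberg(p)

# Indices passing a 10% FDR threshold.
enriched = [i for i, q in enumerate(padj) if q < 0.10]
print(padj)
print(enriched)
```

With a 10% threshold, roughly one in ten of the calls is expected to be a false positive, which is a looser cutoff than the 5% often used.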
Project description:Advances in transcriptomics have led to the discovery of a large number of long intergenic non-coding RNAs (lincRNAs), which are now recognized as important regulators of diverse cellular processes. Although originally thought to be non-coding, recent studies have revealed that many lincRNAs are bound by ribosomes, with a few lincRNAs even able to generate micropeptides. This raises the question of how widespread the translation of lincRNAs may be and whether such translation is likely to be functional. To better understand the biological relevance of lincRNA translation, we systematically characterized lincRNAs with ribosome occupancy in eight human cell lines by their expression, structural, sequence, evolutionary and functional features. We found that lincRNAs with ribosome occupancy have remarkably distinctive properties compared with those without ribosome occupancy, indicating that translation has important biological implications for categorizing and annotating lincRNAs. Further analysis revealed that lincRNAs exhibit remarkable cell-type specificity with differential translational repertoires and substantial discordance in functionality. Collectively, our analyses are the first attempt to characterize global and cell-type specific properties of lincRNA translation in human cells, highlighting that translation of lincRNAs has clear molecular, evolutionary and functional implications. This study will facilitate better understanding of the diverse functions of lincRNAs.
Project description:Although thousands of large intergenic non-coding RNAs (lincRNAs) have been identified in mammals, few have been functionally characterized, leading to debate about their biological role. To address this, we performed loss-of-function studies on most lincRNAs expressed in mouse embryonic stem (ES) cells and characterized the effects on gene expression. Here we show that knockdown of lincRNAs has major consequences on gene expression patterns, comparable to knockdown of well-known ES cell regulators. Notably, lincRNAs primarily affect gene expression in trans. Knockdown of dozens of lincRNAs causes either exit from the pluripotent state or upregulation of lineage commitment programs. We integrate lincRNAs into the molecular circuitry of ES cells and show that lincRNA genes are regulated by key transcription factors and that lincRNA transcripts bind to multiple chromatin regulatory proteins to affect shared gene expression programs. Together, the results demonstrate that lincRNAs have key roles in the circuitry controlling ES cell state.
Project description:As regulatory factors, lncRNAs play critical roles in embryonic stem cells. LincRNAs are the most widely studied lncRNAs; however, a large number of lncRNAs might still remain undiscovered. In this study, we constructed a de novo transcriptome assembly and detected 6,701 putative long intergenic non-coding transcripts (lincRNAs) expressed in mouse embryonic stem cells (ESCs), which might be incomplete, lacking coverage of their 5' ends as assessed by CAGE peaks. Comparing the TSS proximal regions between the known lincRNAs and their closest protein-coding transcripts, our results revealed that the lincRNA TSS proximal regions are associated with characteristic genomic and epigenetic features. Subsequently, 1,293 lincRNAs were corrected at their 5' ends using the putative lincRNA TSS regions predicted by a TSS proximal region prediction model based on genomic and epigenetic features. Finally, 43 putative lincRNAs were annotated with Gene Ontology terms. In conclusion, this work provides a novel catalog of mouse ESC-expressed lincRNAs with relatively complete transcript lengths, which might be useful for the investigation of transcriptional and post-transcriptional regulation of lincRNAs in mouse ESCs and even mammalian development.