Project description: The human transcriptome consists of various RNA biotypes, including multiple types of non-coding RNAs (ncRNAs). Current ncRNA compendia remain incomplete, partly because they are derived almost exclusively from the interrogation of small and polyadenylated RNAs. Here, we present a more comprehensive atlas of the human transcriptome, derived from matching polyA-, total-, and small-RNA profiles of a heterogeneous collection of nearly 300 human tissues and cell lines. We report thousands of novel RNA species across all major RNA biotypes, including a hitherto poorly cataloged class of non-polyadenylated single-exon long non-coding RNAs. In addition, we exploit intron abundance estimates from total RNA sequencing to predict the regulatory potential of various non-coding RNAs. Our study substantially expands the current catalog of human ncRNAs and their regulatory interactions. All data and results are accessible through the R2 webtool and serve as a basis for further exploring RNA biology and function.
Project description: Interventions: Case series: Nil
Primary outcome(s): intestinal microecological disorders; blood non-coding RNAs and immune status
Study Design: Randomized parallel controlled trial
Project description: Small RNAs, including microRNAs (miRNAs), phased secondary small interfering RNAs (phasiRNAs), and heterochromatic small interfering RNAs (hc-siRNAs), are an essential component of gene regulation. To establish a broad small RNA atlas for potato, we profiled small RNA expression in leaves, flowers, roots, and tubers of Desiree and Eva, two commercially important potato (Solanum tuberosum) cultivars. All identified small RNAs were conserved between the two cultivars, supporting the hypothesis that small RNAs have a low evolutionary rate and are mostly conserved between lineages. However, we also found that a few miRNAs accumulated differentially between the two cultivars and that hc-siRNAs show tissue-specific expression. We further identified dozens of reproductive and non-reproductive phasiRNAs originating from coding and non-coding regions that appeared to exhibit tissue-specific expression. Together, this study provides extensive small RNA profiles of different potato tissues that may serve as a resource for future investigations.
Project description: Using RNA CaptureSeq, we annotated non-coding RNAs transcribed from genomic intervals surrounding breast cancer risk signals in a range of mammary-derived tissues and cell lines.
Project description: Long non-coding RNAs (lncRNAs) are defined as non-protein-coding transcripts at least 200 nucleotides long. They are known to play pivotal roles in regulating gene expression, especially during stress responses in plants. We used a large collection of in-house transcriptome data from various soybean (Glycine max and Glycine soja) tissues treated under different conditions to perform a comprehensive identification of soybean lncRNAs. To enrich the analysis, we also retrieved publicly available soybean transcriptome data of sufficient quality and sequencing depth. In total, RNA-seq data from 332 samples were used. An integrated reference-based and de novo transcript assembly was developed that identified ~69,000 lncRNA gene loci. We showed that lncRNAs are distinct from both protein-coding transcripts and genomic background noise in terms of length, number of exons, transposable element composition, and sequence conservation level across legume species. The tissue-specific and time-specific transcriptional responses of the lncRNA genes under some stress conditions suggest their biological relevance. The transcription start sites of lncRNA gene loci tend to be close to those of their nearest protein-coding genes, and lncRNAs may be transcriptionally related to these protein-coding genes, particularly in the case of antisense and intronic lncRNAs. A previously unreported subset of small peptide-coding transcripts was identified from these lncRNA loci via tandem mass spectrometry, paving the way for investigating their functional roles. Our results also highlight the current inadequacy of the bioinformatic definition of lncRNA, which prevents lncRNA gene loci with small open reading frames (ORFs) from being regarded as protein-coding.
Project description: Leishmania major is a kinetoplastid protozoan parasite that causes the debilitating infectious disease cutaneous leishmaniasis (CL), which leaves infected individuals scarred and disfigured. The L. major genome, sequenced in 2005, was the first leishmanial genome to be sequenced, and that study identified 8,300 protein-coding genes. This landmark study paved the way for sequencing other leishmanial parasites (L. infantum, L. braziliensis and L. donovani). A recent study provided a glimpse of the global transcriptome of L. major promastigotes and identified 1,884 uniquely expressed non-coding RNAs (ncRNAs) in L. major. Additionally, we previously mapped the global proteome of L. major promastigotes using a proteogenomic approach, identifying 3,613 proteins and covering 43% of the predicted proteome. In the present study, we carried out an extensive analysis of the 1,884 novel ncRNAs using a proteogenomic approach to assess their protein-coding potential. Our analysis identified 10 novel protein-coding genes based on peptide data, and hundreds of additional protein-coding genes based on homology searches of genes previously classified as ncRNAs. We analyzed each of these novel protein-coding genes and, in the process, improved the genome annotation of L. major on the basis of both mass spectrometry-derived peptide data and homology.