Project description: The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly large transcriptomes encoded by eukaryotic genomes. However, transcriptome maps remain incomplete, partly because they were mostly reconstructed from RNA-seq reads that lack orientation (known as unstranded reads) and lack certain information about transcript boundaries. Methods that expand the usability of unstranded RNA-seq data, by determining the orientation of the reads in advance and by precisely defining the boundaries of assembled transcripts, could substantially improve the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies built from stranded and/or unstranded RNA-seq data, by orienting unstranded reads using maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. By applying CAFE to large-scale transcriptomic data comprising 230 billion RNA-seq reads from ENCODE, the Human BodyMap projects, The Cancer Genome Atlas, and GTEx, we predicted the orientations of about 220 billion unstranded reads. This led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and of a comprehensive lncRNA catalogue that includes thousands of novel lncRNAs. Our pipeline should not only help build comprehensive, precise transcriptome maps from complex genomes but also expand the known universe of the non-coding genome. This SuperSeries is composed of the SubSeries listed below.
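The idea of orienting an unstranded read by maximum likelihood can be illustrated with a toy sketch. This is not CAFE's published model: it simply assumes, hypothetically, that stranded reads covering the same locus are independent draws from a binomial distribution with unknown plus-strand probability p, whose maximum likelihood estimate is n_plus / (n_plus + n_minus), and that an unstranded read at that locus is assigned to the strand with the higher estimated probability. All function names are illustrative.

```python
from math import log
from typing import Optional


def log_likelihood(p: float, n_plus: int, n_minus: int) -> float:
    """Binomial log-likelihood (up to a constant) of the observed stranded read counts."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return n_plus * log(p) + n_minus * log(1.0 - p)


def mle_plus_fraction(n_plus: int, n_minus: int) -> Optional[float]:
    """Closed-form MLE of p: the observed fraction of plus-strand reads."""
    total = n_plus + n_minus
    if total == 0:
        return None  # no stranded evidence at this locus
    return n_plus / total


def orient_unstranded(n_plus: int, n_minus: int) -> Optional[str]:
    """Assign an unstranded read to the more likely strand; None if undecidable."""
    p_hat = mle_plus_fraction(n_plus, n_minus)
    if p_hat is None or p_hat == 0.5:
        return None
    return "+" if p_hat > 0.5 else "-"


# Hypothetical locus: 30 plus-strand and 10 minus-strand stranded reads.
print(mle_plus_fraction(30, 10))  # 0.75
print(orient_unstranded(30, 10))  # +
# The closed-form estimate scores at least as well as a nearby value of p.
print(log_likelihood(0.75, 30, 10) >= log_likelihood(0.7, 30, 10))  # True
```

Returning None for ties and for loci without stranded evidence keeps the sketch from guessing; a real pipeline would also need to model read position, splice-site motifs and sequencing error, which this toy ignores.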
Project description: 1. Evaluate the diagnostic value of long noncoding RNA CCAT1 expression, measured by RT-PCR in peripheral blood, in colorectal cancer patients versus healthy control individuals.
2. Evaluate the clinical utility of detecting long noncoding RNA CCAT1 expression in the diagnosis of colorectal cancer and its relation to tumor staging.
3. Evaluate the clinical utility of detecting long noncoding RNA CCAT1 expression in precancerous colorectal diseases.
4. Compare long noncoding RNA CCAT1 expression with the traditional markers carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA19-9) in the diagnosis of colorectal cancer.
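Objectives 1 and 4 amount to asking how well a marker separates patients from controls. A common way to quantify this is the area under the ROC curve (AUC), which equals the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counted as one half. The sketch below assumes each marker is summarized as one numeric value per subject (for example, relative CCAT1 expression, or a CEA or CA19-9 level); the values are made up, and the study's actual statistical procedure is not described here.

```python
from typing import Sequence, Tuple


def roc_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """AUC via the Mann-Whitney formulation: P(case > control) + 0.5 * P(tie)."""
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                score += 1.0
            elif c == h:
                score += 0.5
    return score / (len(cases) * len(controls))


def sensitivity_specificity(
    cases: Sequence[float], controls: Sequence[float], cutoff: float
) -> Tuple[float, float]:
    """Call a subject positive when the marker value is >= cutoff."""
    sensitivity = sum(x >= cutoff for x in cases) / len(cases)
    specificity = sum(x < cutoff for x in controls) / len(controls)
    return sensitivity, specificity


# Made-up marker values for illustration only.
patients = [3.1, 4.2, 5.0, 2.2]
healthy = [1.0, 1.8, 2.5]
print(roc_auc(patients, healthy))
print(sensitivity_specificity(patients, healthy, cutoff=2.0))
```

Comparing CCAT1 with CEA and CA19-9 would then mean computing the AUC of each marker on the same subjects; a formal test for the difference between correlated AUCs (such as DeLong's test) is beyond this sketch.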
Project description: To understand the role of long non-coding RNAs (lncRNAs) and their interaction with coding RNAs in bladder urothelial cell carcinoma (BUCC), we performed genome-wide screening of lncRNA and coding RNA expression in primary BUCC tissues and normal tissues using a long non-coding RNA array (Agilent platform, GPL13825). By comparing these two groups, we identified significantly differentially expressed lncRNAs and coding RNAs. Using bioinformatic tools, we further identified a subset of lncRNAs and their correlation with neighboring coding genes. This analysis provides a fundamental understanding of how the transcriptomic landscape changes during bladder carcinogenesis. In total, 12 primary BUCC tumors and 3 normal tissues were used for the long noncoding RNA array experiments, which covered both long non-coding RNAs and coding RNAs. The differential expression of a subset of lncRNAs, and their interaction with coding RNAs, in BUCC compared with normal tissue will be identified by computational analysis.
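Relating lncRNAs to neighboring coding genes typically involves two steps: find coding genes within some genomic distance of each lncRNA, then correlate their expression across samples. The minimal sketch below illustrates this; the 10 kb window, the hypothetical `Gene` record and the use of Pearson correlation are assumptions for illustration, not the parameters or tools used in this study.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence, Tuple


@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # genomic start coordinate
    end: int  # genomic end coordinate


def gap(a: Gene, b: Gene) -> int:
    """Distance in bp between two intervals on the same chromosome (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return sxy / sqrt(sxx * syy)


def neighbor_pairs(
    lncrnas: List[Gene],
    coding: List[Gene],
    expr: Dict[str, Sequence[float]],
    window: int = 10_000,  # hypothetical search window in bp
) -> List[Tuple[str, str, float]]:
    """Return (lncRNA, coding gene, Pearson r) for every pair within `window` bp."""
    pairs = []
    for lnc in lncrnas:
        for cod in coding:
            if lnc.chrom == cod.chrom and gap(lnc, cod) <= window:
                pairs.append((lnc.name, cod.name, pearson(expr[lnc.name], expr[cod.name])))
    return pairs


# Invented coordinates and expression values across four samples.
lncs = [Gene("lnc1", "chr1", 1_000, 2_000)]
genes = [
    Gene("geneA", "chr1", 5_000, 6_000),    # 3 kb away: kept
    Gene("geneB", "chr1", 50_000, 51_000),  # 48 kb away: outside the window
    Gene("geneC", "chr2", 1_000, 2_000),    # different chromosome
]
expr = {"lnc1": [1, 2, 3, 4], "geneA": [2, 4, 6, 8], "geneB": [4, 3, 2, 1], "geneC": [1, 1, 1, 2]}
print(neighbor_pairs(lncs, genes, expr))  # [('lnc1', 'geneA', 1.0)]
```

The brute-force double loop is fine for illustration; genome-scale annotations would normally use an interval index instead.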
Project description: Methyl-7-guanosine (m7G) “capping” of coding and some noncoding RNAs is critical for their maturation and subsequent activity. Here, we discovered that eukaryotic translation initiation factor 4E (eIF4E), itself a cap-binding protein, drives the expression of the capping machinery and increases the capping efficiency of ∼100 coding and noncoding RNAs. This dataset collects transcriptomic data from a quantitative cap immunoprecipitation (CapIP) assay in U2OS cells stably expressing eIF4E-Flag or a vector control.
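One way to picture a quantitative CapIP read-out is as an enrichment ratio: reads recovered by cap immunoprecipitation relative to input reads for the same transcript, compared between eIF4E-Flag and vector cells. The sketch below assumes per-transcript read counts, counts-per-million normalization and a pseudocount of 1; these choices are illustrative and are not taken from the dataset's processing pipeline.

```python
from math import log2


def cpm(count: float, library_size: float) -> float:
    """Counts per million mapped reads."""
    return count * 1e6 / library_size


def capip_enrichment(
    capip_count: float,
    capip_lib: float,
    input_count: float,
    input_lib: float,
    pseudocount: float = 1.0,
) -> float:
    """log2(CapIP CPM / input CPM) for one transcript; the pseudocount avoids division by zero."""
    return log2(
        (cpm(capip_count, capip_lib) + pseudocount)
        / (cpm(input_count, input_lib) + pseudocount)
    )


def capping_change(eif4e_enrichment: float, vector_enrichment: float) -> float:
    """Positive values: the transcript is relatively more cap-enriched in eIF4E-Flag cells."""
    return eif4e_enrichment - vector_enrichment


# Hypothetical counts for one transcript (libraries of 2 million reads each).
eif4e = capip_enrichment(14, 2e6, 6, 2e6)
vector = capip_enrichment(6, 2e6, 6, 2e6)
print(capping_change(eif4e, vector))  # 1.0
```

Working on the log2 scale turns the comparison between conditions into a simple difference; a real analysis would add replicates and a statistical test.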
Project description: Non-poly(A) RNA molecules, including noncoding RNAs (ncRNAs), comprise the major portion of the total transcribed molecules in the cell. In addition to mRNAs, ncRNAs also function as ribonucleoprotein particles (RNPs) and carry out biological functions including synthesis of new proteins, RNA processing, genome remodelling and regulation of transcription. We therefore set out to comprehensively identify, transcriptome-wide, the proteins that bind coding and non-coding RNAs (RNA-binding proteins, RBPs) in Leishmania spp. To this end, we applied the recently reported orthogonal organic phase separation (OOPS) method in combination with tandem mass tag (TMT) labelling-based quantitative proteomic mass spectrometry (MS), and report herein the most comprehensive identification of RBPs in Leishmania mexicana (L. mexicana) parasites. This study identified novel RNA-binding properties in thousands of L. mexicana proteins, significantly expanding the RBP landscape of the parasite. Furthermore, we showed that the classical Hsp90 inhibitor tanespimycin differentially regulates the RNA-binding properties of hundreds of L. mexicana RBPs, shedding light on previously unknown large-scale downstream molecular effects of this small-molecule inhibitor in the parasite.
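As a simplified picture of how RBPs might be called from such data, the sketch below assumes a design that compares UV-crosslinked (CL) with non-crosslinked (NC) interface samples in paired TMT channels, scores each protein by its mean log2(CL/NC) reporter-ion ratio, and calls proteins above a hypothetical threshold (log2 ratio of at least 1) as RBPs. A tanespimycin effect is then modelled as the shift in that score between treated and untreated parasites. Protein identifiers, intensities and the threshold are invented; real analyses use proper normalization and statistical testing.

```python
from math import log2
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Channels = Tuple[Sequence[float], Sequence[float]]  # (CL intensities, NC intensities)


def log2_enrichment(cl: Sequence[float], nc: Sequence[float]) -> float:
    """Mean log2(CL/NC) over paired TMT channels for one protein."""
    if len(cl) != len(nc) or not cl:
        raise ValueError("CL and NC channels must be paired and non-empty")
    return mean(log2(c / n) for c, n in zip(cl, nc))


def call_rbps(proteins: Dict[str, Channels], min_log2: float = 1.0) -> List[str]:
    """Proteins whose crosslinking enrichment reaches the (hypothetical) threshold."""
    return sorted(p for p, (cl, nc) in proteins.items() if log2_enrichment(cl, nc) >= min_log2)


def drug_shift(treated: Channels, untreated: Channels) -> float:
    """Change in RNA-binding enrichment upon treatment (negative = reduced binding)."""
    return log2_enrichment(*treated) - log2_enrichment(*untreated)


# Invented reporter-ion intensities: (CL channels, NC channels).
untreated = {
    "protA": ([400.0, 800.0], [100.0, 100.0]),
    "protB": ([110.0, 90.0], [100.0, 100.0]),
}
print(call_rbps(untreated))  # ['protA']
print(drug_shift(([200.0, 400.0], [100.0, 100.0]), untreated["protA"]))  # -1.0
```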
Project description: To understand the role of long noncoding RNAs (lncRNAs) and their interaction with coding RNAs in esophageal squamous cell cancer (ESCC), we performed genome-wide screening of lncRNA and coding RNA expression in primary ESCC tissue and adjacent normal tissue using the Agilent SurePrint G3 Human GE 8x60K Microarray. By comparing ESCC tissues with matched normal tissues, we identified differentially expressed lncRNAs and coding RNAs, which were confirmed by PCR and other independent studies. Using bioinformatic tools, we further identified a subset of co-located and co-expressed lncRNAs and coding RNAs, and the analysis suggested that some of these lncRNAs may influence nearby genes involved in the genesis of ESCC. Four pairs of primary ESCC tumors and adjacent normal tissues were used for genome-scale microarray experiments, which covered both lncRNAs and coding RNAs. Selected lncRNAs expressed in the experiment were validated in independent matched-pair samples by PCR.
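Because the microarray design uses matched tumor/normal pairs, differential expression of a given lncRNA can be assessed on within-pair differences. The minimal sketch below computes a paired t-statistic on log2-scale expression values; it assumes values are already log2-transformed and normalized, the numbers are invented, and the study's actual statistical method is not specified in this description. With four pairs, the statistic has 3 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def paired_differences(tumor: Sequence[float], normal: Sequence[float]) -> List[float]:
    """Per-patient log2 fold changes (tumor minus matched normal, both on log2 scale)."""
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal samples must be matched pairs")
    return [t - n for t, n in zip(tumor, normal)]


def paired_t(tumor: Sequence[float], normal: Sequence[float]) -> Tuple[float, float, int]:
    """Return (mean log2 fold change, paired t-statistic, degrees of freedom)."""
    d = paired_differences(tumor, normal)
    if len(d) < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(d)
    if sd == 0:
        raise ValueError("identical differences: t-statistic undefined")
    return mean(d), mean(d) / (sd / sqrt(len(d))), len(d) - 1


# Invented log2 expression of one lncRNA in four tumor/normal pairs.
tumor = [8.0, 9.0, 10.0, 11.0]
normal = [6.0, 6.5, 7.5, 8.0]
fc, t, df = paired_t(tumor, normal)
print(f"mean log2FC={fc:.2f}, t={t:.2f}, df={df}")  # mean log2FC=2.50, t=12.25, df=3
```

A p-value would come from a t-distribution with `df` degrees of freedom (SciPy's `scipy.stats.ttest_rel` performs the same paired test); genome-wide screens would additionally correct for multiple testing.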
Project description: Long non-coding RNAs (lncRNAs) and miRNAs have emerged as crucial regulators of gene expression and cell fate decisions. Here we present an integrated analysis of the ncRNA landscape of purified human hematopoietic stem cells (HSCs) and their differentiated progeny, including granulocytes, monocytes, T cells, NK cells, B cells, megakaryocytes and erythroid precursors. For each blood cell population, RNA from 5 healthy donors was hybridized onto three microarray platforms (Arraystar lncRNA V2.0, NCode™-miRNA/-ncRNA), yielding coverage of more than 40,000 lncRNAs, 25,000 mRNAs and 900 miRNAs on 146 arrays. A t-distributed stochastic neighbor embedding (t-SNE) of noncoding genes separated the dataset into groups of samples that matched the input populations, demonstrating that each population has a unique lncRNA expression profile. Self-organizing maps (SOMs) revealed clusters of lncRNAs and mRNAs that were coordinately expressed in HSCs and during lineage commitment. Using a “guilt-by-association” approach, we assigned putative functions to lncRNAs regulated during differentiation, which predicted LINC00173 as a novel non-coding regulator of granulopoiesis. Knocking down LINC00173 with two independent shRNA constructs diminished granulocytic differentiation in vitro, as well as myeloid colony formation and function. Next, we uncovered a strong and highly coordinated upregulation of miRNAs, small nucleolar RNAs (snoRNAs) and lncRNAs within the DLK1-DIO3 locus on chromosome 14 (hsa14) during megakaryocytic maturation. shRNA-mediated knockdown of noncoding members of this locus reduced erythroid colony formation and megakaryocytic cell proliferation in vitro, indicating the functional importance of this ncRNA locus in megakaryopoiesis.
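A guilt-by-association prediction can be sketched as follows: correlate a lncRNA's expression profile across cell populations with every mRNA, take the k most correlated mRNAs, and tally the functional annotations of those neighbours. The version below is simplified and hypothetical (Pearson correlation, top-k neighbours, raw annotation counts); the study's own implementation is not described here, and all gene names, profiles and annotation terms are invented.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # treat constant profiles as uncorrelated so they rank low
    return sxy / sqrt(sxx * syy)


def guilt_by_association(
    lnc_profile: Sequence[float],
    mrna_profiles: Dict[str, Sequence[float]],
    annotations: Dict[str, List[str]],
    k: int = 2,
) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Return the k most correlated mRNAs and their annotation terms, most frequent first."""
    ranked = sorted(
        mrna_profiles,
        key=lambda g: pearson(lnc_profile, mrna_profiles[g]),
        reverse=True,
    )
    neighbours = ranked[:k]
    terms = Counter(term for g in neighbours for term in annotations.get(g, []))
    return neighbours, terms.most_common()


# Invented expression values across five hypothetical cell populations.
lnc = [1.0, 2.0, 3.0, 4.0, 5.0]
mrnas = {
    "mRNA_A": [2.0, 3.0, 4.0, 5.0, 6.0],
    "mRNA_B": [1.0, 2.0, 3.0, 4.0, 6.0],
    "mRNA_C": [5.0, 4.0, 3.0, 2.0, 1.0],
}
go = {
    "mRNA_A": ["granulocyte differentiation"],
    "mRNA_B": ["granulocyte differentiation"],
    "mRNA_C": ["T cell activation"],
}
print(guilt_by_association(lnc, mrnas, go))
# (['mRNA_A', 'mRNA_B'], [('granulocyte differentiation', 2)])
```

Raw counts are the simplest possible tally; a fuller analysis would test each term for enrichment against the background annotation frequency.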