Project description: The clinical course of prostate cancer (PCa) is highly variable, demanding an individualized approach to therapy and robust prognostic markers for treatment decisions. Here we present a random forest-based classification model to predict aggressive behaviour of PCa. DNA methylation changes between PCa cases with good or poor prognosis (discovery cohort, n=78) were used as input. The model was validated with data from two independent PCa cohorts from ICGC and TCGA. Ranking of cancer progression-related DNA methylation changes allowed selection of candidate genes for additional validation by immunohistochemistry. We identified loss of ZIC2 protein expression as a promising novel prognostic biomarker for PCa in a tissue microarray of >12,000 tumors. The prognostic value of ZIC2 proved to be independent of established clinico-pathological variables including Gleason, stage, nodal stage and PSA. In summary, we have developed a PCa classification model that, either directly or via expression analyses of the identified top-ranked candidate genes, might help in decision making related to the treatment of prostate cancer patients.
Project description: Transcriptional enhancers play critical roles in the regulation of gene expression, but their identification has remained a challenge. Recently, it was shown that enhancers in the mammalian genome are associated with characteristic histone modification patterns, which have been increasingly exploited for enhancer identification. However, only a limited number of histone modifications have previously been investigated for this purpose, leaving unanswered the question of whether there exists an optimal set of histone modifications that could improve enhancer prediction. Here, we address this issue by exploring a rich dataset produced by the human Epigenome Roadmap Project. Specifically, we examined genome-wide profiles of 24 histone modifications in human embryonic stem cells and fibroblasts, and developed a random forest-based algorithm to integrate histone modification profiles for identification of enhancers. As a training set, we used histone modification profiles at genome-wide binding sites of p300 in the two cell types, identified using ChIP-seq. We show that this algorithm not only leads to more accurate and precise prediction of enhancers than previous methods, but also helps identify an optimal set of three chromatin marks for enhancer prediction.
Project description: Transcriptional enhancers play critical roles in the regulation of gene expression, but their identification has remained a challenge. Recently, it was shown that enhancers in the mammalian genome are associated with characteristic histone modification patterns, which have been increasingly exploited for enhancer identification. However, only a limited number of histone modifications have previously been investigated for this purpose, leaving unanswered the question of whether there exists an optimal set of histone modifications that could improve enhancer prediction. Here, we address this issue by exploring a rich dataset produced by the human Epigenome Roadmap Project. Specifically, we examined genome-wide profiles of 24 histone modifications in human embryonic stem cells and fibroblasts, and developed a random forest-based algorithm to integrate histone modification profiles for identification of enhancers. As a training set, we used histone modification profiles at genome-wide binding sites of p300 in the two cell types, identified using ChIP-seq. We show that this algorithm not only leads to more accurate and precise prediction of enhancers than previous methods, but also helps identify an optimal set of three chromatin marks for enhancer prediction. ChIP-seq analysis of p300 in hESC H1 and IMR90 cells. Sequencing was done on the Illumina Genome Analyzer II platform for the H1 data and on the Illumina HiSeq for IMR90. Data were mapped to hg18 using Bowtie.
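The enhancer description above combines two ideas: training a random forest on per-mark signal at positive (p300-bound) versus background regions, and ranking marks to find a small informative subset. The following is a minimal, standard-library-only sketch of that general idea, not the authors' algorithm. Everything in it is an assumption made for illustration: the data are synthetic, the six "marks" are anonymous columns of which the first three are informative by construction, each tree is a depth-1 stump, and importance is the summed Gini impurity decrease.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    """Majority 0/1 label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))

def fit_stump(X, y, features):
    """Best single-threshold split among the candidate features."""
    parent = gini(y)
    best_score, best = float("inf"), None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_score = score
                # (impurity decrease, feature, threshold, left label, right label)
                best = (parent - score, f, t, majority(left), majority(right))
    return best

def fit_forest(X, y, n_trees=60, max_features=2, seed=0):
    """Bootstrap the rows and subsample the features for every stump."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(d), max_features)
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        if stump is not None:
            stumps.append(stump)
    return stumps

def predict(stumps, row):
    """Majority vote over all stumps."""
    votes = sum(left if row[f] <= t else right for _, f, t, left, right in stumps)
    return int(2 * votes >= len(stumps))

def importances(stumps, n_features):
    """Summed impurity decrease per feature across the forest."""
    total = [0.0] * n_features
    for decrease, f, *_ in stumps:
        total[f] += decrease
    return total

# Synthetic data: label 1 stands in for a p300-bound region, 0 for background.
# Columns 0-2 carry signal by construction; columns 3-5 are pure noise.
N_MARKS = 6
data_rng = random.Random(1)
X, y = [], []
for i in range(200):
    label = i % 2
    X.append([data_rng.gauss(3.0 if (label and j < 3) else 0.0, 1.0)
              for j in range(N_MARKS)])
    y.append(label)

forest = fit_forest(X[:100], y[:100])
imp = importances(forest, N_MARKS)
top3 = sorted(range(N_MARKS), key=lambda j: imp[j], reverse=True)[:3]
accuracy = sum(predict(forest, r) == t for r, t in zip(X[100:], y[100:])) / 100
print("top-ranked columns:", sorted(top3), "held-out accuracy:", accuracy)
```

In practice one would more likely use an established implementation with full-depth trees; the stump forest here only keeps the example short and dependency-free.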
Project description: tRNA fragments (tRFs) are a novel class of small RNAs comparable in size and function to miRNAs. We and others have shown that tRFs are generally Dicer independent, can be found in abundance in the miRNA effector protein Ago, and can repress expression of specific genes that have complementarity to their 5′ seed sequences. Given that this greatly expands the repertoire of small RNAs capable of post-transcriptional regulation of gene expression, it is important to predict tRF targets with confidence. Some attempts have been made to predict tRF targets, but these are limited in the scope of tRF classes used in prediction or in feature selection. We hypothesized that applying established miRNA target prediction features to tRFs through a random forest machine learning algorithm would greatly improve tRF target prediction. Using this approach, we show significant improvements in tRF target prediction for all classes of tRFs and validate our predictions in two independent cell lines. Finally, using Gene Ontology analysis, we provide evidence that tRF-3009a targets may be involved in neural development. These improvements to tRF target prediction further our understanding of tRF function broadly across species and tRF types, and provide avenues for testing novel roles for tRFs in biology.
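The seed complementarity mentioned above can be illustrated with a small sketch. It assumes the common miRNA convention of a seed at nucleotides 2-8 and perfect Watson-Crick pairing only (no G:U wobble); both choices, as well as the toy sequences, are assumptions for illustration and are not taken from the described model, which combines many features in a random forest.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string over the alphabet A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def seed_sites(small_rna, target, start=1, length=7):
    """0-based positions in `target` that pair perfectly with the seed.

    The defaults take nucleotides 2-8 of the small RNA (an assumed
    miRNA-style seed definition).
    """
    seed = small_rna[start:start + length]
    site = reverse_complement(seed)
    return [i for i in range(len(target) - length + 1)
            if target[i:i + length] == site]

# Hypothetical sequences, not real tRFs or transcripts.
toy_trf = "AUCGGAUCCGUAACGGAUUC"
toy_site = reverse_complement(toy_trf[1:8])       # "GAUCCGA"
toy_utr = "AAA" + toy_site + "UUUGG" + toy_site + "CC"

print(seed_sites(toy_trf, toy_utr))               # [3, 15]
```

A seed-match count of this kind would be only one candidate input feature; site context, conservation, or pairing energy would need their own calculations.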