Dataset Information

Homo sapiens

ABSTRACT: A Random-Forest Based Algorithm for Prediction of Enhancers From Histone Modifications

PROVIDER: PRJNA165163 | ENA

REPOSITORIES: ENA

Similar Datasets

A Random-Forest Based Algorithm for Prediction of Enhancers From Histone Modifications

Project description:Transcriptional enhancers play critical roles in the regulation of gene expression, but their identification has remained a challenge. Recently, it was shown that enhancers in the mammalian genome are associated with characteristic histone modification patterns, which have been increasingly exploited for enhancer identification. However, only a limited number of histone modifications have previously been investigated for this purpose, leaving unanswered the question of whether there exists an optimal set of histone modifications that could improve enhancer prediction. Here, we address this issue by exploring a rich dataset produced by the human Epigenome Roadmap Project. Specifically, we examined genome-wide profiles of 24 histone modifications in human embryonic stem cells and fibroblasts, and developed a Random-Forest based algorithm to integrate histone modification profiles for identification of enhancers. As a training set, we used histone modification profiles at genome-wide binding sites of p300 in the two cell types, identified using ChIP-seq. We show that this algorithm not only leads to more accurate and precise prediction of enhancers than previous methods, but also helps identify an optimal set of three chromatin marks for enhancer prediction.

2012-05-10 | GSE37858 | GEO

A Random-Forest Based Algorithm for Prediction of Enhancers From Histone Modifications

2012-05-09 | E-GEOD-37858 | biostudies-arrayexpress

tRForest: a novel random forest-based algorithm for tRNA-derived fragment target prediction

Project description:tRForest: a novel random forest-based algorithm for tRNA-derived fragment target prediction

| PRJNA783302 | ENA

Robust methylation based classification of brain tumors using nanopore sequencing

Project description:Using a public reference data set of 82 unique entities, 382 nanopore-sequenced brain tumor samples were classified based on their methylation status through an ad hoc random forest algorithm. As a measure of confidence, score recalibration was performed and platform-specific thresholds were defined.

2022-07-30 | GSE209865 | GEO

Liu2023 - Predicting the efficacy of immune checkpoint inhibitors monotherapy in advanced non-small cell lung cancer: a machine learning method based on multidimensional data

Project description:Immunotherapy has improved the prognosis of patients with advanced non-small cell lung cancer (NSCLC), but only a small subset of patients achieve clinical benefit. The purpose of our study was to integrate multidimensional data using a machine learning method to predict the therapeutic efficacy of immune checkpoint inhibitor (ICI) monotherapy in patients with advanced NSCLC. The authors retrospectively enrolled 112 patients with stage IIIB-IV NSCLC receiving ICI monotherapy. The random forest (RF) algorithm was used to establish efficacy prediction models based on five different input datasets: precontrast computed tomography (CT) radiomic data, postcontrast CT radiomic data, a combination of the two CT radiomic datasets, clinical data, and a combination of radiomic and clinical data. Five-fold cross-validation was used to train and test the random forest classifier. The performance of the models was assessed according to the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Among the models (RF, MLP, LR, XGBoost), our reproduced ONNX models have better performance, especially for random forest. The response variable takes the value 1 for efficacy and 0 for inefficacy of PD-1/PD-L1 monotherapy in patients with advanced NSCLC.

2023-07-11 | BIOMD0000001074 | BioModels
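
The evaluation scheme named in the description above (5-fold cross-validation scored by ROC AUC) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `k_fold_indices` and `roc_auc` are hypothetical, the shuffling seed is arbitrary, and the classifier itself is left out so that only fold splitting and the AUC calculation are shown.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split range(n_samples) into k shuffled test folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Taking every k-th index gives folds whose sizes differ by at most one.
    return [idx[i::k] for i in range(k)]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# 112 samples, matching the cohort size quoted above; each fold serves once
# as the test set while the remaining four folds would be used for training.
folds = k_fold_indices(112, k=5)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a cohort of this size; a rank-based formulation would be preferable for larger data.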

Deshpande2019 - Random Forest model to predict long non-coding RNAs from coding RNAs in Zea Mays plant transcriptomic data

Project description:This is a Random Forest algorithm-based machine learning model to predict lncRNAs from coding mRNAs in plant transcriptomic data. The model assigns 1 for coding sequences and 2 for long non-coding sequences. The prediction is performed using a combination of Open Reading Frame (ORF) based, sequence-based and codon-bias features. To make predictions on Zea mays sequence data, users need to download the curated ONNX model and convert the sequences into a feature matrix as described in the PLIT paper (Deshpande et al. 2019).

2023-05-22 | BIOMD0000001067 | BioModels

Project description:A Random Forest Algorithm Based on a cgMLST Scheme to Predict hvKP

| PRJEB34922 | ENA

Transcription profiling of acute lymphoblastic leukaemia patient samples that represent six different subgroups defined by cytogenetic features and immunophenotype

Project description:We examined published microarray data from 104 acute lymphoblastic leukaemia patient specimens that represent six different subgroups defined by cytogenetic features and immunophenotypes. Using the decision-tree based supervised learning algorithm Random Forest (RF), we determined a small set of genes for optimal subgroup distinction and subsequently validated their predictive power in an independent cohort of 68 specimens that were assessed using Affymetrix HG-U133A arrays.

2007-05-11 | E-TABM-125 | biostudies-arrayexpress

Patterson2022 - Tumour mutation data driven Random Forest model to predict immune checkpoint inhibitor therapy benefit in metastatic melanoma

Project description:A Random Forest model is developed to incorporate tumor mutation data within the context of the biological process known as leukocyte proliferation regulation. This model aims to predict a patient's response to anti-PD1 treatment. The authors conducted experiments using four different types of classifiers: Random Forest, Gradient Boosting, Feed Forward Neural Network, and Long Short-Term Memory (LSTM) recurrent neural network. Among these classifiers, the Random Forest algorithm yielded the best predictive performance when modeling gene mutation data associated with the 'leukocyte proliferation regulation' biological process. Hence, this curated version of the model focuses on the Random Forest model trained specifically on the 'Leukocyte Proliferation Regulation' process. In this model, a value of '0' is assigned to NonResponders, while a value of '1' is assigned to Responders. Please note that to obtain predictions, users should provide mutation data containing only the genes corresponding to the 'GO_REGULATION_OF_LEUKOCYTE_PROLIFERATION' process keyword, as specified in the 'GO_test_genes_dict_intersection' dictionary.

2023-07-03 | BIOMD0000001073 | BioModels

Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy

Project description:Top-down mass spectrometry (MS) is a powerful tool for identification and comprehensive characterization of proteoforms arising from alternative splicing, sequence variation, and post-translational modifications. While the technique is powerful, it suffers from the complexity of the datasets generated in top-down MS experiments, which require sequential data processing steps for interpretation. Deconvolution of the complex isotopic distribution that arises from naturally occurring isotopes is a critical step in data processing. Multiple algorithms are currently available to deconvolute top-down mass spectra; however, each algorithm generates different deconvoluted peak lists with varied accuracy compared to true positive annotations. In this study, we have designed a machine learning strategy that can process and combine the peak lists from different deconvolution results. By optimizing clustering results, deconvolution results from the THRASH, TopFD, MS-Deconv, and SNAP algorithms were combined into consensus peak lists at various thresholds using either a simple voting ensemble method or a random forest machine learning algorithm. The random forest model outperformed the single best algorithm. This machine learning strategy could enhance the accuracy and confidence of protein identification during database search by accelerating detection of true positive peaks while filtering out false positive peaks. Thus, this method shows promise in enhancing proteoform identification and characterization for high-throughput data analysis in top-down proteomics.

2020-05-06 | PXD018043 | Pride
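
The "simple voting ensemble" mentioned in the description above can be illustrated with a short Python sketch. It rests on several assumptions that are not in the dataset record: peaks are represented as bare monoisotopic masses, peaks are clustered greedily with an absolute mass tolerance, and the function name `consensus_peaks` and the parameters `tol` and `min_votes` are hypothetical. The study's own clustering optimization and its random forest variant are not reproduced here.

```python
def consensus_peaks(peak_lists, tol=0.01, min_votes=2):
    """Merge peak lists from several algorithms by simple voting.

    peak_lists maps an algorithm name to a list of masses. Masses closer
    than `tol` to the previous mass in sorted order join the same cluster;
    a cluster is kept when at least `min_votes` distinct algorithms
    contributed to it, and is reported as the mean mass of its members.
    """
    tagged = sorted(
        (mass, name) for name, masses in peak_lists.items() for mass in masses
    )
    clusters = []
    for mass, name in tagged:
        if clusters and mass - clusters[-1][-1][0] <= tol:
            clusters[-1].append((mass, name))
        else:
            clusters.append([(mass, name)])

    consensus = []
    for cluster in clusters:
        voters = {name for _, name in cluster}
        if len(voters) >= min_votes:
            consensus.append(sum(m for m, _ in cluster) / len(cluster))
    return consensus


# Made-up masses for illustration only; the algorithm names come from the text.
example = {
    "THRASH": [1000.000, 1500.000],
    "TopFD": [1000.005, 2000.000],
    "MS-Deconv": [1000.002, 1500.004],
}
kept = consensus_peaks(example, tol=0.01, min_votes=2)
```

Raising `min_votes` trades sensitivity for specificity, which corresponds to the "various thresholds" at which consensus lists were built.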
