Dataset Information


A new computational method to predict transcriptional activity of a DNA sequence from diverse datasets of massively parallel reporter assays.

ABSTRACT: In recent years, the dramatic increase in the number of applications for massively parallel reporter assay (MPRA) technology has produced a large body of data for various purposes. However, a computational model that can be applied to decipher regulatory codes across diverse MPRAs does not yet exist. Here, we propose a new computational method to predict the transcriptional activity of sequences in MPRAs, as well as in luciferase reporter assays, based on the TRANScription FACtor (TRANSFAC) database. We employed regression trees and multivariate adaptive regression splines to obtain these predictions, and considered a feature redundancy-dependent formula for conventional regression trees to enable adaptation to diverse data. The developed method was applicable to various MPRAs despite differences in transfected cell types, sequence lengths, construct numbers and sequence types. We demonstrate that this method can predict the transcriptional activity of promoters in HEK293 cells through predictive functions that were estimated from independent assays in eight tumor cell lines. The prediction was generally good (Pearson's r = 0.68), which suggested that active transcription factor binding sites shared across cell types make greater contributions to transcriptional activity, and that known promoter activity could inform the transcriptional activity of unknown promoters in some instances, regardless of cell type.
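
The general idea in the abstract (a regression tree fitted to transcription factor binding site features, then scored with Pearson's r on held-out sequences) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a plain variance-reduction regression tree without the feature redundancy-dependent formula or multivariate adaptive regression splines, and the motif counts, the simulated activity, and all function names are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(X, y, depth=0, max_depth=3, min_leaf=5):
    """Fit a plain regression tree; a leaf is a float, a node is a tuple."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None
    for j in range(len(X[0])):                      # candidate feature
        for t in sorted(set(row[j] for row in X)):  # candidate threshold
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)          # lower is better
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return mean(y)
    _, j, t = best
    lo = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    hi = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return (j, t,
            fit_tree([r for r, _ in lo], [v for _, v in lo],
                     depth + 1, max_depth, min_leaf),
            fit_tree([r for r, _ in hi], [v for _, v in hi],
                     depth + 1, max_depth, min_leaf))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Hypothetical data: each row holds counts of three made-up motifs in one
# sequence; activity is simulated, NOT taken from the dataset.
random.seed(0)
X = [[random.randint(0, 3) for _ in range(3)] for _ in range(300)]
y = [1.5 * r[0] - 1.0 * r[1] + random.gauss(0, 0.3) for r in X]

# Fit on the first 200 sequences, evaluate on the remaining 100.
tree = fit_tree(X[:200], y[:200])
preds = [predict(tree, row) for row in X[200:]]
r = pearson_r(preds, y[200:])
print(f"held-out Pearson's r = {r:.2f}")
```

In a real analysis the feature matrix would come from scanning sequences for binding site matches, and the held-out set would be an independent assay (as with the HEK293 promoters in the abstract) rather than a split of simulated data.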

PROVIDER: S-EPMC5737609 | BioStudies

REPOSITORIES: biostudies

Similar Datasets

| S-EPMC8092006 | BioStudies
| S-EPMC5125825 | BioStudies
| S-EPMC6576758 | BioStudies
| S-EPMC6717970 | BioStudies
| S-EPMC5389540 | BioStudies
| S-EPMC7727316 | BioStudies
| S-EPMC7550205 | BioStudies
2019-01-01 | S-EPMC6771677 | BioStudies
| S-EPMC6049023 | BioStudies
| S-EPMC6417258 | BioStudies