Dataset Information

Important biological information uncovered in previously unaligned reads from chromatin immunoprecipitation experiments (ChIP-Seq).


ABSTRACT: Establishing the architecture of gene regulatory networks (GRNs) relies on chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) methods that provide genome-wide transcription factor binding sites (TFBSs). ChIP-Seq furnishes millions of short reads that, after alignment, describe the genome-wide binding sites of a particular TF. However, in all organisms investigated an average of 40% of reads fail to align to the corresponding genome, with some datasets having as much as 80% of reads failing to align. We describe here the provenance of previously unaligned reads in ChIP-Seq experiments from animals and plants. We show that a substantial portion corresponds to sequences of bacterial and metazoan origin, irrespective of the ChIP-Seq chromatin source. Unforeseen was the finding that 30%-40% of unaligned reads were actually alignable. To validate these observations, we investigated the characteristics of the previously unaligned reads corresponding to TAL1, a human TF involved in lineage specification of hemopoietic cells. We show that, while unmapped ChIP-Seq read datasets contain foreign DNA sequences, additional TFBSs can be identified from the previously unaligned ChIP-Seq reads. Our results indicate that the re-evaluation of previously unaligned reads from ChIP-Seq experiments will significantly contribute to TF target identification and determination of emerging properties of GRNs.
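To give a sense of scale for the figures quoted in the abstract, the following is a minimal arithmetic sketch, not the authors' analysis. It assumes, purely for illustration, that the average 40% unaligned rate and the 30%-40% realignable share can be multiplied for a single hypothetical dataset. The function name `recoverable_share` is invented here.

```python
def recoverable_share(unaligned_fraction: float, realignable_fraction: float) -> float:
    """Fraction of ALL reads that were first unaligned but could later be aligned.

    Both arguments are fractions in [0, 1]. This is simple multiplication,
    an illustrative assumption rather than a method taken from the paper.
    """
    for name, value in (
        ("unaligned_fraction", unaligned_fraction),
        ("realignable_fraction", realignable_fraction),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return unaligned_fraction * realignable_fraction


# Figures from the abstract: 40% of reads unaligned on average,
# of which 30%-40% were found to be alignable.
low = recoverable_share(0.40, 0.30)
high = recoverable_share(0.40, 0.40)
print(f"Recoverable share of all reads: {low:.0%} to {high:.0%}")
```

Under these assumptions, roughly an eighth to a sixth of all reads in such a dataset would be recoverable. Actual values vary by dataset, since the abstract reports unaligned rates as high as 80%.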

SUBMITTER: Ouma WZ 

PROVIDER: S-EPMC4345404 | BioStudies | 2015-01-01

REPOSITORIES: biostudies

Similar Datasets

2019-01-01 | S-EPMC7459848 | BioStudies
1000-01-01 | S-EPMC2638147 | BioStudies
1000-01-01 | S-EPMC3747862 | BioStudies
1000-01-01 | S-EPMC3764009 | BioStudies
1000-01-01 | S-EPMC3287483 | BioStudies
2014-01-01 | S-EPMC4082612 | BioStudies
1000-01-01 | S-EPMC4234207 | BioStudies
2020-01-01 | S-EPMC7144904 | BioStudies
2018-01-01 | S-EPMC5880248 | BioStudies
1000-01-01 | S-EPMC2822526 | BioStudies