
Dataset Information

Effective normalization for copy number variation in Hi-C data.


ABSTRACT: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C pose particular challenges. Although several methods have been proposed, the most widely used type of Hi-C normalization casts the estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact equally with each other. To explore the effect of copy-number variations on Hi-C data normalization, we first propose a simulation model that predicts the effects of large copy-number changes on a diploid Hi-C contact map. We then show that the standard approaches relying on equal visibility fail to correct for unwanted effects in the presence of copy-number variations. We thus propose a simple extension to matrix balancing methods that models these effects. Our approach can either retain the copy-number variation effects (LOIC) or remove them (CAIC). We show that this leads to better downstream analysis of the three-dimensional organization of rearranged genomes. Taken together, our results highlight the importance of using dedicated methods for the analysis of Hi-C cancer data. Both CAIC and LOIC methods perform well on simulated and real Hi-C data sets, each fulfilling different needs.
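The matrix balancing idea mentioned in the abstract can be illustrated with a minimal sketch. This is a generic square-root-damped iterative scaling written for illustration, not the authors' implementation and not LOIC or CAIC; the function name `balance_matrix`, its parameters, and the toy matrix are all assumptions. It rescales a symmetric contact matrix until every bin has the same total, which is the "equal visibility" assumption.

```python
import numpy as np


def balance_matrix(counts, n_iter=500, tol=1e-10):
    """Rescale a symmetric contact matrix so that all row sums become equal.

    Hypothetical helper for illustration only: a generic iterative scaling,
    not the LOIC or CAIC procedure described in the abstract.
    Returns the balanced matrix and the accumulated per-bin bias vector.
    """
    mat = np.asarray(counts, dtype=float).copy()
    bias = np.ones(mat.shape[0])
    for _ in range(n_iter):
        row_sums = mat.sum(axis=1)
        nonzero = row_sums > 0
        if not nonzero.any():
            break  # nothing to balance in an all-zero matrix
        # Square-root damping keeps the update stable on symmetric input.
        scale = np.sqrt(row_sums / row_sums[nonzero].mean())
        scale[~nonzero] = 1.0  # leave empty bins untouched
        mat /= np.outer(scale, scale)
        bias *= scale
        if np.abs(scale - 1.0).max() < tol:
            break
    return mat, bias


# Toy example (made-up numbers): a distance-decay contact pattern
# distorted by a multiplicative per-bin bias.
idx = np.arange(4)
base = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
true_bias = np.array([1.0, 2.0, 0.5, 1.5])
counts = base * np.outer(true_bias, true_bias)

balanced, estimated_bias = balance_matrix(counts)
```

After balancing, every row of `balanced` sums to the same value. Because the procedure forces equal visibility on all bins, a bin whose total is higher for biological reasons, such as a copy-number gain, is scaled down just like a technical bias, which is the limitation the abstract describes.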

SUBMITTER: Servant N 

PROVIDER: S-EPMC6127909 | biostudies-other | 2018 Sep

REPOSITORIES: biostudies-other

Publications

Effective normalization for copy number variation in Hi-C data.

Nicolas Servant, Nelle Varoquaux, Edith Heard, Emmanuel Barillot, Jean-Philippe Vert

BMC Bioinformatics, 2018 Sep 6, issue 1

Similar Datasets

| S-EPMC3481445 | biostudies-other
| S-EPMC4381046 | biostudies-literature
| S-EPMC6078171 | biostudies-literature
2016-05-24 | GSE47357 | GEO
| S-EPMC8210825 | biostudies-literature
| S-EPMC7087379 | biostudies-literature
| S-EPMC3409265 | biostudies-literature
| S-EPMC3018818 | biostudies-literature
| S-EPMC5909048 | biostudies-other