Dataset Information


HAT: hypergeometric analysis of tiling-arrays with application to promoter-GeneChip data.

ABSTRACT: BACKGROUND: Tiling-arrays are applicable to multiple types of biological research questions. Because the technology is highly sensitive, high-resolution, and unbiased, it is often employed in genome-wide investigations. A major challenge in the analysis of tiling-array data is to define regions-of-interest, i.e., contiguous probes with increased signal intensity (as a result of hybridization of labeled DNA) in a region. Currently, no standard criteria are available to define these regions-of-interest: there is no single probe-intensity cut-off level, and different regions-of-interest can contain different numbers of probes and vary in genomic width. Furthermore, the chromosomal distance between neighboring probes can vary across the genome and among different arrays.

RESULTS: We have developed Hypergeometric Analysis of Tiling-arrays (HAT), and first evaluated its performance on tiling-array datasets from a Chromatin Immunoprecipitation on chip (ChIP-on-chip) study for the identification of genome-wide DNA binding profiles of the transcription factor Cebpa (used for method comparison). Using this assay, we show that regions detected by HAT are more highly enriched for expected motifs than regions detected by an alternative method (MAT), which indicates that HAT refines the detection of regions-of-interest. Subsequently, data from a retroviral insertional mutagenesis screen were used to examine the performance of HAT in a different application of tiling-arrays. In both studies, detected regions-of-interest have been validated with (q)PCR.

CONCLUSIONS: We demonstrate that HAT has increased specificity for the analysis of tiling-array data in comparison with the alternative method, and that it accurately detects regions-of-interest in two different applications of tiling-arrays.
HAT has several advantages over previous methods: i) as there is no single cut-off level for probe intensity, HAT can detect regions-of-interest at various thresholds; ii) it can detect regions-of-interest of any size; iii) it is independent of probe resolution across the genome and across tiling-array platforms; and iv) it employs a single user-defined parameter: the significance level. Regions-of-interest are detected by computing the hypergeometric probability while controlling the family-wise error. Furthermore, the method does not require experimental replicates, common regions-of-interest are indicated, a sequence-of-interest can be examined for every detected region-of-interest, and flanking genes can be reported.
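
The idea of scoring a region with a hypergeometric probability can be illustrated with a minimal sketch. This is not the HAT implementation: the abstract does not specify how thresholds, region sizes, or the family-wise error correction are handled, so the fixed window size, the single intensity threshold, the Bonferroni correction, and the function names `hypergeom_sf` and `significant_windows` below are all assumptions made for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of probes, K: probes above the threshold,
    n: probes in the window, k: above-threshold probes in the window.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def significant_windows(intensities, threshold, window, alpha=0.05):
    """Return (start_index, p_value) for windows enriched in high probes.

    Assumption: a Bonferroni correction over all windows stands in for
    family-wise error control; HAT's actual procedure may differ.
    """
    above = [x > threshold for x in intensities]
    N, K = len(above), sum(above)
    n_tests = N - window + 1
    hits = []
    for start in range(n_tests):
        k = sum(above[start:start + window])
        p = hypergeom_sf(k, N, K, window)
        if p <= alpha / n_tests:  # Bonferroni-adjusted significance level
            hits.append((start, p))
    return hits


# Toy data: 40 probes at baseline, with 6 consecutive high-intensity probes.
toy = [0.0] * 10 + [1.0] * 6 + [0.0] * 24
hits = significant_windows(toy, threshold=0.5, window=5)
```

In this toy example only windows that overlap the run of high-intensity probes by at least four probes pass the corrected significance level; the significance level `alpha` is the only quantity a user would tune, mirroring advantage iv) above.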

SUBMITTER: Taskesen E 

PROVIDER: S-EPMC2892465 | BioStudies | 2010-01-01T00:00:00Z

REPOSITORIES: biostudies

Similar Datasets

2009-01-01 | S-EPMC2753849 | BioStudies
1000-01-01 | S-EPMC2759964 | BioStudies
2008-01-01 | S-EPMC2862456 | BioStudies
1000-01-01 | S-EPMC2530869 | BioStudies
2013-01-01 | S-EPMC3810201 | BioStudies
2008-01-01 | S-EPMC2386063 | BioStudies
2013-01-01 | S-EPMC3733988 | BioStudies
2014-02-05 | E-GEOD-46541 | BioStudies
2007-01-01 | S-EPMC1888820 | BioStudies
2010-01-01 | S-EPMC3582178 | BioStudies