ABSTRACT: The most widely used method for detecting and measuring genome-wide protein-DNA interactions is chromatin immunoprecipitation on tiling microarrays, commonly known as ChIP-chip. Many tiling array platforms, amplification methods, and analysis algorithms exist for ChIP-chip, but a rigorous assessment of their relative performance has not been reported. In a multi-lab simulation of a ChIP-chip experiment, we conducted the first objective analysis of tiling array platforms and analysis algorithms. We designed a complex mixture of human genomic DNA with a "spike-in" composed of nearly 100 human sequences at various concentrations. Eight independent groups hybridized these mixtures to four different tiling array platforms. The groups were blind to the composition of the spike-in mix, the range of concentrations covered, and the number of sequences it contained. Still blind to the key, each group predicted the spike-in locations from its own measurements. The results reveal that all commercial tiling array platforms perform well, although each platform and analysis algorithm has distinct performance characteristics. Simple sequence repeats and genome redundancy tend to result in false positives on oligonucleotide platforms. We also compare genome-wide platforms with regard to performance and cost. The spike-in DNA samples and the resulting array data presented in our study provide a stable benchmark against which future ChIP platforms, protocol improvements, and analysis methods can be evaluated.

Keywords: Spike in Control

For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

With the availability of sequenced genomes and whole-genome tiling microarrays, many researchers have conducted experiments using ChIP-chip and related methods to study genome-wide protein-DNA interactions [1-9].
These are powerful yet challenging techniques comprising many steps, each of which can introduce variability in the final results. One potentially important factor is the relative performance of different types of tiling arrays. Currently the most popular platforms for performing ChIP-chip experiments are PCR product-based tiling arrays spotted in academic laboratories and commercial oligonucleotide-based tiling arrays from Affymetrix, NimbleGen, and Agilent. A second factor known to introduce variation is the DNA amplification protocol, which is often required because the low DNA yield from a ChIP experiment prevents direct detection on microarrays. A third factor is the algorithm used for detecting regions of enrichment from the tiling array data. A number of algorithms have been developed, but until this report there was no benchmark dataset with which to evaluate them systematically. In this study, we used a spike-in experiment to systematically evaluate the effects of tiling microarrays, amplification protocols, and data analysis algorithms on ChIP-chip results. Other potentially important factors are, from a practical standpoint, more difficult to control and evaluate systematically, and are not assessed here. These include the experimenter, the amount of starting material used, the size of DNA fragments after shearing, the DNA labeling method, and the hybridization conditions.

Several studies have evaluated the performance of gene expression microarrays and analysis algorithms [10-13]. However, because tiling arrays must cover large genomic regions, they have different probe properties and high probe densities, and therefore present different biochemical and informatics challenges. Thus the results from the expression array spike-in experiments are not necessarily directly relevant to tiling array experiments. One recent study compared the performance of array-based (ChIP-chip) and sequence-based (ChIP-PET) technologies on a real ChIP experiment [14].
However, because this was an exploratory experiment, the list of absolute “true positive” targets was, and remains, unknown. Since this experiment was performed without a key, the sensitivity and specificity of each technology had to be estimated retrospectively by qPCR validation of targets predicted from each platform.

In our experiment, eight independent research groups at locations worldwide each hybridized two different mixtures of DNA to one of four tiling array platforms, and predicted the genomic locations and concentrations of the spike-in sequences using a total of 13 different algorithms. Throughout the process, the research groups were entirely blind to the contents of the spike-in mixtures. Using the spike-in key, we analyzed several performance parameters for each platform, algorithm, and amplification method. While all commercial platforms performed well, we found that each had unique performance characteristics. We examined the implications of these results for planning human genome-wide experiments, in which trade-offs between probe density and cost are important.
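
To make concrete what it means to evaluate blind predictions against a spike-in key, the following is a minimal illustrative sketch in Python. It is not the scoring procedure used in this study. The region coordinates, the overlap rule (any shared base counts as a hit), and the names `overlaps` and `score_predictions` are all assumptions made for illustration. Because specificity requires a defined set of true negatives, the sketch reports the false discovery rate among predictions instead.

```python
from typing import Dict, List, Tuple

# A region is (chromosome, start, end) with a half-open interval [start, end).
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """Return True if two regions share at least one base (assumed hit rule)."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def score_predictions(key: List[Region], predictions: List[Region]) -> Dict[str, float]:
    """Score predicted regions against a known spike-in key."""
    # Key regions recovered by at least one prediction.
    found = [k for k in key if any(overlaps(k, p) for p in predictions)]
    # Predictions that fall on at least one key region.
    true_hits = [p for p in predictions if any(overlaps(p, k) for k in key)]
    false_positives = len(predictions) - len(true_hits)
    sensitivity = len(found) / len(key) if key else 0.0
    fdr = false_positives / len(predictions) if predictions else 0.0
    return {
        "sensitivity": sensitivity,
        "false_discovery_rate": fdr,
        "false_positives": float(false_positives),
    }


# Hypothetical key and predictions; the coordinates are made up.
key = [("chr1", 100, 600), ("chr1", 5000, 5500), ("chr2", 200, 700), ("chr3", 1000, 1500)]
predictions = [("chr1", 150, 400), ("chr2", 650, 900), ("chr2", 5000, 5200), ("chr4", 10, 50)]

result = score_predictions(key, predictions)
print(result)
```

In this toy example, two of the four key regions are recovered and two of the four predictions miss the key entirely. A real evaluation would also need to decide how to treat predictions that fall in repeats or redundant sequence, which the abstract identifies as a source of false positives on oligonucleotide platforms.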