Genomics

Dataset Information


DNA pooling using Affymetrix HindIII and Illumina HumanHap arrays


ABSTRACT: The experiment was based on 3 arrays of each type (3 Illumina HumanHap300 and 3 Affymetrix Genechip HindIII arrays) being hybridized to a single pool, which contained equal amounts of DNA from each of 384 individuals. The goal is to estimate a pooling allele frequency: the average frequency of allele 1, say, in the set of 384 individuals. After processing, the raw data are summarized to give pooling allele frequency estimates for each array.

The abstract from the paper comparing the two arrays (one Affymetrix, one Illumina) is as follows: Genome wide association (GWA) studies to map genes for complex traits are powerful yet costly. DNA pooling strategies have the potential to dramatically reduce the cost of GWA studies. Pooling using Affymetrix arrays has been proposed and used, but the efficiency of these arrays has not been quantified. We compared and contrasted Affymetrix Genechip HindIII and Illumina HumanHap300 arrays on the same DNA pools and show that the HumanHap300 arrays are substantially more efficient. In terms of effective sample size, HumanHap300 based pooling extracts >80% of the information available with individual genotyping (IG). In contrast, Genechip HindIII based pooling only extracts ~30% of the available information. With HumanHap300 arrays, concordance with IG data is excellent. Guidance is given on best study design, and it is shown that even after taking into account pooling error, one stage scans can be performed for >100 fold reduced cost compared with IG. With appropriately designed two stage studies, IG can provide confirmation of pooling results whilst still providing ~20 fold reduction in total cost compared with IG based alternatives. The large cost savings with Illumina HumanHap300 based pooling imply that future studies need only be limited by the availability of samples and not cost.

Keywords: DNA pooling experiment

Overall design: A pool was typed using 3 Affymetrix HindIII arrays. Estimates of pooling allele frequency were obtained.
Pool of 384 individuals. The data in the matrix are for 3 replicate arrays on this pool. Information on each array individually is in the uploaded files with raw match and mismatch scores. Further details on the raw probe score data included with this submission are below.

For Affymetrix arrays, there is information on Perfect Match (PM) and Mis-Match (MM) intensities for each allele. Essentially, the estimate of pooling allele frequency (PAF) comes from A/(A+B), where A is PM-MM for allele A and B is the same quantity for allele B. The header line of the affy file reads:

"0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - PA(Sense)","0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - PB(Sense)","0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - MA(Sense)","0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - MB(Sense)"

so the order of the information is PM-(allele A), PM-(allele B), MM-(allele A), MM-(allele B). The same information is also available on the anti-sense strand of the array. Further, the sense and antisense strands have data at up to 7 locations on the array. That is, the 4 columns of PM/MM data are repeated 2*7 = 14 times, so there are 56 columns of data plus the column with the SNP name.

In practice only 10 of the possible 14 sets of 4 columns give valid data, so there are 10 sets of 4 columns that yield estimates of A/(A+B) (i.e. pooling allele frequency estimates, or PAFs). These 10 PAFs are accumulated over multiple arrays and finally used to get an overall estimate (using a statistical model described in Macgregor et al, Nucleic Acids Research, 34(7):e55, 2006) of the frequency of a particular allele in the set of pooled individuals. In our case, 3 Affymetrix arrays are used for each pool, so there are up to 30 PAF values used in the final calculation. In practice some of the individual PAFs are not included because they fail on the array.
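As a rough illustration of the column layout described above, the following Python sketch computes one PAF per set of 4 columns and then combines the valid values across arrays. It is only a sketch under stated assumptions: the function names are hypothetical, the rule for treating a set as failed (missing value, negative A or B, or non-positive A+B) is an assumption rather than the submitters' rule, and the plain average is a stand-in for the statistical model of Macgregor et al (2006), which is not reproduced here.

```python
from statistics import mean
from typing import List, Optional, Sequence

# Column order within each set of 4, as given in the file header:
# PM-(allele A), PM-(allele B), MM-(allele A), MM-(allele B)
QUARTET_WIDTH = 4


def quartet_paf(pm_a: float, pm_b: float, mm_a: float, mm_b: float) -> Optional[float]:
    """Return A/(A+B) with A = PM-MM for allele A and B = PM-MM for allele B.

    Assumption: a set is treated as failed (None) when A or B is negative
    or when A+B is not positive. The source does not state the failure rule.
    """
    a = pm_a - mm_a
    b = pm_b - mm_b
    if a < 0 or b < 0 or a + b <= 0:
        return None
    return a / (a + b)


def row_pafs(intensities: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Split one SNP's intensity columns (SNP name removed) into sets of 4
    and return one PAF, or None, per set."""
    if len(intensities) % QUARTET_WIDTH != 0:
        raise ValueError("expected a multiple of 4 intensity columns")
    pafs: List[Optional[float]] = []
    for start in range(0, len(intensities), QUARTET_WIDTH):
        quartet = intensities[start:start + QUARTET_WIDTH]
        if any(value is None for value in quartet):
            pafs.append(None)  # missing data: this set yields no PAF
        else:
            pafs.append(quartet_paf(*quartet))
    return pafs


def naive_pooled_estimate(pafs_per_array: Sequence[Sequence[Optional[float]]]) -> Optional[float]:
    """Average all valid PAFs over replicate arrays.

    Stand-in only: the dataset's final estimate uses a statistical model,
    not a plain mean.
    """
    valid = [p for pafs in pafs_per_array for p in pafs if p is not None]
    return mean(valid) if valid else None
```

With 56 intensity columns per SNP, `row_pafs` returns 14 entries, of which some will be `None` for sets that do not give valid data.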
Since the array data are fairly noisy, we use the ~30 PAF values to bring down the array error. Essentially, it is the up to 30 fold redundancy of the array that enables the pooling to work satisfactorily.

The raw probe score data for the 3 arrays are in the text files conrep1.csv, conrep2.csv and conrep3.csv. The summary file, forgeoconpoolfreq.txt, contains the rs and Illumina names (Code), along with the physical position, chromosome and estimate of allele frequency for the pool based on the raw data from all 3 arrays used. The estimate of pooling allele frequency (PAF) comes from PAF=R/(R+G), where R and G are the red and green intensities respectively. The PAFs were normalized to ensure that the mean allele frequency was 0.5 over each strand of the array (i.e. over each of the 10 sets of ~30k SNPs on the array).
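The PAF=R/(R+G) formula and the normalization to a mean of 0.5 can be sketched in Python as follows. The record does not say how the normalization was carried out, so the additive shift used here is an assumption chosen for simplicity, and the function and set names are hypothetical. An additive shift can push individual values slightly outside the range 0 to 1; this sketch leaves such values unchanged.

```python
from statistics import mean
from typing import Dict, List


def paf_from_red_green(red: float, green: float) -> float:
    """PAF = R/(R+G) for red and green intensities R and G."""
    total = red + green
    if total <= 0:
        raise ValueError("R+G must be positive")
    return red / total


def center_each_set(pafs_by_set: Dict[str, List[float]], target: float = 0.5) -> Dict[str, List[float]]:
    """Shift the PAFs in each set so that the set mean equals `target`.

    Assumption: normalization is modelled as an additive shift per set;
    the source only states that the mean was brought to 0.5.
    """
    centered: Dict[str, List[float]] = {}
    for set_name, pafs in pafs_by_set.items():
        if not pafs:
            centered[set_name] = []  # nothing to normalize in an empty set
            continue
        shift = target - mean(pafs)
        centered[set_name] = [p + shift for p in pafs]
    return centered
```

Each key of `pafs_by_set` would correspond to one of the sets of SNPs described above, so that every set is centred independently of the others.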

INSTRUMENT(S): [Mapping50K_Hind240] Affymetrix Human Mapping 50K Hind240 SNP Array

SUBMITTER: Stuart Macgregor   

PROVIDER: GSE9307 | GEO | 2007-11-06

SECONDARY ACCESSION(S): PRJNA102959

REPOSITORIES: GEO


Publications

Highly cost-efficient genome-wide association studies using DNA pools and dense SNP arrays.

Macgregor S, Zhao ZZ, Henders A, Martin NG, Montgomery GW, Visscher PM

Nucleic Acids Research, 2008-02-14, issue 6


Genome-wide association (GWA) studies to map genes for complex traits are powerful yet costly. DNA-pooling strategies have the potential to dramatically reduce the cost of GWA studies. Pooling using Affymetrix arrays has been proposed and used but the efficiency of these arrays has not been quantified. We compared and contrasted Affymetrix Genechip HindIII and Illumina HumanHap300 arrays on the same DNA pools and showed that the HumanHap300 arrays are substantially more efficient.

Similar Datasets

2009-08-08 | GSE17557 | GEO
2014-05-01 | E-GEOD-17557 | ArrayExpress
| PRJNA102959 | ENA
2012-06-26 | E-GEOD-22284 | ArrayExpress
2011-11-22 | GSE22284 | GEO
| PRJNA118657 | ENA
2014-04-24 | PXD000916 | Pride
| GSE75092 | GEO
2014-03-07 | E-MTAB-2383 | ArrayExpress
2009-06-24 | GSE16751 | GEO