Project description: Illumina Infinium whole genome genotyping (WGG) arrays are increasingly being applied in cancer genomics to study gene copy number alterations and allele-specific aberrations such as loss-of-heterozygosity (LOH). Methods developed for normalization of WGG arrays have mostly focused on diploid, normal samples. However, for cancer samples genomic aberrations may confound normalization and data interpretation. Therefore, we examined the effects of the conventionally used normalization method for Illumina Infinium arrays when applied to cancer samples. We demonstrate an asymmetry in the detection of the two alleles for each SNP, which deleteriously influences both allelic proportions and copy number estimates. The asymmetry is caused by a remaining bias between the two dyes used in the Infinium II assay after using the normalization method in Illumina’s proprietary software (BeadStudio). We propose a quantile normalization strategy for correction of this dye bias. We tested the normalization strategy using 535 individual hybridizations from 10 data sets from the analysis of cancer genomes and normal blood samples generated on Illumina Infinium II 300k version 1 and 2, 370k and 550k BeadChips. We show that the proposed normalization strategy successfully removes asymmetry in estimates of both allelic proportions and copy numbers. Additionally, the normalization strategy reduces the technical variation for copy number estimates while retaining the response to copy number alterations. The proposed normalization strategy represents a valuable low-level analysis tool that improves the quality of data obtained from Illumina Infinium arrays, in particular when used for LOH and copy number variation studies.
To investigate the effects of a quantile normalization of Illumina Infinium data, compared to conventional normalization using BeadStudio (www.illumina.com), we renormalized 535 individual hybridizations conducted on Illumina 300K, 370K and 550K BeadChips. Sample types included breast cancer, colon cancer, urothelial carcinoma, leukemia as well as normal blood and HapMap samples. This series includes the 6 breast cancers hybridized on Illumina HumanHap 550K BeadChips.
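A minimal sketch of quantile normalization between the two dye channels follows, assuming `x` and `y` hold the allele A and allele B intensities of the same set of Infinium II probes from one array. The function names (`quantile_normalize_channels`, `allelic_theta`) are hypothetical, and which probes are pooled before normalizing is an assumption; the description does not spell out the exact procedure used in the study. The θ helper uses the common polar transform θ = (2/π)·arctan(Y/X) for allelic proportion, included only to show where a residual dye bias would surface.

```python
import numpy as np


def quantile_normalize_channels(x, y):
    """Give two dye channels the same empirical intensity distribution.

    Each value is replaced by the mean of the two channels' sorted values at
    the same rank (ties are ranked by position).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")
    # Shared target distribution: rank-wise mean of the sorted channels.
    reference = (np.sort(x) + np.sort(y)) / 2.0
    x_norm = np.empty_like(x)
    y_norm = np.empty_like(y)
    # Assign the reference value of rank k to the probe holding rank k.
    x_norm[np.argsort(x, kind="stable")] = reference
    y_norm[np.argsort(y, kind="stable")] = reference
    return x_norm, y_norm


def allelic_theta(x, y):
    """Polar allelic proportion in [0, 1]: ~0 for AA, ~0.5 for AB, ~1 for BB (x, y >= 0)."""
    return (2.0 / np.pi) * np.arctan2(np.asarray(y, dtype=float), np.asarray(x, dtype=float))
```

Because both channels are mapped onto one reference distribution, a heterozygous probe with equal true A and B signal is no longer pulled toward one allele by a global dye difference. The rank-wise mean is just one choice of target; the median of the sorted channels would serve the same purpose.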
Project description: Copy number profiling of 36 ovarian tumors on Affymetrix 100K SNP arrays. Thirty-six ovarian tumors were profiled for copy-number alterations with the Affymetrix 100K Mapping Array. Copy number profiling of 36 ovarian tumors on Affymetrix 500K SNP arrays. Sixteen ovarian tumors were profiled for copy-number alterations with the high-resolution Affymetrix 500K Mapping Array. Intensity signal CEL files from the Affymetrix 100K Mapping Array were processed with dChip 2005 (build date Nov 30, 2005) using the PM/MM difference model and invariant set normalization. Each probe set was mapped to the genome (NCBI assembly version 36) using annotation provided on the Affymetrix website. The log2 ratios were centered to a median of zero and segmented using the GLAD package for the R statistical environment. Copy number was calculated as 2^(log2 ratio + 1). Intensity signal CEL files from the Affymetrix 500K Mapping Array were processed with dChip 2005 (build date Nov 30, 2005) using the PM/MM difference model and invariant set normalization. Forty-eight normal samples were downloaded from the Affymetrix website (http://www.affymetrix.com/support/technical/byproduct.affx?product=500k) and analyzed at the same time. For each array set (Sty and Nsp), the CEL file with the median signal intensity across the set was selected as the reference array. The dChip-normalized signal intensities were converted to log2 ratios and segmented as follows. For each autosomal probe set, the log2 tumor/normal ratio of each tumor sample was calculated against the average intensity of that probe set across the normal samples; for chromosome X, the average of the 20 normal female samples was used. Each probe set was mapped to the genome (NCBI assembly version 36) using annotation provided on the Affymetrix website. The log2 ratios were centered to a median of zero and segmented using the GLAD package for the R statistical environment. Copy number was calculated as 2^(log2 ratio + 1).
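The ratio and copy number arithmetic described above can be sketched as follows, assuming the intensities are already dChip-normalized and arranged as one tumor vector plus a matrix with one row per normal array (for chromosome X, the matrix would hold only the female normals). The function names are hypothetical, and the GLAD segmentation step is not reproduced here.

```python
import numpy as np


def tumor_log2_ratios(tumor, normals):
    """log2(tumor / mean normal intensity) per probe set, centered to a median of zero."""
    tumor = np.asarray(tumor, dtype=float)
    # Reference intensity per probe set: average over the normal arrays (rows).
    reference = np.asarray(normals, dtype=float).mean(axis=0)
    ratios = np.log2(tumor / reference)
    # Median centering so that the typical probe set sits at log2 ratio 0.
    return ratios - np.median(ratios)


def copy_number(log2_ratio):
    """Copy number = 2 ** (log2 ratio + 1), so a centered ratio of 0 maps to 2 copies."""
    return np.power(2.0, np.asarray(log2_ratio, dtype=float) + 1.0)
```

Under this formula a single-copy gain (ratio 3/2) maps to 3 copies and a single-copy loss (ratio 1/2) maps to 1 copy, which is why the median-centered ratio is shifted by +1 before exponentiation.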
Project description: Development of clinically relevant animal models of renal cell carcinoma (RCC) for preclinical investigations. For DNA copy number analysis, the Sty I (250K) SNP array of the 500K Human Mapping Array (Affymetrix) was used. Arrays were scanned with a GeneChip Scanner 3000 7G. Probe-level signal intensities were normalized to a baseline array with median intensity using invariant set normalization, and SNP-level signal intensities were obtained using a model-based (PM/MM) method. Keywords: SNP array data, renal cell carcinoma
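A simplified sketch of invariant set normalization against a median-intensity baseline array is shown below. It only approximates the dChip procedure: probes whose rank proportions differ by less than a threshold between the array and the baseline are iteratively kept as the invariant set, and the mapping is then a monotone piecewise-linear interpolation between the sorted invariant intensities, whereas dChip fits a smoothed curve through the invariant set. The function names, the 0.01 threshold and the iteration limit are assumptions chosen for illustration.

```python
import numpy as np


def pick_baseline(arrays):
    """Row index of the array whose median intensity is the median across arrays."""
    medians = np.median(np.asarray(arrays, dtype=float), axis=1)
    return int(np.argsort(medians)[len(medians) // 2])


def rank_proportion(values):
    """Ranks scaled to [0, 1] (0 = dimmest probe, 1 = brightest probe)."""
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return ranks / max(len(values) - 1, 1)


def invariant_set_normalize(array, baseline, threshold=0.01, max_iter=20):
    """Map `array` onto the intensity scale of `baseline` using rank-invariant probes."""
    array = np.asarray(array, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    idx = np.arange(array.size)  # start with every probe as a candidate
    for _ in range(max_iter):
        diff = np.abs(rank_proportion(array[idx]) - rank_proportion(baseline[idx]))
        keep = diff < threshold
        if keep.all() or keep.sum() < 2:
            break  # converged, or too few probes left to shrink further
        idx = idx[keep]
    # Monotone mapping: pair the sorted invariant intensities of both arrays.
    xs = np.sort(array[idx])
    ys = np.sort(baseline[idx])
    # np.interp clamps values outside the invariant range to the end points.
    return np.interp(array, xs, ys)
```

Typical use would be `ref = arrays[pick_baseline(arrays)]` followed by `invariant_set_normalize(a, ref)` for each array `a`. Restricting the fit to rank-invariant probes is what keeps genuinely altered regions from dragging the normalization curve.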
Project description: SNP arrays were used to derive copy number estimates and identify amplifications and deletions in melanomas. The resulting copy number breakpoints were compared with gene fusions identified by second-generation sequencing of cDNA.
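One way to make the breakpoint comparison concrete is a window-based proximity check, sketched below under stated assumptions: breakpoints are (chromosome, position) pairs taken from segment boundaries, each fusion is a name plus the genomic positions of its two partner breakpoints, and a fusion counts as supported when either partner lies within a hypothetical 100 kb window of a copy number breakpoint. The description does not say which criterion the study actually used, and `supported_fusions` is a hypothetical name.

```python
from bisect import bisect_left
from collections import defaultdict


def supported_fusions(breakpoints, fusions, window=100_000):
    """Names of fusions with at least one partner near a copy number breakpoint.

    breakpoints: iterable of (chrom, position) pairs.
    fusions: iterable of (name, (chrom_a, pos_a), (chrom_b, pos_b)).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in breakpoints:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    def near(chrom, pos):
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, pos)
        # Only the nearest breakpoint on each side can be the closest one.
        return any(
            abs(positions[j] - pos) <= window
            for j in (i - 1, i)
            if 0 <= j < len(positions)
        )

    return [name for name, (c1, p1), (c2, p2) in fusions if near(c1, p1) or near(c2, p2)]
```

Requiring both partners to sit near breakpoints would be a stricter alternative; the either-partner rule is used here only because it is the simplest to state.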
Project description: We performed Illumina Infinium whole-genome SNP-CN profiling of the KMS11, MM.1S, and RPMI8226 multiple myeloma cell lines to detect gene copy number variants specific to each cell line.
Project description: We describe a method for automatic detection of absolute segmental copy numbers and genotype status in complex cancer genome profiles measured by SNP arrays. The method is based on pattern recognition of segmented and smoothed copy number and allelic imbalance profiles. Overall copy number assignments were verified by DNA indexes of breast carcinomas and karyotypes of cell lines. The method performs well even for poor-quality data, low tumor content, and highly rearranged tumor genomes.
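Why low tumor content complicates absolute copy number and genotype calls can be shown with the standard two-component mixture model below. It is not the pattern recognition method of the description; it only computes the ideal (uncompressed) B-allele frequency at germline-heterozygous SNPs and the log2 ratio against a diploid reference for a segment with `n` total and `b` B-allele copies in the tumor cells, at tumor fraction `p`, assuming the admixed normal cells are diploid and heterozygous. The function names are hypothetical.

```python
import math


def expected_baf(n, b, p):
    """Expected B-allele frequency at a germline-heterozygous SNP.

    n: total tumor copy number, b: tumor B-allele copies (0 <= b <= n),
    p: tumor fraction in [0, 1]; normal cells contribute one A and one B copy.
    """
    if not 0 <= b <= n or not 0.0 <= p <= 1.0:
        raise ValueError("need 0 <= b <= n and 0 <= p <= 1")
    total = p * n + (1.0 - p) * 2.0
    if total == 0:
        raise ValueError("no DNA signal (homozygous deletion in a pure tumor)")
    return (p * b + (1.0 - p) * 1.0) / total


def expected_log2_ratio(n, p):
    """Ideal log2 ratio of the admixed sample against a diploid reference."""
    total = p * n + (1.0 - p) * 2.0
    if total == 0:
        raise ValueError("log2 ratio undefined for zero copies")
    return math.log2(total / 2.0)
```

For a one-copy deletion (`n=1`, `b=0`), the expected BAF moves from 0 in a pure tumor to about 0.33 at 50% tumor content, and the log2 ratio from -1 to about -0.42, so the same event looks much weaker in admixed samples.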
Project description: Copy number variants (CNVs) can reach appreciable frequencies in the human population, and several of these copy number polymorphisms (CNPs) have recently been associated with human diseases including lupus, psoriasis, Crohn disease, and obesity. Despite recent advances, significant biases remain in CNP discovery and genotyping. Using a novel method based on single-channel intensity data, benchmarked against copy numbers determined from sequencing read depth, we obtained CNP genotypes for 1489 CNPs from 487 human DNA samples from diverse ethnic backgrounds. The customized microarray used for this purpose was enriched for segmental duplication-rich regions and novel insertions of sequences not represented in the reference genome assembly or on standard single nucleotide polymorphism (SNP) microarray platforms. We observe that CNPs in segmental duplications are more likely to be population differentiated than CNPs in unique regions (p = 0.015) and that bi-allelic CNPs show greater stratification when compared to frequency-matched SNPs (p = 0.0026). Although bi-allelic CNPs show a strong correlation of copy number with flanking SNP genotypes, the majority of multi-copy CNPs do not (40% with r > 0.8). We selected a subset of CNPs for further characterization in 1873 additional samples from 62 populations (947 samples analyzed by microarray; 926 samples analyzed with PCR-based assays); this revealed striking population-differentiated structural variants in genes of clinical significance such as the OCLN gene, which encodes a tight junction protein involved in hepatitis C viral entry. Our new microarray design allows these variants to be rapidly tested for disease association, and our results suggest that CNPs (especially those that are not in linkage disequilibrium with SNPs) may have contributed disproportionately to human diversity and selection.
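The correlation between CNP copy number and flanking SNP genotypes can be illustrated with a Pearson correlation per SNP, assuming integer copy number calls and SNP genotypes coded as allele dosages (0, 1, 2) for the same samples. The function name is hypothetical, and because the sign of r depends on which allele is counted, the sketch reports |r|; the description does not state how the study oriented alleles or chose flanking SNPs.

```python
import numpy as np


def best_tag_snp(copy_numbers, snp_dosages):
    """Return (index, |r|) of the flanking SNP best correlated with CNP copy number.

    copy_numbers: one integer copy number call per sample.
    snp_dosages: one sequence per flanking SNP, each with one dosage (0/1/2) per sample.
    Returns (-1, 0.0) when no SNP has a defined correlation.
    """
    cn = np.asarray(copy_numbers, dtype=float)
    best_index, best_r = -1, 0.0
    for j, dosage in enumerate(snp_dosages):
        d = np.asarray(dosage, dtype=float)
        if cn.std() == 0 or d.std() == 0:
            continue  # Pearson r is undefined for a constant vector
        r = abs(np.corrcoef(cn, d)[0, 1])
        if r > best_r:
            best_index, best_r = j, r
    return best_index, best_r
```

A CNP would then count as well tagged when the returned |r| exceeds 0.8, the threshold quoted above; multi-copy CNPs, whose copy numbers span more than three values, are the cases where a three-level SNP dosage struggles to reach it.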