Project description:BACKGROUND: Microarray measurements are susceptible to a variety of experimental artifacts, some of which give rise to systematic biases that are spatially dependent in a unique way on each chip. It is likely that such artifacts affect many SNP arrays, but the normalization methods used in currently available genotyping algorithms make no attempt at spatial bias correction. Here, we propose an effective single-chip spatial bias removal procedure for Affymetrix 6.0 SNP arrays or platforms with similar design features. This procedure deals with both extreme and subtle biases and is intended to be applied before standard genotype calling algorithms. RESULTS: Application of the spatial bias adjustments on HapMap samples resulted in higher genotype call rates with equal or even better accuracy for thousands of SNPs. Consequently the normalization procedure is expected to lead to more meaningful biological inferences and could be valuable for genome-wide SNP analysis. CONCLUSIONS: Spatial normalization can potentially rescue thousands of SNPs in a genetic study at the small cost of computational time. The approach is implemented in R and available from the authors upon request.
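The abstract above does not spell out the spatial correction itself, so the following is only a minimal sketch of the general idea of single-chip spatial bias removal, not the authors' procedure (which is implemented in R). It assumes log-scale intensities laid out on a rectangular grid and uses a local-median estimate of the bias; the function name `remove_spatial_bias` and the `radius` parameter are hypothetical.

```python
from statistics import median


def remove_spatial_bias(grid, radius=1):
    """Subtract a locally estimated bias from a 2D grid of log intensities.

    ASSUMPTION: local-median smoothing is a generic stand-in chosen for
    illustration; it is not the procedure described in the abstract.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    # Chip-wide reference level against which local deviations are measured.
    global_level = median(v for row in grid for v in row)
    corrected = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            # Neighbourhood of the cell, clipped at the chip edges.
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            local_bias = median(window) - global_level
            out_row.append(grid[r][c] - local_bias)
        corrected.append(out_row)
    return corrected


# Toy chip: the left half reads 2 log units brighter than the right half.
chip = [[12.0, 12.0, 12.0, 10.0, 10.0, 10.0] for _ in range(6)]
flattened = remove_spatial_bias(chip, radius=1)
```

On real arrays each probe has its own expected signal, so a practical version would smooth residuals from a probe-level reference rather than raw intensities; the toy chip here only illustrates how a regional offset is flattened.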
Project description:Illumina Infinium whole genome genotyping (WGG) arrays are increasingly being applied in cancer genomics to study gene copy number alterations and allele-specific aberrations such as loss-of-heterozygosity (LOH). Methods developed for normalization of WGG arrays have mostly focused on diploid, normal samples. However, for cancer samples genomic aberrations may confound normalization and data interpretation. Therefore, we examined the effects of the conventionally used normalization method for Illumina Infinium arrays when applied to cancer samples. We demonstrate an asymmetry in the detection of the two alleles for each SNP, which deleteriously influences both allelic proportions and copy number estimates. The asymmetry is caused by a remaining bias between the two dyes used in the Infinium II assay after using the normalization method in Illumina’s proprietary software (BeadStudio). We propose a quantile normalization strategy for correction of this dye bias. We tested the normalization strategy using 535 individual hybridizations from 10 data sets from the analysis of cancer genomes and normal blood samples generated on Illumina Infinium II 300k version 1 and 2, 370k and 550k BeadChips. We show that the proposed normalization strategy successfully removes asymmetry in estimates of both allelic proportions and copy numbers. Additionally, the normalization strategy reduces the technical variation for copy number estimates while retaining the response to copy number alterations. The proposed normalization strategy represents a valuable low-level analysis tool that improves the quality of data obtained from Illumina Infinium arrays, in particular when used for LOH and copy number variation studies. 
To investigate the effects of a quantile normalization of Illumina Infinium data, compared to conventional normalization using BeadStudio (www.illumina.com), we renormalized 535 individual hybridizations conducted on Illumina 300K, 370K and 550K BeadChips. Sample types included breast cancer, colon cancer, urothelial carcinoma, leukemia as well as normal blood and HapMap samples. This series includes the 6 breast cancers hybridized on Illumina HumanHap 550K BeadChips.
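As a rough illustration of the quantile normalization idea described above, the sketch below forces the intensities of the two dye channels of one array onto a shared distribution. It is a generic two-channel quantile normalization written for this listing; the exact strategy used in the study (for example, how probes are grouped before normalization) is not given in the abstract, and the function name `quantile_normalize_pair` is hypothetical.

```python
def quantile_normalize_pair(x, y):
    """Quantile-normalize two equally long intensity vectors to each other.

    ASSUMPTION: x and y hold the raw intensities of the two dye channels
    for the same set of SNPs. Ties are broken by input order, not averaged.
    """
    if len(x) != len(y):
        raise ValueError("both channels must have the same number of values")
    n = len(x)
    # Indices of each channel sorted by increasing intensity.
    order_x = sorted(range(n), key=lambda i: x[i])
    order_y = sorted(range(n), key=lambda i: y[i])
    # Reference distribution: mean of the two channels at each rank.
    reference = [(x[i] + y[j]) / 2.0 for i, j in zip(order_x, order_y)]
    x_norm = [0.0] * n
    y_norm = [0.0] * n
    # Each value is replaced by the reference value at its own rank.
    for rank, (i, j) in enumerate(zip(order_x, order_y)):
        x_norm[i] = reference[rank]
        y_norm[j] = reference[rank]
    return x_norm, y_norm


# Toy example: the second channel is systematically dimmer than the first.
x_raw = [100.0, 400.0, 200.0, 300.0]
y_raw = [50.0, 150.0, 250.0, 100.0]
x_norm, y_norm = quantile_normalize_pair(x_raw, y_raw)
```

After normalization both channels share the same set of values while each SNP keeps its within-channel rank, which is what removes a global dye asymmetry in this simplified setting.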
Project description:SUMMARY: We present a tool for control-free copy number alteration (CNA) detection using deep-sequencing data, particularly useful for cancer studies. The tool deals with two frequent problems in the analysis of cancer deep-sequencing data: absence of a control sample and possible polyploidy of cancer cells. FREEC (control-FREE Copy number caller) automatically normalizes and segments copy number profiles (CNPs) and calls CNAs. If ploidy is known, FREEC assigns absolute copy number to each predicted CNA. To normalize raw CNPs, the user can provide a control dataset if available; otherwise GC content is used. We demonstrate that for Illumina single-end, mate-pair or paired-end sequencing, GC-content normalization provides smooth profiles that can be further segmented and analyzed in order to predict CNAs. AVAILABILITY: Source code and sample data are available at http://bioinfo-out.curie.fr/projects/freec/.
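To make the idea of GC-content normalization concrete, here is a minimal sketch that scales per-window read counts by the median count of windows with similar GC content. This is a simplified median-by-GC-bin correction chosen for illustration; it is not FREEC's own normalization, and the function name `gc_normalize` and the `n_bins` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(read_counts, gc_fractions, n_bins=20):
    """Return read counts divided by the median count of their GC bin.

    ASSUMPTION: windows are of equal size and gc_fractions lie in [0, 1].
    A ratio near 1 means the window has the depth typical for its GC level.
    """
    if len(read_counts) != len(gc_fractions):
        raise ValueError("one GC fraction is required per window")

    def bin_index(gc):
        # Clamp so that gc == 1.0 falls into the last bin.
        return min(int(gc * n_bins), n_bins - 1)

    # Collect the counts of all windows that share a GC bin.
    counts_by_bin = defaultdict(list)
    for count, gc in zip(read_counts, gc_fractions):
        counts_by_bin[bin_index(gc)].append(count)
    bin_median = {b: median(v) for b, v in counts_by_bin.items()}

    ratios = []
    for count, gc in zip(read_counts, gc_fractions):
        expected = bin_median[bin_index(gc)]
        # Bins with no coverage give an undefined ratio.
        ratios.append(count / expected if expected > 0 else float("nan"))
    return ratios


# Toy profile: GC-rich windows attract roughly twice as many reads.
counts = [100, 120, 200, 240]
gc = [0.31, 0.33, 0.52, 0.53]
ratios = gc_normalize(counts, gc)
```

The resulting ratios are centred on 1 within each GC bin, so the depth difference between the GC-poor and GC-rich windows no longer looks like a copy number change; segmentation and CNA calling would operate on such a profile.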
Project description:Recurring genetic abnormalities have been identified in Philadelphia chromosome (Ph)-positive acute lymphoblastic leukemia (ALL). Among them, IKZF1 deletion was associated with poor prognosis in patients treated with imatinib-based or dasatinib-based regimens. However, the molecular determinants for clinical outcomes in ponatinib-treated patients remain unknown. We systematically analyzed genetic alterations in adults with Ph-positive ALL uniformly treated in clinical trials with dasatinib-based regimens or a ponatinib-based regimen and investigated the molecular determinants for treatment outcomes using pretreatment specimens collected from adults with Ph-positive ALL treated with Hyper-CVAD plus dasatinib or ponatinib. DNA sequencing and SNP microarray were performed and recurrent genetic abnormalities were found in 84% of the patients, among whom IKZF1 deletion was most frequently detected (60%). IKZF1 deletion frequently co-occurred with other copy-number abnormalities (IKZF1plus, 46%) and was significantly associated with unfavorable overall survival (OS) (false discovery rate < 0.1) and increased cumulative incidence of relapse (p = 0.01). In a multivariate analysis, dasatinib therapy, lack of achievement of 3-month complete molecular response, and the presence of IKZF1plus status were significantly associated with poor OS. The differential impact of IKZF1plus was largely restricted to patients given Hyper-CVAD plus ponatinib; dasatinib-based regimens had unfavorable outcomes regardless of the molecular abnormalities.
Project description:CYP2D6 is a very important pharmacogene as it is responsible for the metabolization or bioactivation of 20 to 30% of the clinically used drugs. However, despite its relatively small length of only 4.4 kb, it is one of the most challenging pharmacogenes to genotype due to the high similarity with its neighboring pseudogenes and the frequent occurrence of CYP2D6-CYP2D7 hybrids. As a result, most current genotyping methods cannot correctly determine the complete CYP2D6-CYP2D7 sequence. Therefore, we developed a genotyping assay to generate complete allele-specific consensus sequences of complex regions by optimizing the PCR-free nanopore Cas9-targeted sequencing (nCATS) method combined with adaptive sequencing, and developing a new comprehensive long read genotyping (CoLoRGen) pipeline. The CoLoRGen pipeline first generates consensus sequences of both alleles and subsequently determines both large structural and small variants to ultimately assign the correct star-alleles. In reference samples, our genotyping assay confirms the presence of CYP2D6-CYP2D7 large structural variants, single nucleotide variants (SNVs), and small insertions and deletions (INDELs) that go undetected by most current assays. Moreover, our results provide direct evidence that the CYP2D6 genotype of the NA12878 DNA should be updated to include the CYP2D6-CYP2D7 *68 hybrid and several additional single nucleotide variants compared to existing references. Ultimately, the nCATS-CoLoRGen genotyping assay allows for more accurate gene function predictions by making it possible to detect and phase de novo mutations in addition to known large structural and small variants.
Project description:BACKGROUND: Illumina Infinium whole genome genotyping (WGG) arrays are increasingly being applied in cancer genomics to study gene copy number alterations and allele-specific aberrations such as loss-of-heterozygosity (LOH). Methods developed for normalization of WGG arrays have mostly focused on diploid, normal samples. However, for cancer samples genomic aberrations may confound normalization and data interpretation. Therefore, we examined the effects of the conventionally used normalization method for Illumina Infinium arrays when applied to cancer samples. RESULTS: We demonstrate an asymmetry in the detection of the two alleles for each SNP, which deleteriously influences both allelic proportions and copy number estimates. The asymmetry is caused by a remaining bias between the two dyes used in the Infinium II assay after using the normalization method in Illumina's proprietary software (BeadStudio). We propose a quantile normalization strategy for correction of this dye bias. We tested the normalization strategy using 535 individual hybridizations from 10 data sets from the analysis of cancer genomes and normal blood samples generated on Illumina Infinium II 300 k version 1 and 2, 370 k and 550 k BeadChips. We show that the proposed normalization strategy successfully removes asymmetry in estimates of both allelic proportions and copy numbers. Additionally, the normalization strategy reduces the technical variation for copy number estimates while retaining the response to copy number alterations. CONCLUSION: The proposed normalization strategy represents a valuable tool that improves the quality of data obtained from Illumina Infinium arrays, in particular when used for LOH and copy number variation studies.
Project description:Researchers are increasingly turning to label-free MS1 intensity-based quantification strategies within HPLC-ESI-MS/MS workflows to reveal biological variation at the molecular level. Unfortunately, HPLC-ESI-MS/MS workflows using these strategies produce results with poor repeatability and reproducibility, primarily due to systematic bias and complex variability. While current global normalization strategies can mitigate systematic bias, they fail when faced with complex variability stemming from transient stochastic events during HPLC-ESI-MS/MS analysis. To address these problems, we developed a novel local normalization method, proximity-based intensity normalization (PIN), based on the analysis of compositional data. We evaluated PIN against common normalization strategies. PIN outperforms them in dramatically reducing variance and in identifying 20% more proteins with statistically significant abundance differences that other strategies missed. Our results show that PIN enables the discovery of statistically significant biological variation that otherwise is falsely reported or missed.
Project description:Recent advances in multiplexed imaging technologies promise to improve the understanding of the functional states of individual cells and the interactions between the cells in tissues. This often requires compilation of results from multiple samples. However, quantitative integration of information between samples is complicated by variations in staining intensity and background fluorescence that obscure biological variations. Failure to remove these unwanted artifacts will complicate downstream analysis and diminish the value of multiplexed imaging for clinical applications. Here, to compensate for unwanted variations, we automatically identify negative control cells for each marker within the same tissue and use their expression levels to infer the background signal level. The intensity profile is normalized by the inferred level of the negative control cells to remove between-sample variation. Using tissue microarray data and a pair of longitudinal biopsy samples, we demonstrate that the proposed approach removes unwanted variations effectively and shows robust performance.
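The normalization step described above can be sketched as follows, under two stated assumptions that the abstract does not confirm: the negative control cells for a marker are already identified (passed in as a boolean mask, whereas the study identifies them automatically), and the background level is taken as their median intensity and removed by division. The function name `normalize_by_negative_controls` is hypothetical.

```python
from statistics import median


def normalize_by_negative_controls(intensities, is_negative):
    """Scale one marker's per-cell intensities by its background level.

    ASSUMPTION: is_negative marks the negative control cells for this
    marker in this sample; the background is their median intensity and
    normalization is done by division.
    """
    negatives = [v for v, neg in zip(intensities, is_negative) if neg]
    if not negatives:
        raise ValueError("at least one negative control cell is required")
    background = median(negatives)
    if background <= 0:
        raise ValueError("background level must be positive")
    return [v / background for v in intensities]


# Two toy samples with identical biology; sample B was stained twice as brightly.
mask = [True, True, True, False]
sample_a = normalize_by_negative_controls([2.0, 4.0, 6.0, 40.0], mask)
sample_b = normalize_by_negative_controls([4.0, 8.0, 12.0, 80.0], mask)
```

Because each sample is scaled by its own background, the twofold staining difference between the two toy samples disappears and the positive cell reads as ten times background in both.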