Probe-level estimation improves the detection of differential splicing in Affymetrix exon array studies.
ABSTRACT: The recent advent of exon microarrays has made it possible to reveal differences in alternative splicing events on a global scale. We introduce a novel statistical procedure that takes full advantage of the probe-level information on Affymetrix exon arrays when detecting differential splicing between sample groups. In comparison to existing ranking methods, the procedure shows superior reproducibility and accuracy in distinguishing true biological findings from background noise in high agreement with experimental validations.
Project description:BACKGROUND: Alternative splicing is an important mechanism that increases protein diversity and functionality in higher eukaryotes. Affymetrix exon arrays are a commercialized platform used to detect alternative splicing on a genome-wide scale. Two probe summarization algorithms, PLIER (Probe Logarithmic Intensity Error) and RMA (Robust Multichip Average), are commonly used to compute gene-level and exon-level expression values. However, a systematic comparison of these two algorithms on their effects on high-level analysis of the arrays has not yet been reported. RESULTS: In this study, we showed that PLIER summarization led to over-estimation of gene-level expression changes, relative to exon-level expression changes, in two-group comparisons. Consequently, it led to detection of substantially more skipped exons on up-regulated genes, as well as substantially more included (i.e., non-skipped) exons on down-regulated genes. In contrast, this bias was not observed for RMA-summarized data. By using a published human tissue dataset, we compared the tissue-specific expression and splicing detected by Affymetrix exon arrays with those detected based on expressed sequence databases. We found the tendency of PLIER was not supported by the expressed sequence data. CONCLUSION: We showed that the tendency of PLIER in detection of alternative splicing is likely caused by a technical bias in the approach, rather than a biological bias. Moreover, we observed abnormal summarization results when using the PLIER algorithm, indicating that mathematical problems, such as numerical instability, may affect PLIER performance.
Project description:The presence of different transcripts of a gene across samples can be analysed by whole-transcriptome microarrays. Reproducing results from published microarray data is challenging owing to the vast amounts of data and the large variety of preprocessing and filtering steps applied before the actual analysis is carried out. To guarantee a firm basis for methodological development, where results from new methods are compared with previous results, it is crucial that all analyses are completely reproducible by other researchers. Here we give a detailed workflow for reproducible analysis of the GeneChip® Human Exon 1.0 ST Array at probe and probeset level solely in R/Bioconductor, choosing packages based on their simplicity of use. To exemplify the use of the proposed workflow, we analyse differential splicing and differential gene expression in a publicly available dataset using various statistical methods. We believe this study will provide other researchers with an easy way of accessing gene expression data at different annotation levels, and with sufficient detail for developing their own tools for reproducible analysis of the GeneChip® Human Exon 1.0 ST Array.
Project description:BACKGROUND: There is great current interest in developing microarray platforms for measuring mRNA abundance at both gene level and exon level. The Affymetrix Exon Array is a new high-density gene expression microarray platform, with over six million probes targeting all annotated and predicted exons in a genome. An important question for the analysis of exon array data is how to compute overall gene expression indexes. Because of the complexity of the design of exon array probes, this problem is different in nature from summarizing gene-level expression from traditional 3' expression arrays. METHODOLOGY/PRINCIPAL FINDINGS: In this manuscript, we use exon array data from 11 human tissues to study methods for computing gene-level expression. We showed that for most genes there is a subset of exon array probes having highly correlated intensities across multiple samples. We suggest that these probes could be used as reliable indicators of overall gene expression levels. We developed a probe selection algorithm to select such a subset of highly correlated probes for each gene, and computed gene expression indexes using the selected probes. CONCLUSIONS/SIGNIFICANCE: Our results demonstrate that probe selection improves gene expression estimates from exon arrays. The selected probes can be used in future analyses of other exon array datasets to compute gene expression indexes.
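The probe-selection idea in the entry above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the selection rule (mean Pearson correlation with the gene's other probes), the 0.5 cutoff, the per-sample median summary, and all function names are assumptions made for illustration only.

```python
import numpy as np

def select_correlated_probes(log_intensity, min_mean_corr=0.5):
    """Return a boolean mask of probes that track the gene's other probes.

    log_intensity: 2-D array (probes x samples) of log2 intensities for one
    gene. Assumes at least two probes, none constant across samples.
    The cutoff is a hypothetical choice, not a published value.
    """
    corr = np.corrcoef(log_intensity)      # probe-by-probe Pearson correlation
    np.fill_diagonal(corr, np.nan)         # ignore each probe's self-correlation
    mean_corr = np.nanmean(corr, axis=1)   # average agreement with other probes
    return mean_corr >= min_mean_corr

def gene_expression_index(log_intensity, mask):
    """Summarize the selected probes as a per-sample median (assumed summary)."""
    return np.median(log_intensity[mask], axis=0)

# Toy gene: four probes follow a common signal, two probes do not.
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
probes = np.vstack([
    signal,
    signal + 0.5,
    signal + 1.0,
    signal + 1.5,
    [3.0, 1.0, 3.0, 1.0, 3.0, 1.0],
    [1.0, 3.0, 1.0, 3.0, 1.0, 3.0],
])
mask = select_correlated_probes(probes)
index = gene_expression_index(probes, mask)
print(mask)
print(index)
```

In this toy example only the first four probes pass the cutoff, so the gene-level index follows the shared signal and is unaffected by the two discordant probes.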
Project description:BACKGROUND: With the advent of the GeneChip Exon Arrays, it is now possible to extract "exon-level" expression estimates, allowing for detection of alternative splicing events, one of the primary mechanisms of transcript diversity. In the context of (1) a complex trait use case and (2) a human cerebellum vs. heart comparison on previously validated data, we present a transcript-based statistical model and validation framework to allow detection of alternative exon usage (AEU) between different groups. To illustrate the approach, we detect and confirm differences in exon usage in two of the most widely studied mouse genetic models (the C57BL/6J and DBA/2J inbred strains) and in a human dataset. RESULTS: We developed a computational framework that consists of probe-level annotation mapping and statistical modeling to detect putative AEU events, as well as visualization and alignment with known splice events. We show a dramatic improvement (~25-fold) in the ability to detect these events when the annotation and statistical model are specified at the transcript level, as compared with the transcript cluster/gene-level annotation used on the array. An additional component of this workflow is a probe index that allows ranking AEU candidates for validation and can aid in identification of false positives due to single nucleotide polymorphisms. DISCUSSION: Our work highlights the importance of concordance between the functional unit interrogated (e.g., gene, transcripts) and the entity (e.g., exon, probeset) within the statistical model. The framework we present is broadly applicable to other platforms (including RNAseq).
Project description:BACKGROUND: Alternative splicing and isoform level expression profiling is an emerging field of interest within genomics. Splicing sensitive microarrays, with probes targeted to individual exons or exon-junctions, are becoming increasingly popular as a tool capable of both expression profiling and finer scale isoform detection. Despite their intuitive appeal, relatively little is known about the performance of such tools, particularly in comparison with more traditional 3' targeted microarrays. Here, we use the well studied Microarray Quality Control (MAQC) dataset to benchmark the Affymetrix Exon Array, and compare it to two other popular platforms: Illumina, and Affymetrix U133. RESULTS: We show that at the gene expression level, the Exon Array performs comparably with the two 3' targeted platforms. However, the interplatform correlation of the results is slightly lower than between the two 3' arrays. We show that some of the discrepancies stem from the RNA amplification protocols, e.g. the Exon Array is able to detect expression of non-polyadenylated transcripts. More importantly, we show that many other differences result from the ability of the Exon Array to monitor more detailed isoform-level changes; several examples illustrate that changes detected by the 3' platforms are actually isoform variations, and that the nature of these variations can be resolved using Exon Array data. Finally, we show how the Exon Array can be used to detect alternative isoform differences, such as alternative splicing, transcript termination, and alternative promoter usage. We discuss the possible pitfalls and false positives resulting from isoform-level analysis. CONCLUSION: The Exon Array is a valuable tool that can be used to profile gene expression while providing important additional information regarding the types of gene isoforms that are expressed and variable. 
However, analysis of alternative splicing requires much more hands-on effort and visualization of results in order to interpret the data correctly, and generally results in considerably higher false positive rates than expression analysis. One of the main sources of error in the MAQC dataset is variation in amplification efficiency across transcripts, most likely caused by the joint effects of elevated GC content in the 5' ends of genes and reduced likelihood of random-primed first strand synthesis in the 3' ends of genes. These effects are currently not adequately corrected by existing statistical methods. We outline approaches to reduce such errors by filtering out potentially problematic data.
Project description:In higher eukaryotes, alternative splicing is a common mechanism for increasing transcriptome diversity. Affymetrix exon arrays were designed as a tool for monitoring the relative expression levels of hundreds of thousands of known and predicted exons with a view to detecting alternative splicing events. In this article, we have analyzed exon array data from many different human and mouse tissues and have uncovered a systematic relationship between transcript fold change and alternative splicing as reported by the splicing index. Evidence from dilution experiments and deep sequencing suggests that this effect is of technical rather than biological origin and that it is driven by sequence features of the probes. This effect is substantial and results in a 12-fold overestimation of alternative splicing events in genes that are differentially expressed. By cross-species exon array comparison, we could further show that the systematic bias persists even across species boundaries. Failure to consider this effect in data analysis would result in the reproducible false detection of apparently conserved alternative splicing events. Finally, we have developed software in R called COSIE (Corrected Splicing Indices for Exon arrays) that, for any given set of new exon array experiments, corrects for the observed bias and improves the detection of alternative splicing (available at www.fmi.ch/groups/gbioinfo).
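For readers unfamiliar with the splicing index mentioned in the entry above, the following Python sketch computes it in its commonly used log-scale form: the exon signal is normalized by the gene signal, and the group means of that normalized value are compared. The function name, argument layout, and toy values are assumptions; the bias correction performed by COSIE is not implemented here.

```python
import numpy as np

def splicing_index(exon_log2, gene_log2, group_a, group_b):
    """Log2 splicing index of each exon between two sample groups.

    exon_log2: exons x samples array of log2 exon-level signals.
    gene_log2: per-sample log2 gene-level signal (length = samples).
    group_a, group_b: lists of sample column indices.
    """
    normalized = exon_log2 - gene_log2   # log2(exon / gene) per sample
    return (normalized[:, group_a].mean(axis=1)
            - normalized[:, group_b].mean(axis=1))

# Toy gene that is 4-fold higher in group B (samples 2 and 3).
gene_log2 = np.array([8.0, 8.0, 10.0, 10.0])
exon_log2 = np.array([
    [8.0, 8.0, 10.0, 10.0],   # exon follows the gene
    [8.0, 8.0, 8.0, 8.0],     # exon does not follow the gene
])
group_a, group_b = [0, 1], [2, 3]

si = splicing_index(exon_log2, gene_log2, group_a, group_b)
gene_fold_change = gene_log2[group_a].mean() - gene_log2[group_b].mean()
print(si, gene_fold_change)
```

Because the gene-level signal enters every exon's index, any technical distortion tied to the gene-level fold change propagates into the splicing index. Across many genes, the systematic relationship reported above would appear as a dependence of `si` on `gene_fold_change`.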
Project description:Analyses of EST data show that alternative splicing is much more widespread than once thought. The advent of exon and tiling microarrays means that researchers now have the capacity to experimentally measure alternative splicing on a genome-wide level. New methods are needed to analyze the data from these arrays. We present a method, finding isoforms using robust multichip analysis (FIRMA), for detecting differential alternative splicing in exon array data. FIRMA has been developed for Affymetrix exon arrays, but could in principle be extended to other exon arrays, tiling arrays or splice junction arrays. We have evaluated the method using simulated data, and have also applied it to two datasets: a panel of 11 human tissues and a set of 10 pairs of matched normal and tumor colon tissue. FIRMA is able to detect exons in several genes confirmed by reverse transcriptase PCR. R code implementing our methods is contributed to the package aroma.affymetrix.
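As the name indicates, FIRMA builds on the robust multichip analysis model. The rough Python sketch below shows that general idea under stated assumptions: a gene-level additive model (probe effect plus chip effect on the log2 scale) is fitted by median polish, and each exon is scored per sample by the median residual of its probes. The function names, the fitting method, and the scaling by the median absolute residual are illustrative choices. This is not the aroma.affymetrix implementation.

```python
import numpy as np

def median_polish_residuals(log_pm, n_iter=10):
    """Residuals of an additive probe + chip fit obtained by median polish."""
    resid = np.array(log_pm, dtype=float)
    for _ in range(n_iter):
        resid -= np.median(resid, axis=1, keepdims=True)  # remove probe effects
        resid -= np.median(resid, axis=0, keepdims=True)  # remove chip effects
    return resid

def exon_scores(log_pm, exon_ids):
    """Median residual per exon and sample, scaled when the scale is nonzero."""
    resid = median_polish_residuals(log_pm)
    scale = np.median(np.abs(resid))      # assumed robust scale estimate
    if scale > 0:                         # skip scaling for noise-free toy data
        resid = resid / scale
    exon_ids = np.asarray(exon_ids)
    exons = np.unique(exon_ids)
    scores = np.vstack([np.median(resid[exon_ids == e], axis=0) for e in exons])
    return exons, scores

# Toy gene: 6 probes in 3 exons, 5 samples; exon e3 is lower in sample 0 only.
probe_effect = np.array([7.0, 7.5, 8.0, 8.5, 9.0, 9.5])
chip_effect = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
log_pm = probe_effect[:, None] + chip_effect[None, :]
log_pm[4:, 0] -= 3.0
exon_ids = ["e1", "e1", "e2", "e2", "e3", "e3"]

exons, scores = exon_scores(log_pm, exon_ids)
print(exons)
print(scores)
```

In this noise-free example the gene-level fit absorbs the probe and chip effects, so the only nonzero score is that of exon e3 in sample 0. Real data would need the scaling step and a threshold or ranking across exons.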
Project description:This experiment accompanies the main analysis using a custom MHC array to define the first high-resolution, strand-specific transcriptional map of the MHC, defining differences in gene expression for three common haplotypes associated with autoimmune disease. Unstimulated samples for each haplotype were hybridised to Affymetrix Human Exon 1.0 ST arrays as well as the custom MHC array. Exon array data were used to assess the concordance of signal obtained from the two platforms and to investigate the extent of alternative splicing in the MHC and how it compares to the rest of the genome. Lymphoblastoid cell lines carrying three common autoimmunity haplotypes (COX, PGF, QBL) were analysed in triplicate using the Affymetrix Human Exon 1.0 ST Array.
Project description:BACKGROUND: The emergence of isoform-sensitive microarrays has helped fuel in-depth studies of the human transcriptome. The Affymetrix GeneChip Human Exon 1.0 ST Array (Exon Array) has been previously shown to be effective in profiling gene expression at the isoform level. More recently, the Affymetrix GeneChip Human Gene 1.0 ST Array (Gene Array) has been released for measuring gene expression and interestingly contains a large subset of probes from the Exon Array. Here, we explore the potential of using Gene Array probes to assess expression variation at the sub-transcript level. Utilizing datasets of the high quality Microarray Quality Control (MAQC) RNA samples previously assayed on the Exon Array and Gene Array, we compare the expression measurements of the two platforms to determine the performance of the Gene Array in detecting isoform variations. RESULTS: Overall, we show that the Gene Array is comparable to the Exon Array in making gene expression calls. Moreover, to examine expression of different isoforms, we modify the Gene Array probe set definition file to enable summarization of probe intensity values at the exon level and show that the expression profiles between the two platforms are also highly correlated. Next, expression calls of previously known differentially spliced genes were compared and also show concordant results. Splicing index analysis, representing estimates of exon inclusion levels, shows a lower but good correlation between platforms. As the Gene Array contains a significant subset of probes from the Exon Array, we note that, in comparison, the Gene Array overlaps with fewer but still a high proportion of splicing events annotated in the Known Alt Events UCSC track, with abundant coverage of cassette exons. We discuss the ability of the Gene Array to detect alternative splicing and isoform variation and address its limitations. 
CONCLUSION: The Gene Array is an effective expression profiling tool at gene and exon expression level, the latter made possible by probe set annotation modifications. We demonstrate that the Gene Array is capable of detecting alternative splicing and isoform variation. As expected, in comparison to the Exon Array, it is limited by reduced gene content coverage and is not able to detect as wide a range of alternative splicing events. However, for the events that can be monitored by both platforms, we estimate that the selectivity and sensitivity levels are comparable. We hope our findings will shed light on the potential extension of the Gene Array to detect alternative splicing. It should be particularly suitable for researchers primarily interested in gene expression analysis, but who may be willing to look for splicing and isoform differences within their dataset. However, we do not suggest it to be an equivalent substitute to the more comprehensive Exon Array.
Project description:The use of microarray technology to measure gene expression on a genome-wide scale has been well established for more than a decade. Methods to process and analyse the vast quantity of expression data generated by a typical microarray experiment are similarly well-established. The Affymetrix Exon 1.0 ST array is a relatively new type of array, which has the capability to assess expression at the individual exon level. This allows a more comprehensive analysis of the transcriptome, and in particular enables the study of alternative splicing, a gene regulation mechanism important in both normal conditions and in diseases. Some aspects of exon array data analysis are shared with those for standard gene expression data but others present new challenges that have required development of novel tools. Here, I will introduce the exon array and present a detailed example tutorial for analysis of data generated using this platform.