Estimating RNA-quality using GeneChip microarrays.
ABSTRACT: BACKGROUND: Microarrays are a powerful tool for transcriptome analysis. Best results are obtained using high-quality RNA samples for preparation and hybridization. Issues with RNA integrity can lead to low data quality and failure of the microarray experiment. RESULTS: Microarray intensity data contains information to estimate the RNA quality of the sample. We here study the interplay of the characteristics of RNA surface hybridization with the effects of partly truncated transcripts on probe intensity. The 3'/5' intensity gradient, the basis of microarray RNA quality measures, is shown to depend on the degree of competitive binding of specific and of non-specific targets to a particular probe, on the degree of saturation of the probes with bound transcripts and on the distance of the probe from the 3'-end of the transcript. Increasing degrees of non-specific hybridization or of saturation reduce the 3'/5' intensity gradient and, if not taken into account, this leads to biased results in common quality measures for GeneChip arrays such as affyslope or the control probe intensity ratio. We also found that short probe sets near the 3'-end of the transcripts are prone to non-specific hybridization, presumably because of inaccurate positional assignment and the existence of transcript isoforms with variable 3' UTRs. Poor RNA quality is associated with a decreased amount of RNA material hybridized on the array, paralleled by a decreased total signal level. Additionally, it causes a gene-specific loss of signal due to the positional bias of transcript abundance, which requires an individual, gene-specific correction. We propose a new RNA quality measure that considers the hybridization mode. Graphical characteristics are introduced allowing assessment of RNA quality of each single array ('tongs plot' and 'degradation hook'). Furthermore, we suggest a method to correct for effects of RNA degradation on microarray intensities.
CONCLUSIONS: The presented RNA degradation measure shows the best correlation with the independent RNA integrity measure RIN, and therefore presents itself as a valuable tool for quality control and even for the study of RNA degradation. When RNA degradation effects are detected in microarray experiments, a correction of the induced bias in probe intensities is advised.
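As a rough illustration of the 3'/5' intensity gradient that measures such as affyslope build on, the following Python sketch averages log2 intensities by probe position (ordered from the 5' to the 3' end) and fits a straight line by least squares. The function name and the toy input are assumptions made for illustration only. This is not the hybridization-mode-aware measure proposed above, which additionally accounts for non-specific binding and probe saturation.

```python
def three_prime_slope(log_intensities):
    """Least-squares slope of mean log2 intensity versus probe index.

    log_intensities: list of rows, one row per probe set, each row holding
    log2 intensities of its probes ordered from the 5' end (index 0)
    to the 3' end. All rows are assumed to have the same length.
    """
    n_sets = len(log_intensities)
    n_pos = len(log_intensities[0])
    # Average over probe sets at each probe position.
    mean_by_position = [
        sum(row[k] for row in log_intensities) / n_sets for k in range(n_pos)
    ]
    # Ordinary least-squares slope against the position index.
    x_mean = (n_pos - 1) / 2
    y_mean = sum(mean_by_position) / n_pos
    num = sum((k - x_mean) * (y - y_mean) for k, y in enumerate(mean_by_position))
    den = sum((k - x_mean) ** 2 for k in range(n_pos))
    return num / den


# Hypothetical data: two probe sets whose signal rises toward the 3' end.
toy = [[8.0, 8.5, 9.0, 9.5],
       [7.0, 7.5, 8.0, 8.5]]
slope = three_prime_slope(toy)
```

A positive slope indicates stronger signal near the 3' end. As the abstract stresses, the raw slope is damped by non-specific hybridization and saturation, so it should not be read as a degradation measure on its own.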
Project description:BACKGROUND: Microarray experiments rely on several critical steps that may introduce biases and uncertainty in downstream analyses. These steps include mRNA sample extraction, amplification and labelling, hybridization, and scanning causing chip-specific systematic variations on the raw intensity level. Also the chosen array-type and the up-to-dateness of the genomic information probed on the chip affect the quality of the expression measures. In the accompanying publication we presented theory and algorithm of the so-called hook method which aims at correcting expression data for systematic biases using a series of new chip characteristics. RESULTS: In this publication we summarize the essential chip characteristics provided by this method, analyze special benchmark experiments to estimate transcript related expression measures and illustrate the potency of the method to detect and to quantify the quality of a particular hybridization. It is shown that our single-chip approach provides expression measures responding linearly on changes of the transcript concentration over three orders of magnitude. In addition, the method calculates a detection call judging the relation between the signal and the detection limit of the particular measurement. The performance of the method in the context of different chip generations and probe set assignments is illustrated. The hook method characterizes the RNA-quality in terms of the 3'/5'-amplification bias and the sample-specific calling rate. We show that the proper judgement of these effects requires the disentanglement of non-specific and specific hybridization which, otherwise, can lead to misinterpretations of expression changes. The consequences of modifying probe/target interactions by either changing the labelling protocol or by substituting RNA by DNA targets are demonstrated. 
CONCLUSION: The single-chip based hook-method provides accurate expression estimates and chip-summary characteristics using the natural metrics given by the hybridization reaction with the potency to develop new standards for microarray quality control and calibration.
Project description:Alternative mRNA processing mechanisms lead to multiple transcripts (i.e. splice isoforms) of a given gene which may have distinct biological functions. Microarrays like Affymetrix GeneChips measure mRNA expression of genes using sets of nucleotide probes. Until recently probe sets were not designed for transcript specificity. Nevertheless, the re-analysis of established microarray data using newly defined transcript-specific probe sets may provide information about expression levels of specific transcripts. In the present study alignment of probe sequences of the Affymetrix microarray HG-U133A with Ensembl transcript sequences was performed to define transcript-specific probe sets. Out of a total of 247,965 perfect match probes, 95,008 were designated "transcript-specific", i.e. showing complete sequence alignment, no cross-hybridization, and transcript-, not only gene-specificity. These probes were grouped into 7,941 transcript-specific probe sets and 15,619 gene-specific probe sets, respectively. The former were used to differentiate 445 alternative transcripts of 215 genes. For selected transcripts, predicted by this analysis to be differentially expressed in the human kidney, confirmatory real-time RT-PCR experiments were performed. First, the expression of two specific transcripts of the genes PPM1A (PP2CA_HUMAN and P35813) and PLG (PLMN_HUMAN and Q5TEH5) in human kidneys was determined by the transcript-specific array analysis and confirmed by real-time RT-PCR. Secondly, disease-specific differential expression of single transcripts of PLG and ABCA1 (ABCA1_HUMAN and Q5VYS0_HUMAN) was computed from the available array data sets and confirmed by transcript-specific real-time RT-PCR. Transcript-specific analysis of microarray experiments can be employed to study gene-regulation on the transcript level using conventional microarray data.
In this study, predictions based on sufficient probe set size and fold-change are confirmed by independent means.
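The probe classification described above can be sketched in Python as follows. The sketch assumes that each probe has already been aligned and that its perfect matches are available as (gene, transcript) pairs. The function name, the category labels and the identifiers in the example are hypothetical, and the study's actual criteria may be stricter than this simplification.

```python
def classify_probe(hits):
    """Classify a probe from its perfect-match alignments.

    hits: iterable of (gene_id, transcript_id) pairs to which the probe
    aligns completely. Labels are illustrative, not the study's own.
    """
    hits = set(hits)
    if not hits:
        return "unmapped"
    genes = {gene for gene, _ in hits}
    transcripts = {transcript for _, transcript in hits}
    if len(genes) > 1:
        # Matches transcripts of different genes: potential cross-hybridization.
        return "cross-hybridizing"
    if len(transcripts) == 1:
        # Exactly one transcript of exactly one gene.
        return "transcript-specific"
    # Several transcripts, all belonging to the same gene.
    return "gene-specific"


# Hypothetical identifiers, for illustration only.
label_a = classify_probe([("geneA", "tx1")])
label_b = classify_probe([("geneA", "tx1"), ("geneA", "tx2")])
label_c = classify_probe([("geneA", "tx1"), ("geneB", "tx9")])
```

Probes sharing the same label and target would then be grouped into transcript-specific or gene-specific probe sets.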
Project description:MOTIVATION: Gene expression experiments aim to accurately quantify thousands of transcripts in parallel. Factors posterior to RNA extraction can, however, impair their accurate representation. RNA degradation and differences in the efficiency of amplification affect raw intensity measurements using Affymetrix expression arrays. The positional intensity decay of specifically hybridized probes along the transcript they intend to interrogate is used to estimate the RNA quality in a sample and to correct probe intensities for the degradation bias. This functionality, for which no previous software solution is available, is implemented in the R/Bioconductor package AffyRNADegradation presented here. AVAILABILITY: The package is available via Bioconductor at the URL http://bioconductor.org/packages/release/bioc/html/AffyRNADegradation.html
Project description:BACKGROUND: RNA and microarray quality assessment form an integral part of gene expression analysis and, although methods such as the RNA integrity number (RIN) algorithm reliably assess RNA integrity, the relevance of RNA integrity in gene expression analysis as well as analysis methods to accommodate the possible effects of degradation requires further investigation. We investigated the relationship between RNA integrity and array quality on the commonly used Affymetrix Gene 1.0 ST array platform using reliable within-array and between-array quality assessment measures. The possibility of a transcript specific bias in the apparent effect of RNA degradation on the measured gene expression signal was evaluated after either excluding quality-flagged arrays or compensation for RNA degradation at different steps in the analysis. RESULTS: Using probe-level and inter-array quality metrics to assess 34 Gene 1.0 ST array datasets derived from historical, paired tumour and normal primary colorectal cancer samples, 7 arrays (20.6%), with a mean sample RIN of 3.2 (SD = 0.42), were flagged during array quality assessment while 10 arrays from samples with RINs < 7 passed quality assessment, including one sample with a RIN < 3. We detected a transcript length bias in RNA degradation in only 5.8% of annotated transcript clusters (p-value 0.05, FC ≥ |2|), with longer and shorter than average transcripts under- and overrepresented in quality-flagged samples respectively. Applying compensatory measures for RNA degradation performed at least as well as excluding quality-flagged arrays, as judged by hierarchical clustering, gene expression analysis and Ingenuity Pathway Analysis; importantly, use of these compensatory measures had the significant benefit of enabling lower quality array data from irreplaceable clinical samples to be retained in downstream analyses.
CONCLUSIONS: Here, we demonstrate an effective array-quality assessment strategy, which will allow the user to recognize lower quality arrays that can be included in the analysis once appropriate measures are applied to account for known or unknown sources of variation, such as array quality- and batch- effects, by implementing ComBat or Surrogate Variable Analysis. This approach of quality control and analysis will be especially useful for clinical samples with variable and low RNA qualities, with RIN scores ≥ 2.
Project description:<h4>Background</h4>Post-hybridization washing is an essential part of microarray experiments. Both the quality of the experimental washing protocol and adequate consideration of washing in intensity calibration ultimately affect the quality of the expression estimates extracted from the microarray intensities.<h4>Results</h4>We conducted experiments on GeneChip microarrays with altered protocols for washing, scanning and staining to study the probe-level intensity changes as a function of the number of washing cycles. For calibration and analysis of the intensity data we make use of the 'hook' method which allows intensity contributions due to non-specific and specific hybridization of perfect match (PM) and mismatch (MM) probes to be disentangled in a sequence specific manner. On average, washing according to the standard protocol removes about 90% of the non-specific background and about 30-50% and less than 10% of the specific targets from the MM and PM, respectively. Analysis of the washing kinetics shows that the signal-to-noise ratio doubles roughly every ten stringent washing cycles. Washing can be characterized by time-dependent rate constants which reflect the heterogeneous character of target binding to microarray probes. We propose an empirical washing function which estimates the survival of probe bound targets. It depends on the intensity contribution due to specific and non-specific hybridization per probe which can be estimated for each probe using existing methods. The washing function allows probe intensities to be calibrated for the effect of washing. On a relative scale, proper calibration for washing markedly increases expression measures, especially in the limit of small and large values.<h4>Conclusions</h4>Washing is among the factors which potentially distort expression measures. The proposed first-order correction method allows direct implementation in existing calibration algorithms for microarray data. 
We provide an experimental 'washing data set' which might be used by the community for developing amendments of the washing correction.
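The reported washing kinetics, with the signal-to-noise ratio doubling roughly every ten stringent washing cycles, can be turned into a small worked calculation. The Python sketch below assumes an idealized, constant doubling interval. The function and parameter names are hypothetical, and this is not the empirical washing function proposed in the abstract, which depends on the specific and non-specific intensity contributions of each probe.

```python
def snr_after_washing(snr_initial, cycles, doubling_cycles=10.0):
    """Idealized signal-to-noise ratio after a number of stringent washes.

    Assumes the ratio doubles every `doubling_cycles` washes, following the
    approximate figure quoted in the abstract.
    """
    if cycles < 0 or doubling_cycles <= 0:
        raise ValueError("cycles must be >= 0 and doubling_cycles > 0")
    return snr_initial * 2.0 ** (cycles / doubling_cycles)


# Relative improvement after 20 stringent cycles under this assumption.
gain = snr_after_washing(1.0, 20)
```

Under this idealization, twenty stringent cycles would quadruple the signal-to-noise ratio. The abstract describes time-dependent rate constants, so real arrays are expected to deviate from a fixed doubling interval.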
Project description:BACKGROUND: An algorithm for the analysis of Affymetrix Genechips is presented. This algorithm, referred to as the Inverse Langmuir Method (ILM), estimates the binding of transcripts to complementary probes using DNA/RNA hybridization free energies, and the hybridization between partially complementary transcripts in solution using RNA/RNA free energies. The balance between these two competing reactions allows for the translation of background-subtracted intensities into transcript concentrations. RESULTS: To validate the ILM, it is applied to publicly available microarray data from a multi-lab comparison study. Here, microarray experiments are performed on samples which deviate only in few genes. The log2 fold change between these two samples, as obtained from RT-PCR experiments, agrees well with the log2 fold change as obtained with the ILM, indicating that the ILM determines changes in the expression level accurately. We also show that the ILM allows for the identification of outlying probes, as it yields independent concentration estimates per probe. CONCLUSION: The ILM is robust and offers an interesting alternative to purely statistical algorithms for microarray data analysis.
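To make the idea of inverting a Langmuir isotherm concrete, the following Python sketch maps a background-subtracted intensity back to a concentration using only the probe/target binding term. It is a simplified illustration under stated assumptions, not the ILM itself: the competing RNA/RNA hybridization in solution described above is left out. The function names, the saturation intensity `i_max`, the default temperature (45 °C, a typical hybridization temperature) and the free energy value are all chosen for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_constant(delta_g, temp_k=318.15):
    """Equilibrium constant from a hybridization free energy (kcal/mol)."""
    return math.exp(-delta_g / (R_KCAL * temp_k))


def langmuir_intensity(conc, delta_g, i_max, temp_k=318.15):
    """Forward Langmuir isotherm: intensity for a given concentration."""
    k = binding_constant(delta_g, temp_k)
    return i_max * k * conc / (1.0 + k * conc)


def inverse_langmuir(intensity, delta_g, i_max, temp_k=318.15):
    """Invert the isotherm: concentration from a background-subtracted intensity."""
    if not 0.0 <= intensity < i_max:
        raise ValueError("intensity must lie in [0, i_max)")
    k = binding_constant(delta_g, temp_k)
    return intensity / (k * (i_max - intensity))


# Round trip with hypothetical values: concentration in mol/L, delta_g in kcal/mol.
c_true = 1e-10
signal = langmuir_intensity(c_true, delta_g=-15.0, i_max=10000.0)
c_est = inverse_langmuir(signal, delta_g=-15.0, i_max=10000.0)
```

Because each probe has its own free energy, such an inversion gives one concentration estimate per probe, which is what makes the outlier detection mentioned in the abstract possible.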
Project description:To examine the utility and performance of 50mer oligonucleotide (oligonucleotide probe) microarrays, gene-specific oligonucleotide probes were spotted along with PCR probes onto glass microarrays and the performance of each probe type was evaluated. The specificity of oligonucleotide probes was studied using target RNAs that shared various degrees of sequence similarity. Sensitivity was defined as the ability to detect a 3-fold change in mRNA. No significant difference in sensitivity between oligonucleotide probes and PCR probes was observed and both had a minimum reproducible detection limit of approximately 10 mRNA copies/cell. Specificity studies showed that for a given oligonucleotide probe any 'non-target' transcripts (cDNAs) >75% similar over the 50 base target may show cross-hybridization. Thus non-target sequences which have >75-80% sequence similarity with target sequences (within the oligonucleotide probe 50 base target region) will contribute to the overall signal intensity. In addition, if the 50 base target region is marginally similar, it must not include a stretch of complementary sequence >15 contiguous bases. Therefore, knowledge about the target sequence, as well as its similarity to other mRNAs in the target tissue or RNA sample, is required to design successful oligonucleotide probes for quality microarray results. Together these results validate the utility of oligonucleotide probe (50mer) glass microarrays.
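The two design rules reported above (more than 75% similarity over the 50-base target region, or a contiguous matching stretch longer than 15 bases) can be expressed as a simple screen. The Python sketch below assumes that the non-target sequence has already been aligned, without gaps and on the same strand, to the 50-base target region. The function name and default thresholds are illustrative readings of the abstract and not a published tool.

```python
def cross_hybridization_risk(target_region, non_target,
                             max_identity=0.75, max_run=15):
    """Flag a non-target sequence that may cross-hybridize with a probe.

    Both sequences must be pre-aligned, ungapped and of equal length.
    Returns True if overall identity exceeds `max_identity` or if the
    longest contiguous identical stretch exceeds `max_run` bases.
    """
    if not target_region or len(target_region) != len(non_target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = 0
    run = 0
    longest = 0
    for a, b in zip(target_region.upper(), non_target.upper()):
        if a == b:
            matches += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    identity = matches / len(target_region)
    return identity > max_identity or longest > max_run


# Hypothetical 50-base target region and derived non-target sequences.
target = "ACGT" * 12 + "AC"
swap = str.maketrans("ACGT", "CATG")  # guarantees a mismatch at every base
stretch_16 = target[:16] + target[16:].translate(swap)
stretch_15 = target[:15] + target[15:].translate(swap)
```

In this example `stretch_16` is flagged because of its 16-base identical stretch even though its overall identity is only 32%, while `stretch_15` passes both rules.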
Project description:BACKGROUND: Natural antisense transcripts (NATs) are transcripts of the opposite DNA strand to the sense-strand either at the same locus (cis-encoded) or a different locus (trans-encoded). They can affect gene expression at multiple stages including transcription, RNA processing and transport, and translation. NATs give rise to sense-antisense transcript pairs and the number of these identified has escalated greatly with the availability of DNA sequencing resources and public databases. Traditionally, NATs were identified by the alignment of full-length cDNAs or expressed sequence tags to genome sequences, but an alternative method for large-scale detection of sense-antisense transcript pairs involves the use of microarrays. In this study we developed a novel protocol to assay sense- and antisense-strand transcription on the 55 K Affymetrix GeneChip Wheat Genome Array, which is a 3' in vitro transcription (3'IVT) expression array. We selected five different tissue types for assay to enable maximum discovery, and used the 'Chinese Spring' wheat genotype because most of the wheat GeneChip probe sequences were based on its genomic sequence. This study is the first report of using a 3'IVT expression array to discover the expression of natural sense-antisense transcript pairs, and may be considered as proof-of-concept. RESULTS: By using alternative target preparation schemes, both the sense- and antisense-strand derived transcripts were labeled and hybridized to the Wheat GeneChip. Quality assurance verified that successful hybridization did occur in the antisense-strand assay. A stringent threshold for positive hybridization was applied, which resulted in the identification of 110 sense-antisense transcript pairs, as well as 80 potentially antisense-specific transcripts. Strand-specific RT-PCR validated the microarray observations, and showed that antisense transcription is likely to be tissue specific. 
For the annotated sense-antisense transcript pairs, analysis of the gene ontology terms showed a significant over-representation of transcripts involved in energy production. These included several representations of ATP synthase, photosystem proteins and RUBISCO, which indicated that photosynthesis is likely to be regulated by antisense transcripts. CONCLUSION: This study demonstrated the novel use of an adapted labeling protocol and a 3'IVT GeneChip array for large-scale identification of antisense transcription in wheat. The results show that antisense transcription is relatively abundant in wheat, and may affect the expression of valuable agronomic phenotypes. Future work should select potentially interesting transcript pairs for further functional characterization to determine biological activity.
Project description:The rice OneArray® 60-mer, oligonucleotide microarray consists of a total of 21,179 probes covering 20,806 genes of japonica and 13,683 genes of indica. Through a validation study, total RNA isolated from rice shoots and roots was used for comparison of gene expression profiles via microarray examination. A list of significantly differentially expressed genes was generated; 438 shoot-specific genes were identified among 3,138 up-regulated genes, and 463 root-specific genes were found among 3,845 down-regulated genes. The Phalanx microarray platform is based upon the hybridization of a single labeled sample (derived from RNA), followed by one-channel detection. The intensity of the hybridization signal is used to determine target concentration. In order to validate the technical quality of each probe in our arrays, we carried out 10 independent hybridizations on samples representing two different rice tissues – root and shoot. To examine the gene expression profiles between rice root and shoot development, total RNA extracted from rice root and shoot was processed on the rice OneArray® microarray following the standard protocol using five arrays for each tissue type. Raw expression data from 10 microarrays (i.e. 5 arrays × 2 tissues) were normalized and Pearson’s correlation coefficients were calculated for the data sets of hybridization signal intensities.
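The final step, computing Pearson's correlation coefficients between arrays, can be sketched in plain Python as follows. The sketch assumes that the intensities have already been normalized and that every array reports the same probes in the same order. The function names and the toy values are hypothetical and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(arrays):
    """Pairwise Pearson correlations for a list of arrays (one vector each)."""
    return [[pearson(a, b) for b in arrays] for a in arrays]


# Toy normalized log intensities for three hypothetical arrays.
root_1 = [5.0, 7.0, 9.0, 11.0]
root_2 = [5.5, 7.5, 9.5, 11.5]
shoot_1 = [11.0, 9.0, 7.0, 5.0]
matrix = correlation_matrix([root_1, root_2, shoot_1])
```

High correlations among replicate arrays of the same tissue, compared with lower correlations between tissues, are the usual sign of good technical reproducibility.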
Project description:<h4>Motivation</h4>Microarray designs have become increasingly probe-rich, enabling targeting of specific features, such as individual exons or single nucleotide polymorphisms. These arrays have the potential to achieve quantitative high-throughput estimates of transcript abundances, but currently these estimates are affected by biases due to cross-hybridization, in which probes hybridize to off-target transcripts.<h4>Results</h4>To study cross-hybridization, we map Affymetrix exon array probes to a set of annotated mRNA transcripts, allowing a small number of mismatches or insertion/deletions between the two sequences. Based on a systematic study of the degree to which probes with a given match type to a transcript are affected by cross-hybridization, we developed a strategy to correct for cross-hybridization biases of gene-level expression estimates. Comparison with Solexa ultra high-throughput sequencing data demonstrates that correction for cross-hybridization leads to a significant improvement of gene expression estimates.<h4>Availability</h4>We provide mappings between human and mouse exon array probes and off-target transcripts and provide software extending the GeneBASE program for generating gene-level expression estimates including the cross-hybridization correction http://biogibbs.stanford.edu/~kkapur/GeneBase/.