Project description:Background Many microarray experiments search for genes with differential expression between a common “reference” group and multiple test groups, like in the case of time-course designs or of various treatments versus a control condition. In such cases, currently employed statistical approaches based on t-test or close derivatives have limited efficacy, mostly because estimation of noise is done on only two groups at time. Alternative approaches based on ANOVA correctly capture noise from all the groups, but then do not confront single test groups with the reference. We therefore conceived a statistical test for pairwise comparisons between the reference group and each test group that uses within-group variance calculated from all the groups. Results We implemented an R-Bioconductor package named Mulcom, with a statistical test derived from the Dunnett’s test, designed to compare multiple experimental groups against a common reference. In addition to the basic Dunnett’s t value, the package includes an optional minimal fold-change threshold, m. Thanks to automated, permutation-based estimation of False Discovery Rate (FDR), the package also permits fast optimization of the test, to obtain the maximum number of significant genes at a given FDR value. When applied on a time-course experiment profiled in parallel on two microarray platforms, and compared with currently used tests, Mulcom displayed higher concordance of significant genes in the two array platforms, and higher enrichment in functional annotation to categories related to the biology of the experiment. Conclusions The Mulcom package provides a fast and powerful tool for the identification of differentially expressed genes when several experimental conditions are compared with a common reference. We found that Mulcom leads to lists of differentially expressed genes that are particularly consistent across microarray platforms and enriched in significant classes of genes. In our opinion, the main reasons for these good performances are three: (i) within-group variability is estimated from all experimental groups even if only two of them are compared each time; (ii) the optional fold-change threshold m avoids false positives due to aberrantly low within-group variability; (iii) automated test optimization allows maximizing sensitivity without compromising specificity.

Project description:Background Many microarray experiments search for genes with differential expression between a common “reference” group and multiple test groups, like in the case of time-course designs or of various treatments versus a control condition. In such cases, currently employed statistical approaches based on t-test or close derivatives have limited efficacy, mostly because estimation of noise is done on only two groups at time. Alternative approaches based on ANOVA correctly capture noise from all the groups, but then do not confront single test groups with the reference. We therefore conceived a statistical test for pairwise comparisons between the reference group and each test group that uses within-group variance calculated from all the groups. Results We implemented an R-Bioconductor package named Mulcom, with a statistical test derived from the Dunnett’s test, designed to compare multiple experimental groups against a common reference. In addition to the basic Dunnett’s t value, the package includes an optional minimal fold-change threshold, m. Thanks to automated, permutation-based estimation of False Discovery Rate (FDR), the package also permits fast optimization of the test, to obtain the maximum number of significant genes at a given FDR value. When applied on a time-course experiment profiled in parallel on two microarray platforms, and compared with currently used tests, Mulcom displayed higher concordance of significant genes in the two array platforms, and higher enrichment in functional annotation to categories related to the biology of the experiment. Conclusions The Mulcom package provides a fast and powerful tool for the identification of differentially expressed genes when several experimental conditions are compared with a common reference. We found that Mulcom leads to lists of differentially expressed genes that are particularly consistent across microarray platforms and enriched in significant classes of genes. In our opinion, the main reasons for these good performances are three: (i) within-group variability is estimated from all experimental groups even if only two of them are compared each time; (ii) the optional fold-change threshold m avoids false positives due to aberrantly low within-group variability; (iii) automated test optimization allows maximizing sensitivity without compromising specificity. Ten MDA-MB-435 samples, biological duplicates of each condition (untreated, integrin Beta4 treatment, hepatocyte growth factor treatment for 1 hr, 6 hrs, or 24 hrs).

Dataset Information

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets