Dataset Information

Size matters: how sample size affects the reproducibility and specificity of gene set analysis.


ABSTRACT: BACKGROUND: Gene set analysis is a well-established approach for interpretation of data from high-throughput gene expression studies. Achieving reproducible results is an essential requirement in such studies. One factor of a gene expression experiment that can affect reproducibility is the choice of sample size. However, choosing an appropriate sample size can be difficult, especially because the choice may be method-dependent. Further, sample size choice can have unexpected effects on specificity. RESULTS: In this paper, we report on a systematic, quantitative approach to study the effect of sample size on the reproducibility of the results from 13 gene set analysis methods. We also investigate the impact of sample size on the specificity of these methods. Rather than relying on synthetic data, the proposed approach uses real expression datasets to offer an accurate and reliable evaluation. CONCLUSION: Our findings show that, as a general pattern, the results of gene set analysis become more reproducible as sample size increases. However, the extent of reproducibility and the rate at which it increases vary from method to method. In addition, even in the absence of differential expression, some gene set analysis methods report a large number of false positives, and increasing sample size does not reduce these false positives. The results of this research can be used when selecting a gene set analysis method from those available.

SUBMITTER: Maleki F 

PROVIDER: S-EPMC6805317 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

Publications

Size matters: how sample size affects the reproducibility and specificity of gene set analysis.

Maleki F, Ovens K, McQuillan I, Kusalik AJ

Human Genomics, 2019 Oct 22, Suppl 1


Similar Datasets

| S-EPMC4495301 | biostudies-literature
| S-EPMC3494661 | biostudies-literature
| S-EPMC4814515 | biostudies-literature
| S-EPMC6064922 | biostudies-literature
| S-EPMC7905317 | biostudies-literature
| S-EPMC3058765 | biostudies-other
| S-EPMC5513924 | biostudies-other
| S-EPMC4029033 | biostudies-literature
| S-EPMC6692785 | biostudies-literature