Dataset Information

Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets.


ABSTRACT: Many researchers try to understand a biological condition by identifying biomarkers. This is typically done using univariate hypothesis testing over a labeled dataset, declaring a feature to be a biomarker if there is a significant statistical difference between its values for the subjects with different outcomes. However, such sets of proposed biomarkers are often not reproducible - subsequent studies often fail to identify the same sets. Indeed, there is often only a very small overlap between the biomarkers proposed in pairs of related studies that explore the same phenotypes over the same distribution of subjects. This paper first defines the Reproducibility Score for a labeled dataset as a measure (taking values between 0 and 1) of the reproducibility of the results produced by a specified fixed biomarker discovery process for a given distribution of subjects. We then provide ways to reliably estimate this score by defining algorithms that produce an over-bound and an under-bound for this score for a given dataset and biomarker discovery process, for the case of univariate hypothesis testing on dichotomous groups. We confirm that these approximations are meaningful by providing empirical results on a large number of datasets and show that these predictions match known reproducibility results. To encourage others to apply this technique to analyze their biomarker sets, we have also created a publicly available website, https://biomarker.shinyapps.io/BiomarkerReprod/, that produces these Reproducibility Score approximations for any given dataset (with continuous or discrete features and binary class labels).
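The discovery process and the overlap problem described in the abstract can be illustrated with a small simulation. This is only a rough sketch under stated assumptions, not the paper's estimation algorithm: it assumes overlap between two biomarker sets is measured with the Jaccard index (the abstract says only that the score lies between 0 and 1), it uses a Welch-style statistic with a normal approximation for the p-value, and it applies a Bonferroni correction. The function names, the synthetic data generator, and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def discover_biomarkers(X, y, alpha=0.05):
    """Univariate testing: return indices of features that differ between classes 0 and 1."""
    n_features = len(X[0])
    threshold = alpha / n_features  # Bonferroni correction (an assumption of this sketch)
    selected = set()
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
        if se == 0:
            continue  # constant feature, nothing to test
        z = (mean(a) - mean(b)) / se
        # Two-sided p-value using a normal approximation to the Welch statistic.
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        if p < threshold:
            selected.add(j)
    return selected


def jaccard(s, t):
    """Overlap of two sets, in [0, 1]; two empty sets are scored 1.0 by convention here."""
    if not s and not t:
        return 1.0
    return len(s & t) / len(s | t)


def simulate(n_subjects, n_features, n_true, effect, rng):
    """Synthetic data: the first n_true features are shifted by `effect` in class 1."""
    X, y = [], []
    for i in range(n_subjects):
        label = i % 2
        row = [rng.gauss(effect * label if j < n_true else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(0)
# Two independent "studies" drawn from the same distribution of subjects.
X1, y1 = simulate(200, 50, 5, 1.5, rng)
X2, y2 = simulate(200, 50, 5, 1.5, rng)
b1 = discover_biomarkers(X1, y1)
b2 = discover_biomarkers(X2, y2)
overlap = jaccard(b1, b2)
print(sorted(b1), sorted(b2), overlap)
```

Shrinking `n_subjects` or `effect` in this toy setup tends to reduce the overlap between the two discovered sets, which is the kind of behaviour the abstract describes for pairs of related studies.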

SUBMITTER: Forouzandeh A 

PROVIDER: S-EPMC9333302 | biostudies-literature | 2022

REPOSITORIES: biostudies-literature

Publications

Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets.

Amir Forouzandeh, Alex Rutar, Sunil V. Kalmady, Russell Greiner

PLoS ONE, 2022-07-28 (issue 7)


Similar Datasets

| S-EPMC3179615 | biostudies-literature
| S-EPMC10789030 | biostudies-literature
| S-EPMC10216598 | biostudies-literature
| S-EPMC10913052 | biostudies-literature
| S-EPMC8791585 | biostudies-literature
| S-EPMC5447240 | biostudies-literature
2022-01-12 | GSE164833 | GEO
| S-EPMC3218848 | biostudies-other
| S-EPMC7487249 | biostudies-literature
2019-05-31 | E-MTAB-7983 | biostudies-arrayexpress