Dataset Information

Priors, population sizes, and power in genome-wide hypothesis tests.


ABSTRACT:

Background

Genome-wide tests, including genome-wide association studies (GWAS) of germ-line genetic variants, driver tests of cancer somatic mutations, and transcriptome-wide association tests of RNAseq data, carry a high multiple testing burden. This burden can be overcome by enrolling larger cohorts or alleviated by using prior biological knowledge to favor some hypotheses over others. Here we compare these two methods in terms of their abilities to boost the power of hypothesis testing.

Results

We provide a quantitative estimate for progress in cohort sizes and present a theoretical analysis of the power of oracular hard priors: priors that select a subset of hypotheses for testing, with an oracular guarantee that all true positives are within the tested subset. This theory demonstrates that for GWAS, strong priors that limit testing to 100-1000 genes provide less power than typical annual 20-40% increases in cohort sizes. Furthermore, non-oracular priors that exclude even a small fraction of true positives from the tested set can perform worse than not using a prior at all.
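The trade-off described above can be illustrated with a minimal sketch. It is not the authors' analysis: it assumes a two-sided z-test with a Bonferroni correction at a family-wise alpha of 0.05, about 20,000 gene-level tests without a prior, and a standardized per-sample effect size. All function names and parameter values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def critical_z(num_tests, alpha=0.05):
    """Two-sided Bonferroni critical value when num_tests hypotheses are tested."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / (2.0 * num_tests))


def power(cohort_size, effect, num_tests, alpha=0.05):
    """Power of a two-sided z-test; the test statistic is assumed to have
    mean effect * sqrt(cohort_size) under the alternative."""
    ncp = effect * sqrt(cohort_size)
    z = critical_z(num_tests, alpha)
    return (1.0 - STD_NORMAL.cdf(z - ncp)) + STD_NORMAL.cdf(-z - ncp)


def equivalent_cohort_ratio(tests_full, tests_prior, target_power=0.8, alpha=0.05):
    """Approximate cohort-size multiplier that gives the full test set the same
    power an oracular hard prior gives at the original cohort size.
    Uses N proportional to (z_crit + z_power)^2 and ignores the far tail."""
    z_power = STD_NORMAL.inv_cdf(target_power)
    z_full = critical_z(tests_full, alpha)
    z_prior = critical_z(tests_prior, alpha)
    return ((z_full + z_power) / (z_prior + z_power)) ** 2


def discovery_rate(power_in_subset, excluded_fraction):
    """Expected fraction of true positives found when a non-oracular prior
    drops excluded_fraction of them from the tested subset."""
    return (1.0 - excluded_fraction) * power_in_subset


if __name__ == "__main__":
    # Assumed values: 20,000 tests without a prior, 100 with a hard prior.
    ratio = equivalent_cohort_ratio(20_000, 100)
    print(f"cohort multiplier matching a 100-gene oracular prior: {ratio:.2f}")

    # Assumed effect size and cohort size, chosen so both tests are well powered.
    full = power(20_000, 0.05, 20_000)
    prior = power(20_000, 0.05, 100)
    print(f"power without prior: {full:.4f}")
    print(f"discovery rate, prior missing 5% of true positives: "
          f"{discovery_rate(prior, 0.05):.4f}")
```

Under these assumptions, the cohort multiplier can be compared with compounded annual growth of 20-40% to see how quickly larger cohorts would match the prior. The last two printed values show how a prior that misses a small fraction of true positives can fall below the power of testing everything.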

Conclusion

Our results provide a theoretical explanation for the continued dominance of simple, unbiased univariate hypothesis tests for GWAS: if a statistical question can be answered by larger cohort sizes, it should be answered by larger cohort sizes rather than by more complicated biased methods involving priors. We suggest that priors are better suited for non-statistical aspects of biology, such as pathway structure and causality, that are not yet easily captured by standard hypothesis tests.

SUBMITTER: Cai J 

PROVIDER: S-EPMC10134629 | biostudies-literature | 2023 Apr

REPOSITORIES: biostudies-literature

Publications

Priors, population sizes, and power in genome-wide hypothesis tests.

Jitong Cai, Jianan Zhan, Dan E. Arking, Joel S. Bader

BMC bioinformatics 20230426 1


Similar Datasets

2007-04-18 | GSE5961 | GEO
| S-EPMC5125008 | biostudies-literature
| S-EPMC7807926 | biostudies-literature
| S-EPMC3923086 | biostudies-literature
| S-EPMC3252610 | biostudies-literature
| S-EPMC6723621 | biostudies-literature
| S-EPMC5486679 | biostudies-literature
| S-EPMC4930141 | biostudies-literature
| S-EPMC2042984 | biostudies-literature
| S-EPMC2912702 | biostudies-literature