Test selection with application to detecting disease association with multiple SNPs.
ABSTRACT: We consider the motivating problem of testing for association between a phenotype and multiple single nucleotide polymorphisms (SNPs) within a candidate gene or region. Various statistical approaches have been proposed, including those based on either (combining univariate) single-locus analyses or (multivariate) multilocus analyses. However, it is known in theory that there is no single uniformly most powerful test to detect association with multiple SNPs. On the other hand, several tests have been shown to be among frequent winners across a range of practical situations, but the identity of the most powerful one changes with the situation in an unknown way. Here we propose a novel test selection procedure to select from five such tests: a so-called UminP test that combines multiple univariate/single-locus score tests by taking the minimum of their p values as its test statistic, a multivariate score test and its two modifications, and a so-called sum test. We also illustrate its application to selecting genotype codings for the sum test since the performance of the sum test depends on its genotype coding in an unknown way. Our major contributions include the methodology of estimating the power of a given test with a given dataset and the idea of using the estimated power as the criterion for test selection. We also propose a fast simulation-based method to calculate p values for the test selection procedure and for any method of combining p values. Our numerical results indicated that the proposed test selection procedure always yielded power close to the most powerful test among the candidate tests at any given situation, and in particular, our proposed test selection performed either better than or as well as the popular combining method of taking the minimum p value of the candidate tests.
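As a rough illustration of the UminP idea described above, the following sketch computes single-SNP score statistics for a binary trait and calibrates the most extreme one by permuting the phenotype. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract mentions a fast simulation-based method, whereas plain permutation is swapped in here; no covariates are adjusted for; and the names `score_stats` and `uminp_pvalue` are hypothetical. Because every single-SNP statistic is referred to the same 1-df chi-square distribution, taking the minimum p value is equivalent to taking the maximum statistic.

```python
import random


def score_stats(y, X):
    """Single-SNP score statistics U_j^2 / V_j for a binary trait (no covariates)."""
    n = len(y)
    ybar = sum(y) / n
    stats = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        xbar = sum(col) / n
        # Score for SNP j under the null, and its estimated variance
        u = sum((yi - ybar) * xi for yi, xi in zip(y, col))
        v = ybar * (1 - ybar) * sum((xi - xbar) ** 2 for xi in col)
        stats.append(u * u / v if v > 0 else 0.0)
    return stats


def uminp_pvalue(y, X, n_perm=1000, seed=0):
    """Permutation p value for the largest single-SNP score statistic.

    Assumption: phenotype labels are exchangeable under the null,
    which holds only when there are no covariates to adjust for.
    """
    rng = random.Random(seed)
    observed = max(score_stats(y, X))
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if max(score_stats(y_perm, X)) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive
    return (exceed + 1) / (n_perm + 1)
```

Here `y` is a list of 0/1 phenotypes and `X` is a list of per-subject genotype rows coded as minor-allele counts. Permuting whole phenotype labels preserves the correlation among SNPs, which is why the maximum statistic can be calibrated without modeling linkage disequilibrium explicitly.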
Project description: This article focuses on conducting global testing for association between a binary trait and a set of rare variants (RVs), although its application can be much broader to other types of traits, common variants (CVs), and gene set or pathway analysis. We show that many of the existing tests have deteriorating performance in the presence of many nonassociated RVs: their power can drop dramatically as the proportion of nonassociated RVs in the group to be tested increases. We propose a class of so-called sum of powered score (SPU) tests, each of which is based on the score vector from a general regression model and hence can deal with different types of traits and adjust for covariates, e.g., principal components accounting for population stratification. The SPU tests generalize the sum test, a representative burden test based on pooling or collapsing genotypes of RVs, and a sum of squared score (SSU) test that is closely related to several other powerful variance component tests; a previous study (Basu and Pan 2011) has demonstrated good performance of one, but not both, of the Sum and SSU tests in many situations. The SPU tests are versatile in the sense that one of them is often powerful, although its identity varies with the unknown true association parameters. We propose an adaptive SPU (aSPU) test to approximate the most powerful SPU test for a given scenario, consequently maintaining high power and being highly adaptive across various scenarios. We conducted extensive simulations to show superior performance of the aSPU test over several state-of-the-art association tests in the presence of many nonassociated RVs. Finally, we applied the SPU and aSPU tests to the GAW17 mini-exome sequence data to compare their practical performance with that of some existing tests, demonstrating their potential usefulness.
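The SPU and aSPU construction described above can be sketched as follows. This is an illustrative sketch under assumptions rather than the authors' implementation: it takes SPU(γ) to be the sum of the γ-th powers of the score components (so γ = 1 gives the sum test and γ = 2 the SSU test), treats γ = ∞ as the largest absolute score, omits covariates, and uses phenotype permutation for calibration. The function names, the default γ set, and the permutation scheme are all hypothetical choices.

```python
import bisect
import random

INF = float("inf")


def score_vector(y, X):
    """Score vector under the null, no covariates: U_j = sum_i (y_i - ybar) * x_ij."""
    n = len(y)
    ybar = sum(y) / n
    return [sum((y[i] - ybar) * X[i][j] for i in range(n))
            for j in range(len(X[0]))]


def spu_stat(u, gamma):
    """Absolute value of SPU(gamma); gamma = INF gives max |U_j|."""
    if gamma == INF:
        return max(abs(v) for v in u)
    return abs(sum(v ** gamma for v in u))


def aspu_test(y, X, gammas=(1, 2, 3, 4, INF), n_perm=500, seed=0):
    """Return ({gamma: p value of SPU(gamma)}, aSPU p value).

    Assumption: one set of phenotype permutations is reused both for the
    individual SPU p values and for calibrating their minimum.
    """
    rng = random.Random(seed)
    u_obs = score_vector(y, X)
    obs = [spu_stat(u_obs, g) for g in gammas]

    y_perm = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        u = score_vector(y_perm, X)
        null.append([spu_stat(u, g) for g in gammas])

    p_spu = []
    null_min_p = [1.0] * n_perm
    for k in range(len(gammas)):
        col = sorted(row[k] for row in null)
        # Observed p value: share of null statistics at least as large
        n_ge = n_perm - bisect.bisect_left(col, obs[k])
        p_spu.append((1 + n_ge) / (n_perm + 1))
        for b in range(n_perm):
            # Same calculation for each null replicate (itself included)
            p_b = (n_perm - bisect.bisect_left(col, null[b][k])) / n_perm
            null_min_p[b] = min(null_min_p[b], p_b)

    # aSPU statistic is the smallest SPU p value, calibrated against
    # the smallest p value obtained within each null replicate
    p_min = min(p_spu)
    p_aspu = (1 + sum(p <= p_min for p in null_min_p)) / (n_perm + 1)
    return dict(zip(gammas, p_spu)), p_aspu
```

Odd powers keep the sign of each score component, so they behave like burden tests when effects share a direction, while even and larger powers put more weight on the strongest components and so dilute the influence of nonassociated variants. Taking the minimum p value over γ and recalibrating it is what lets the adaptive test track whichever member of the family happens to be powerful.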
Project description: Next-generation sequencing technologies make direct testing of rare variant associations possible. However, the development of powerful statistical methods for rare variant association studies is still underway. Most existing methods are burden or quadratic tests. Recent studies show that the performance of both burden and quadratic tests depends strongly on the underlying assumptions and that no test demonstrates consistently acceptable power. Thus, combined tests that pool information from the burden and quadratic tests have been proposed recently. However, results from recent studies (including this study) show that there exist tests that can outperform both burden and quadratic tests. In this article, we propose three classes of tests that include tests outperforming both burden and quadratic tests. Then, we propose the optimal combination of single-variant tests (OCST) by combining information from tests of the three classes. We use extensive simulation studies to compare the performance of OCST with that of burden, quadratic and optimal single-variant tests. Our results show that OCST either is the most powerful test or has power similar to that of the most powerful test. We also compare the performance of OCST with that of the two existing combined tests. Our results show that OCST has better power than the two combined tests.
Project description: Recently, many statistical methods have been proposed to test for associations between rare genetic variants and complex traits. Most of these methods test for association by aggregating genetic variations within a predefined region, such as a gene. Although there is evidence that "aggregate" tests are more powerful than the single marker test, these tests generally ignore neutral variants and therefore are unable to identify specific variants driving the association with phenotype. We propose a novel aggregate rare-variant test that explicitly models a fraction of variants as neutral, tests associations at the gene level, and infers the rare variants driving the association. Simulations show that in the practical scenario where there are many variants within a given region of the genome, with only a fraction causal, our approach has greater power than other popular tests such as the Sequence Kernel Association Test (SKAT), the Weighted Sum Statistic (WSS), and the collapsing method of Morris and Zeggini (MZ). Our algorithm leverages a fast variational Bayes approximate inference methodology to scale to exome-wide analyses, a significant computational advantage over exact inference model selection methodologies. To demonstrate the efficacy of our methodology, we test for associations between von Willebrand Factor (VWF) levels and VWF missense rare variants imputed from the National Heart, Lung, and Blood Institute's Exome Sequencing Project into 2,487 African Americans within the VWF gene. Our method suggests that a relatively small fraction (~10%) of the imputed rare missense variants within VWF are strongly associated with lower VWF levels in African Americans.
Project description: For comparison of multiple outcomes commonly encountered in biomedical research, Huang et al. (2005) improved O'Brien's (1984) rank-sum tests through the replacement of the ad hoc variance by the asymptotic variance of the test statistics. The improved tests control the Type I error rate at the desired level and gain