Family-based association test using normal approximation to gene dropping null distribution.
ABSTRACT: We derive the analytical mean and variance of the score test statistic in gene-dropping simulations and approximate the null distribution of the test statistic by a normal distribution. We provide insights into the gene-dropping test by decomposing the test statistic into two components: the first component provides information about linkage, and the second component provides information about fine mapping under the linkage peak. We demonstrate our theoretical findings by applying the gene-dropping test to the simulated data set from Genetic Analysis Workshop 18 and comparing its performance with existing population and family-based association tests.
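As a minimal illustration of the normal approximation described above, the sketch below converts an observed test statistic into a two-sided p-value, given a null mean and variance. The abstract does not state the analytical formulas for the gene-dropping mean and variance, so they are treated here as inputs; the function name and arguments are hypothetical.

```python
import math


def normal_approx_p_value(statistic, null_mean, null_variance):
    """Two-sided p-value from a normal approximation to a null distribution.

    null_mean and null_variance are assumed to be supplied by the caller
    (for example, from analytical formulas); they are not derived here.
    """
    if null_variance <= 0:
        raise ValueError("null_variance must be positive")
    # Standardize the observed statistic under the null.
    z = (statistic - null_mean) / math.sqrt(null_variance)
    # For standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))
```

The appeal of this approach is that no simulated replicates are needed once the null mean and variance are available in closed form.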
Project description:We consider likelihood ratio tests (LRT) and their modifications for homogeneity in admixture models. The admixture model is a two-component mixture model, where one component is indexed by an unknown parameter while the parameter value for the other component is known. This model is widely used in genetic linkage analysis under heterogeneity in which the kernel distribution is binomial. For such models, it has long been recognized that testing for homogeneity is nonstandard, and the LRT statistic does not converge to a conventional χ² distribution. In this article, we investigate the asymptotic behavior of the LRT for general admixture models and show that its limiting distribution is equivalent to the supremum of a squared Gaussian process. We also discuss the connection and comparison between LRT and alternative approaches such as modifications of LRT and score tests, including the modified LRT (Fu, Chen, and Kalbfleisch, 2006, Statistica Sinica 16, 805-823). The LRT is an omnibus test that is powerful to detect general alternative hypotheses. In contrast, alternative approaches may be slightly more powerful to detect certain types of alternatives, but much less powerful for others. Our results are illustrated by simulation studies and an application to a genetic linkage study of schizophrenia.
Project description:Masquerading comes at various costs and benefits, the principal benefit being the avoidance of predators. The orb-web spider Cyclosa ginnaga has a silver body and adds a white discoid-shaped silk decoration to its web. The size, shape and colour of C. ginnaga's body resemble, when viewed by the human eye against its decoration, a bird dropping. We therefore hypothesized that its body colouration might combine with its web decoration to form a bird dropping masquerade to protect it from predators. We measured the spectral reflectance of: (i) the spider's body, (ii) the web decoration, and (iii) bird droppings, in the field against a natural background and found that the colours of the spider bodies and decorations were indistinguishable from each other and from bird droppings when viewed by hymenopteran predators. We monitored the predatory attacks on C. ginnaga when the spider's body and/or its decorations were blackened and found that predator attack probabilities were greater when only the decorations were blackened. Accordingly, we concluded that C. ginnaga's decoration and body colouration form a bird dropping masquerade, which reduces its probability of predation.
Project description:In high-dimensional testing problems, π0, the proportion of null hypotheses that are true, is an important parameter. For discrete test statistics, the P values come from a discrete distribution with finite support and the null distribution may depend on an ancillary statistic such as a table margin that varies among the test statistics. Methods for estimating π0 developed for continuous test statistics, which depend on a uniform or identical null distribution of P values, may not perform well when applied to discrete testing problems. This article introduces a number of π0 estimators, the regression and 'T' methods, that perform well with discrete test statistics, and also assesses how well methods developed for or adapted from continuous tests perform with discrete tests. We demonstrate the usefulness of these estimators in the analysis of high-throughput biological RNA-seq and single-nucleotide polymorphism data. The methods are implemented in R.
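For context on the continuous-test methods mentioned above, the sketch below shows a Storey-type π0 estimator, a standard approach that assumes null P values are uniform on (0, 1). It is not the regression or 'T' method introduced in the article, whose details the abstract does not give; the function name and the default threshold `lam=0.5` are illustrative assumptions.

```python
def storey_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    Assumes null p-values are Uniform(0, 1), so about pi0 * m * (1 - lam)
    of the m p-values are expected to exceed the threshold lam.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    # Count p-values above the threshold; these are mostly true nulls.
    n_above = sum(1 for p in p_values if p > lam)
    # Rescale by the expected null fraction above lam; cap at 1.
    return min(1.0, n_above / (m * (1.0 - lam)))
```

With discrete tests the null P values are not uniform, which is the situation in which the abstract notes that estimators of this kind may perform poorly.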
Project description:The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows. With the simulated dataset, we found that, for a breed that was not represented in the training population, prediction accuracy was considerably higher with sequence data than with HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. However, first dropping variants from each chromosome and then reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs. We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation.
Project description:Background. Many aphid species, including the pea aphid Acyrthosiphon pisum, exhibit a behaviour where they drop or fall from their host plant, a commonly used strategy to avoid predation, parasitism or physical disturbance. We hypothesised that there was a physiological non-consumptive cost due to such dropping behaviour because aphids would expend energy re-establishing themselves on a host plant and also lose feeding time. Methods. We evaluated this non-consumptive cost by determining the development time and reproductive potential of pea aphids that whilst developing as nymphs had regularly dropped to the ground following dislodgment from their host plant. Using a microcosm approach, in a replicated and balanced laboratory experiment, we caused aphid dropping behaviour by tapping the plants on which they were feeding. Results. The results demonstrated that disturbance by dropping behaviour increased nymphal development time and reduced their subsequent reproductive capacity as adults. Discussion. We conclude that dropping behaviour had a strong negative effect on the development of nymphs and their subsequent reproductive capacity. This implies that the physiological cost of such a behaviour choice is substantial, and that such avoidance strategies require a trade-off which reduces the capacity of a population to increase.
Project description:The sequence kernel association test (SKAT) is probably the most popular statistical test used in rare-variant association studies. Its null distribution involves unknown parameters that need to be estimated. The current estimation method has a valid type I error rate, but the power is compromised given that all subjects are used for estimation. I have developed an estimation method that uses only control subjects. Named SKAT+, this method uses the same test statistic as SKAT but differs in the way the null distribution is estimated. Extensive simulation studies and applications to data from the Genetic Analysis Workshop 17 and the Ocular Hypertension Treatment Study demonstrated that SKAT+ has superior power over SKAT while maintaining control over the type I error rate. This method is applicable to extensions of SKAT in the literature.
Project description:Background: Sample size calculations are critical to the planning of a clinical trial. For single-arm trials with a time-to-event endpoint, standard software provides only limited options. The most popular option is the log-rank test. A second option assuming an exponential distribution is available on some websites. Both these approaches rely on asymptotic normality for the test statistic and perform well for moderate-to-large sample sizes. Methods: As many new treatments in the field of oncology are cost-prohibitive and have slow accrual rates, researchers are often faced with the restriction of conducting single-arm trials with potentially small-to-moderate sample sizes. As a practical solution, therefore, we consider the option of performing the sample size calculations using an exact parametric test with the test statistic following a chi-square distribution. Analytic results of sample size calculations from the two methods with Weibull distributed survival times are briefly compared using an example of a clinical trial on cholangiocarcinoma and are verified through simulations. Results: Our simulations suggest that in the case of small sample phase II studies, there can be some practical benefits in using the exact test that could affect the feasibility, timeliness, financial support, and 'clinical novelty' factor in conducting a study. The exact test is a good option for designing small-to-moderate sample trials when accrual and follow-up time are adequate. Conclusions: Based on our simulations for small sample studies, we conclude that statisticians should assess the sensitivity of their calculations obtained through different methods before recommending a sample size to their collaborators.