Calculating Power by Bootstrap, with an Application to Cluster-Randomized Trials.
ABSTRACT: A key requirement for a useful power calculation is that it mimic the data analysis that will be performed on the actual data once those data are observed. Close approximations may be difficult to achieve with analytic solutions, however, so Monte Carlo approaches, including both simulation and bootstrap resampling, are often attractive. One setting in which this is particularly true is the cluster-randomized trial design, although Monte Carlo approaches are useful in many other settings as well. Calculating power for cluster-randomized trials with analytic or simulation-based methods is frequently unsatisfactory because of the complexity of the planned data analysis methods and the sparseness of data available to inform the choice of important parameters in these methods. We propose that among Monte Carlo methods, bootstrap approaches are most likely to generate data similar to the observed data. In bootstrap approaches, real data are resampled to build complete data sets that resemble the data intended for analysis. In contrast, simulation methods use the real data to estimate parameters and then simulate data from those parameters. We describe how to implement bootstrap power calculation, and we demonstrate it for a cluster-randomized trial with a censored survival outcome and a baseline observation period. Bootstrap power calculation is a natural application of resampling methods. It provides a relatively simple solution to power calculation that is likely to be more accurate than analytic or simulation-based calculations, in the sense that the bootstrap approach does not rely on the assumptions inherent in analytic calculations. This method of calculation has several important strengths.
Notably, it is simple to achieve great fidelity to the proposed data analysis method, and there is no requirement for parameter estimates, or estimates of their variability, from outside settings. For cluster-randomized trials, for example, power calculations need not depend on intracluster correlation coefficient estimates from outside studies. On the other hand, bootstrap power calculation does require initial data that resemble the data to be collected in the planned study. We are not aware of bootstrap power calculation having previously been proposed or explored for cluster-randomized trials, and it can also be applied to other study designs. We show with a simulation study that bootstrap power calculation replicates analytic power in cases where analytic power can be accurately calculated. We also demonstrate power calculations for correlated censored survival outcomes in a cluster-randomized trial setting, for which we are unaware of an analytic alternative. The method can easily be used when preliminary data are available, as is likely when research is performed in health delivery systems or other settings where electronic medical records can be obtained.
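As a concrete illustration of the bootstrap procedure described above, the following sketch estimates power by resampling a pilot sample: each replicate draws both arms from the pilot data, shifts the treatment arm by a hypothesized effect, and applies the planned analysis. The two-sample t-test, normal pilot data, and mean-shift alternative are illustrative assumptions, not the article's censored-survival application.

```python
import numpy as np
from scipy import stats

def bootstrap_power(pilot, effect, n_per_arm, n_boot=1000, alpha=0.05, seed=0):
    """Estimate power by resampling pilot data.

    Each bootstrap replicate draws a control and a treatment arm from the
    pilot sample (the treatment arm is shifted by the hypothesized effect),
    then applies the planned analysis -- here a two-sample t-test, standing
    in for whatever analysis the real study will use.
    """
    rng = np.random.default_rng(seed)
    pilot = np.asarray(pilot, dtype=float)
    rejections = 0
    for _ in range(n_boot):
        ctrl = rng.choice(pilot, size=n_per_arm, replace=True)
        trt = rng.choice(pilot, size=n_per_arm, replace=True) + effect
        _, p = stats.ttest_ind(ctrl, trt)
        rejections += p < alpha
    return rejections / n_boot
```

Because the analysis step is just a function call, fidelity to the planned analysis is achieved by swapping in the actual analysis code, which is the central appeal of the approach.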
Project description: Statistical power calculations are an essential first step in the planning of scientific studies. If sufficient summary statistics are available, power calculations are in principle straightforward and computationally light. In designs that comprise distinct groups (e.g., MZ and DZ twins), sufficient statistics can be calculated within each group and analyzed in a multi-group model. However, when the number of possible groups is prohibitively large (say, in the hundreds), power calculations based on summary statistics become impractical. In that case, researchers may resort to Monte Carlo power studies, which involve simulating hundreds or thousands of replicate samples for each specified set of population parameters. Here we present exact data simulation as a third method of power calculation. Exact data simulation transforms raw data so that the data fit the hypothesized model exactly. As with power calculation from summary statistics, exact data simulation is computationally light, and the number of groups in the analysis has little bearing on the practicality of the method. The method is applied to three genetic designs for illustrative purposes.
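The core transformation behind exact data simulation can be illustrated in the univariate case: rescale a raw sample so its statistics equal the hypothesized parameters exactly, so a single fitted model suffices instead of thousands of replicates. The target mean and SD below stand in for the hypothesized model parameters; the multivariate extension (not shown) would additionally match a covariance matrix.

```python
import numpy as np

def exact_transform(x, mu, sigma):
    """Rescale a raw sample so its mean and SD equal mu and sigma exactly.

    After the transformation, sample statistics match the hypothesized
    population values, so the fitted model's test statistic directly gives
    the noncentrality needed for power -- no replicate samples required.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=0)  # exact zero mean, unit SD
    return mu + sigma * z
```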
Project description: As more population-based studies suggest associations between genetic variants and disease risk, there is a need to improve the design of follow-up studies (stage II) in independent samples to confirm evidence of association observed at the initial stage (stage I). We propose to use flexible designs developed for randomized clinical trials in the calculation of sample size for follow-up studies. We apply a bootstrap procedure to correct for the effect of regression to the mean, also called the "winner's curse," which results from choosing to follow up the markers with the strongest associations. We show how the results from stage I can adaptively improve sample size calculations for stage II. Despite the adaptive use of stage I data, the proposed method maintains the nominal global type I error for final analyses based on either pure replication with the stage II data only or a joint analysis using information from both stages. Simulation studies show that sample-size calculations accounting for the impact of regression to the mean with the bootstrap procedure are more appropriate than the conventional method. We also find that, in the context of flexible design, the joint analysis is generally more powerful than the replication analysis.
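A minimal parametric-bootstrap sketch of a winner's-curse correction of the kind described: resample effect estimates around the observed values, re-run the selection step, and shrink the observed winner by the average inflation that selection produces. The common standard error and normal resampling model are simplifying assumptions, not the authors' exact procedure.

```python
import numpy as np

def winners_curse_corrected(effects, se, n_boot=2000, seed=0):
    """Bootstrap bias correction for the top-ranked marker's effect estimate.

    `effects` are stage I estimates for all markers and `se` their assumed
    common standard error. Each replicate re-draws estimates around the
    observed values, re-selects the maximum, and records how much selection
    inflates it relative to the originally chosen marker; the observed
    winner is then shrunk by the average inflation.
    """
    rng = np.random.default_rng(seed)
    effects = np.asarray(effects, dtype=float)
    winner = np.argmax(effects)
    bias = 0.0
    for _ in range(n_boot):
        boot = rng.normal(effects, se)
        bias += boot.max() - boot[winner]
    return effects[winner] - bias / n_boot
```

The corrected estimate, rather than the raw stage I winner, would then feed the stage II sample size calculation.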
Project description: Complex mediation models, such as a two-mediator sequential model, have become more prevalent in the literature. To test an indirect effect in a two-mediator model, we conducted a large-scale Monte Carlo simulation study of the Type I error, statistical power, and confidence interval coverage rates of 10 frequentist and Bayesian confidence/credible intervals (CIs) for normally and nonnormally distributed data. The simulation included never-studied methods and conditions (e.g., Bayesian CIs with flat and weakly informative priors, two model-based bootstrap methods, and two nonnormality conditions) as well as understudied methods (e.g., profile-likelihood, Monte Carlo with maximum likelihood standard error [MC-ML], and Monte Carlo with robust standard error [MC-Robust]). The popular BC bootstrap showed inflated Type I error rates and CI under-coverage. We recommend different methods depending on the purpose of the analysis. For testing the null hypothesis of no mediation, we recommend MC-ML, profile-likelihood, and the two Bayesian methods. To report a CI when the data have a multivariate normal distribution, we recommend MC-ML, profile-likelihood, and the two Bayesian methods; for multivariate nonnormal data, we recommend the percentile bootstrap. We argue that the best method for testing hypotheses is not necessarily the best method for CI construction, consistent with the findings we present.
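The Monte Carlo interval mentioned above can be sketched for a single-mediator product a*b; extending to the two-mediator sequential product only adds a third draw. The point estimates and standard errors below are assumed inputs from a fitted mediation model, and independence of the two sampling distributions is a simplifying assumption.

```python
import numpy as np

def mc_ci_indirect(a_hat, se_a, b_hat, se_b, n_draws=100000,
                   level=0.95, seed=0):
    """Monte Carlo confidence interval for the indirect effect a*b.

    Draws the two path coefficients from independent normal sampling
    distributions centred at their ML estimates, forms the product, and
    returns percentile limits of the resulting distribution.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(a_hat, se_a, n_draws)
    b = rng.normal(b_hat, se_b, n_draws)
    ab = a * b
    lo, hi = np.percentile(ab, [100 * (1 - level) / 2,
                                100 * (1 + level) / 2])
    return lo, hi
```

Because the product distribution is skewed, these percentile limits are generally asymmetric around a_hat * b_hat, which is why such intervals outperform symmetric normal-theory intervals for mediation.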
Project description: Sample sizes for randomized controlled trials are typically based on power calculations. These require us to specify values for parameters such as the treatment effect, which is often difficult because we lack sufficient prior information. The objective of this paper is to provide an alternative design that circumvents the need for a sample size calculation. In a simulation study, we compared a meta-experiment approach to the classical approach for assessing treatment efficacy. The meta-experiment approach meta-analyzes the results of 3 randomized trials of fixed sample size, 100 subjects each. The classical approach involves a single randomized trial with the sample size calculated on the basis of an a priori-formulated hypothesis. For the sample size calculation in the classical approach, we used published articles to characterize the errors made in formulating hypotheses. A prospective meta-analysis of data from trials of fixed sample size provided the same precision, power, and type I error rate, on average, as the classical approach. The meta-experiment approach may thus provide an alternative design that does not require a sample size calculation and that addresses the essential need for study replication; its results may also have greater external validity.
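The meta-experiment design can be sketched as follows, assuming normal outcomes and inverse-variance fixed-effect pooling; the trial size and effect values are illustrative, not the paper's simulation settings.

```python
import numpy as np

def fixed_effect_meta(estimates, ses):
    """Inverse-variance fixed-effect meta-analysis of several trials."""
    w = 1.0 / np.asarray(ses, dtype=float) ** 2
    est = np.sum(w * np.asarray(estimates, dtype=float)) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return est, se

def simulate_meta_experiment(true_effect, sd, n_per_trial=100,
                             n_trials=3, seed=0):
    """One meta-experiment: several fixed-size two-arm trials, then pooled."""
    rng = np.random.default_rng(seed)
    ests, ses = [], []
    for _ in range(n_trials):
        ctrl = rng.normal(0.0, sd, n_per_trial // 2)
        trt = rng.normal(true_effect, sd, n_per_trial // 2)
        ests.append(trt.mean() - ctrl.mean())
        ses.append(np.sqrt(ctrl.var(ddof=1) / ctrl.size +
                           trt.var(ddof=1) / trt.size))
    return fixed_effect_meta(ests, ses)
```

Pooling three trials of 100 subjects yields roughly the precision of one trial of 300, without any of the three requiring its own power calculation.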
Project description: In behavioral, educational, and medical practice, interventions are often personalized over time using strategies based on individual behaviors and characteristics and on changes in symptoms, severity, or adherence that result from treatment. Such strategies, which more closely mimic real practice, are known as dynamic treatment regimens (DTRs). A sequential multiple assignment randomized trial (SMART) is a multi-stage trial design that can be used to construct effective DTRs. This article reviews a simple-to-use ‘weighted and replicated’ estimation technique for comparing DTRs embedded in a SMART design, using logistic regression for a binary, end-of-study outcome variable. Based on a Wald test that compares two embedded DTRs of interest from the ‘weighted and replicated’ regression model, a sample size calculation is presented, with a corresponding user-friendly applet to aid in the process of designing a SMART. The analytic models and sample size calculations are presented for three of the more commonly used two-stage SMART designs. Simulations for the sample size calculation show that the empirical power reaches expected levels. A data analysis example with corresponding code is presented in the appendix, using data from a SMART developing an effective DTR in autism.
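A sketch of the generic Wald-test sample size that underlies calculators of this kind; the `inflation` multiplier is a stand-in for the design-specific variance inflation that the weighted-and-replicated SMART analysis induces, which the article derives exactly and this sketch does not.

```python
from math import ceil
from statistics import NormalDist

def wald_sample_size(p1, p2, power=0.8, alpha=0.05, inflation=1.0):
    """Total sample size for a Wald test comparing two response probabilities.

    `inflation` is a design-effect multiplier: comparing embedded DTRs in a
    SMART inflates the variance relative to a simple two-arm trial because
    of the weighting and replication of shared treatment sequences.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    n_per_arm = (z_a + z_b) ** 2 * var / (p1 - p2) ** 2
    return ceil(inflation * 2 * n_per_arm)
```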
Project description: In recent years, the number of studies using a cluster-randomized design has grown dramatically. In addition, the cluster-randomized crossover design has been touted as a methodological advance that can increase the efficiency of cluster-randomized studies in certain situations. While the cluster-randomized crossover trial has become a popular tool, standards of design, analysis, reporting, and implementation have not been established for this emergent design. We address one particular aspect of cluster-randomized and cluster-randomized crossover trial design: estimating statistical power. We present a general framework for estimating power via simulation in cluster-randomized studies with or without one or more crossover periods. We have implemented this framework in the clusterPower software package for R, freely available online from the Comprehensive R Archive Network. Our simulation framework is easy to implement, and users may customize the methods used for data analysis. We give four examples of using the software in practice. The clusterPower package could play an important role in the design of future cluster-randomized and cluster-randomized crossover studies. This work is the first to establish a universal method for calculating power for both cluster-randomized and cluster-randomized crossover trials. More research is needed to develop standardized and recommended methodology for cluster-randomized crossover studies.
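The simulation framework described follows the usual pattern: generate data under the design, analyze each replicate as planned, and report the rejection fraction. The sketch below (in Python rather than the package's R, with a cluster-means t-test standing in for a user-chosen analysis, and a normal outcome with total variance 1 split by the ICC) is an assumed minimal instance of that pattern, not clusterPower's implementation.

```python
import numpy as np
from scipy import stats

def crt_power_sim(n_clusters_per_arm, cluster_size, effect, icc,
                  n_sims=500, alpha=0.05, seed=0):
    """Power of a parallel cluster-randomized trial by simulation.

    Outcomes are normal with unit total variance split into a cluster-level
    random effect (variance = icc) and residual noise; the analysis is a
    t-test on cluster means, a simple valid analysis for this design.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        means = []
        for arm_effect in (0.0, effect):
            u = rng.normal(0.0, np.sqrt(icc), n_clusters_per_arm)
            y = arm_effect + u[:, None] + rng.normal(
                0.0, np.sqrt(1 - icc), (n_clusters_per_arm, cluster_size))
            means.append(y.mean(axis=1))
        _, p = stats.ttest_ind(means[0], means[1])
        rejections += p < alpha
    return rejections / n_sims
```

Adding crossover periods amounts to extending the data-generation step with period effects and within-cluster carryover structure while keeping the same analyze-and-count loop.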
Project description: Segmentation algorithms are typically evaluated by comparison to an accepted reference standard. The cost of generating accurate reference standards for medical image segmentation can be substantial. Since the study cost and the likelihood of detecting a clinically meaningful difference in accuracy both depend on the size and quality of the study reference standard, balancing these trade-offs supports the efficient use of research resources. In this work, we derive a statistical power calculation that enables researchers to estimate the appropriate sample size to detect clinically meaningful differences in segmentation accuracy (i.e., the proportion of voxels matching the reference standard) between two algorithms. Furthermore, we derive a formula relating reference standard errors to their effect on the sample sizes of studies using lower-quality (but potentially more affordable and practically available) reference standards. The accuracy of the derived sample size formula was estimated through Monte Carlo simulation, demonstrating, with 95% confidence, a predicted statistical power within 4% of simulated values across a range of model parameters. This corresponds to sample size errors of fewer than 4 subjects and errors in the detectable accuracy difference of less than 0.6%. The applicability of the formula to real-world data was assessed using bootstrap resampling simulations for pairs of algorithms from the PROMISE12 prostate MR segmentation challenge data set. The model predicted the simulated power for the majority of algorithm pairs within 4% for simulated experiments using a high-quality reference standard and within 6% for those using a low-quality reference standard. A case study, also based on the PROMISE12 data, illustrates using the formulae to evaluate whether to use a lower-quality reference standard in a prostate segmentation study.
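The flavor of such a calculation can be conveyed with a standard normal-approximation sample size for a paired comparison of per-subject accuracy; the article's actual formula additionally accounts for reference-standard quality, which this assumed sketch omits.

```python
from math import ceil
from statistics import NormalDist

def paired_accuracy_sample_size(delta, sd_diff, power=0.8, alpha=0.05):
    """Subjects needed to detect a mean difference `delta` in per-subject
    segmentation accuracy between two algorithms evaluated on the same
    subjects, given the SD of the per-subject paired differences
    (two-sided test, normal approximation).
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(((z_a + z_b) * sd_diff / delta) ** 2)
```

A noisier (lower-quality) reference standard effectively enlarges `sd_diff`, which is the mechanism by which reference-standard errors drive up the required sample size.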
Project description: This study compared small bone joint dosimetry calculated by the anisotropic analytical algorithm (AAA) and by Monte Carlo simulation using megavoltage (MV) photon beams. The performance of the AAA in joint dose calculation was evaluated against Monte Carlo simulation, and the dependence of joint dose on joint width and beam angle was investigated. Small bone joint phantoms containing a vertical water layer (0.5-2 mm) sandwiched between two bones (2 × 2 × 2 cm³) were irradiated by 6 and 15 MV photon beams with a field size of 4 × 4 cm². Depth doses along the central beam axis in the joint (cartilage) were calculated with and without a bolus (thickness = 1.5 cm) added on top of the phantoms. Different beam angles (0°-15°) were used, with the isocenter set to the center of the bone joint, for dose calculations using the AAA (Eclipse treatment planning system) and Monte Carlo simulation (the EGSnrc code). For dosimetry comparison and normalization, dose calculations were repeated in homogeneous water phantoms with the bone substituted by water. Comparing the calculated dosimetry between the AAA and Monte Carlo simulation, the AAA underestimated joint doses, varying with joint width, by about 6%-12% for 6 MV and 12%-23% for 15 MV without bolus, and by 7% for 6 MV and 13%-17% for 15 MV with bolus. Moreover, joint doses calculated by the AAA did not vary with joint width or beam angle. The Monte Carlo results showed a decrease in the calculated joint dose as the joint width increased, and a slight decrease as the beam angle increased. When bolus was added to the phantom, variations of joint dose with joint width and beam angle became less significant for the 6 MV photon beams. In conclusion, the dosimetry deviation between the AAA and Monte Carlo simulation for small bone joints was studied using 6 and 15 MV photon beams. The AAA could not predict the variations of joint dose with joint width and beam angle that were predicted by the Monte Carlo simulations.
Project description: It has long been recognized that sample size calculations for cluster randomized trials require consideration of the correlation between multiple observations within the same cluster. When measurements are taken at anything other than a single point in time, these correlations depend not only on the cluster but also on the time separation between measurements and, additionally, on whether different participants (cross-sectional designs) or the same participants (cohort designs) are repeatedly measured. This is particularly relevant in trials with multiple periods of measurement, such as the cluster cross-over and stepped-wedge designs, but also to some degree in parallel designs. Several papers describing sample size methodology for these designs have been published, but this methodology might not be accessible to all researchers. In this article we provide a tutorial on sample size calculation for cluster randomized designs, with particular emphasis on designs with multiple periods of measurement, and provide a web-based tool, the Shiny CRT Calculator, to allow researchers to easily conduct these sample size calculations. We consider both cross-sectional and cohort designs and allow for a variety of assumed within-cluster correlation structures. We consider cluster heterogeneity in treatment effects (for designs where treatment is crossed with cluster), as well as individually randomized group-treatment trials with differential clustering between arms, for example, designs where clustering arises from interventions being delivered in groups. The calculator will compute power or precision, as a function of cluster size or number of clusters, for a wide variety of designs and correlation structures. We illustrate the methodology and the flexibility of the Shiny CRT Calculator using a range of examples.
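The simplest parallel-design calculation underlying such tools inflates the variance by the classic design effect 1 + (m − 1)ρ, where m is the cluster size and ρ the intracluster correlation. The sketch below is a normal-approximation version for a standardized effect and ignores the multi-period correlation structures that the Shiny CRT Calculator supports.

```python
from math import sqrt
from statistics import NormalDist

def crt_power(n_clusters_per_arm, cluster_size, effect, icc, alpha=0.05):
    """Analytic power for a parallel cluster-randomized trial.

    Uses the design effect 1 + (m - 1) * icc to deflate the effective
    sample size of a standardized effect, with a normal approximation
    for the two-sided test.
    """
    deff = 1 + (cluster_size - 1) * icc
    n_eff = n_clusters_per_arm * cluster_size / deff  # effective n per arm
    se = sqrt(2.0 / n_eff)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(effect) / se - z_a)
```

Designs with multiple measurement periods replace the single design effect with a full within-cluster correlation structure, which is exactly the generalization the tutorial addresses.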
Project description: BACKGROUND: Log-linear and multinomial modeling offer a flexible framework for genetic association analyses of offspring (child), parent-of-origin, and maternal effects, based on genotype data from a variety of child-parent configurations. Although the calculation of statistical power or sample size is an important first step in the planning of any scientific study, there is currently a lack of software for genetic power calculations in family-based study designs. Here, we address this shortcoming through new implementations of power calculations in the R package Haplin, a flexible and robust software package for genetic epidemiological analyses. Power calculations in Haplin can be performed analytically, using the asymptotic variance-covariance structure of the parameter estimator, or by a straightforward simulation approach. Haplin performs power calculations for child, parent-of-origin, and maternal effects, as well as for gene-environment interactions. Power can be calculated for both single SNPs and haplotypes, either autosomal or X-linked. Moreover, Haplin enables power calculations for different child-parent configurations, including (but not limited to) case-parent triads, case-mother dyads, and case-parent triads in combination with unrelated control-parent triads. RESULTS: We compared the asymptotic power approximations to the simulation-based power attained with Haplin. For external validation, the results were further compared to the power attained by the EMIM software using data simulations from Haplin. The consistency observed between Haplin and EMIM across various genetic scenarios confirms the computational accuracy of the inference methods used in both programs. The results also demonstrate that power calculations in Haplin are applicable to genetic association studies using either log-linear or multinomial modeling approaches.
CONCLUSIONS: Haplin provides a robust and reliable framework for power calculations in genetic association analyses for a wide range of genetic effects and etiologic scenarios, based on genotype data from a variety of child-parent configurations.