Using Cluster Bootstrapping to Analyze Nested Data With a Few Clusters.
ABSTRACT: Cluster randomized trials involving participants nested within intact treatment and control groups are commonly performed in various educational, psychological, and biomedical studies. However, recruiting and retaining intact groups present various practical, financial, and logistical challenges to evaluators, and cluster randomized trials are often performed with a small number of clusters (~20 groups). Although multilevel models are often used to analyze nested data, researchers may be concerned about potentially biased results due to having only a few groups under study. Cluster bootstrapping has been suggested as an alternative procedure for analyzing clustered data, though it has seen very little use in educational and psychological studies. Using a Monte Carlo simulation that varied the number of clusters, average cluster size, and intraclass correlations, we compared standard errors estimated using cluster bootstrapping with those derived using ordinary least squares regression and multilevel models. Results indicate that cluster bootstrapping, though more computationally demanding, can be used as an alternative procedure for the analysis of clustered data when treatment effects at the group level are of primary interest. Supplementary material showing how to perform cluster bootstrapped regressions using R is also provided.
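The core of the cluster bootstrap is resampling intact clusters, not individual participants, so that within-cluster dependence is preserved in every replicate. As a rough stdlib-only sketch (not the authors' simulation code; all function names and parameter values are illustrative), a group-level treatment effect and its cluster-bootstrapped standard error might be computed like this:

```python
import random
import statistics

def simulate_trial(n_clusters=20, cluster_size=15, cluster_sd=0.5, effect=0.3, seed=1):
    """Simulate a cluster randomized trial: half the clusters treated."""
    rng = random.Random(seed)
    clusters = []
    for c in range(n_clusters):
        treated = 1 if c < n_clusters // 2 else 0
        u = rng.gauss(0, cluster_sd)  # shared cluster-level effect
        ys = [effect * treated + u + rng.gauss(0, 1) for _ in range(cluster_size)]
        clusters.append((treated, ys))
    return clusters

def treatment_effect(clusters):
    """Difference in means between treated and control observations."""
    t = [y for tr, ys in clusters if tr == 1 for y in ys]
    c = [y for tr, ys in clusters if tr == 0 for y in ys]
    return statistics.mean(t) - statistics.mean(c)

def cluster_bootstrap_se(clusters, n_boot=500, seed=2):
    """Resample whole clusters with replacement; SE = SD of replicate estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(clusters) for _ in clusters]
        # Skip the (rare) degenerate resamples containing only one arm
        if len({tr for tr, _ in sample}) < 2:
            continue
        estimates.append(treatment_effect(sample))
    return statistics.stdev(estimates)

clusters = simulate_trial()
print(round(treatment_effect(clusters), 3))
print(round(cluster_bootstrap_se(clusters), 3))
```

Because whole clusters are drawn with replacement, a replicate can occasionally contain only one arm; the sketch simply skips those, which is one of several conventions in use.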
Project description:The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping. Data were simulated to represent genomic data under a polygenic model as well as under a model with effect sizes representative of typical GWAS results. We compared these approaches to each other as well as to software defaults for the Lasso. Nested cross-validation had the most precise variable selection at small effect sizes. At larger effect sizes, there was no advantage to nesting. We illustrated the nested approach with empirical data comprising SNPs and SNP-SNP interactions from the most significant SNPs in a GWAS of borderline personality symptoms. In the empirical example, we found that the default Lasso selected low-reliability SNPs and interactions which were excluded by bootstrapping.
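The nesting idea is: draw a bootstrap sample of rows, run K-fold cross-validation inside that sample to tune the selector, record what was selected, and keep only features chosen in most replicates. The sketch below shows that control flow with a simple correlation-threshold selector standing in for the Lasso (a real analysis would use a Lasso solver such as glmnet or scikit-learn); every name and threshold is illustrative:

```python
import random
import statistics

def make_data(n=80, p=5, seed=0):
    """Toy data: y depends only on feature 0; the rest are noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] + rng.gauss(0, 1) for row in X]
    return X, y

def corr(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = (sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)) ** 0.5
    return num / den if den else 0.0

def select_features(X, y, tau):
    """Stand-in selector: keep features whose |correlation| with y exceeds tau."""
    return [j for j in range(len(X[0]))
            if abs(corr([r[j] for r in X], y)) > tau]

def cv_mse(X, y, tau, k=5):
    """K-fold CV error of a marginal-slope predictor on the selected features."""
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    total = 0.0
    for fold in folds:
        tr = [i for i in range(n) if i not in fold]
        Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
        my = statistics.mean(ytr)
        coefs = {}
        for j in select_features(Xtr, ytr, tau):
            xj = [r[j] for r in Xtr]
            mx = statistics.mean(xj)
            var = sum((a - mx) ** 2 for a in xj)
            cov = sum((a - mx) * (b - my) for a, b in zip(xj, ytr))
            coefs[j] = (mx, cov / var if var else 0.0)
        for i in fold:
            pred = my + sum(b * (X[i][j] - mx) for j, (mx, b) in coefs.items())
            total += (pred - y[i]) ** 2
    return total / n

def nested_bootstrap_selection(X, y, n_boot=50, taus=(0.1, 0.2, 0.3, 0.4), seed=1):
    """Bootstrap rows; inside each replicate, tune tau by CV, then select."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        tau = min(taus, key=lambda t: cv_mse(Xb, yb, t))
        for j in select_features(Xb, yb, tau):
            counts[j] += 1
    # Keep features selected in at least half of the replicates
    return [j for j in range(p) if counts[j] >= n_boot / 2]

X, y = make_data()
print(nested_bootstrap_selection(X, y))
```

The 50%-of-replicates cutoff is one common stability rule; the paper's precision comparisons would apply the same nesting with an actual Lasso fit in place of the threshold selector.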
Project description:Propensity score methods (e.g., matching, weighting, subclassification) provide a statistical approach for balancing dissimilar exposure groups on baseline covariates. These methods were developed in the context of data with no hierarchical structure or clustering. Yet in many applications the data have a clustered structure that is of substantive importance, such as when individuals are nested within healthcare providers or within schools. Recent work has extended propensity score methods to a multilevel setting, primarily focusing on binary exposures. In this paper, we focus on propensity score weighting for a continuous, rather than binary, exposure in a multilevel setting. Using simulations, we compare several specifications of the propensity score: a random effects model, a fixed effects model, and a single-level model. Additionally, our simulations compare the performance of marginal versus cluster-mean stabilized propensity score weights. In our results, regression specifications that accounted for the multilevel structure reduced bias, particularly when cluster-level confounders were omitted. Furthermore, cluster mean weights outperformed marginal weights.
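For a continuous exposure, the "propensity score" becomes a conditional density, and a stabilized weight divides the marginal exposure density by the exposure density given covariates. The paper's multilevel specifications (random effects, fixed effects, cluster-mean stabilization) are not reproduced here; this is a single-level, normal-density sketch with one confounder, and every name is illustrative:

```python
import math
import random
import statistics

def normal_pdf(a, mu, sd):
    return math.exp(-0.5 * ((a - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def simple_ols(x, a):
    """Intercept and slope of a regressed on x by least squares."""
    mx, ma = statistics.mean(x), statistics.mean(a)
    beta = (sum((xi - mx) * (ai - ma) for xi, ai in zip(x, a))
            / sum((xi - mx) ** 2 for xi in x))
    return ma - beta * mx, beta

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]      # confounder
a = [0.5 * xi + rng.gauss(0, 1) for xi in x]   # continuous exposure

# Denominator model: exposure density given the confounder
b0, b1 = simple_ols(x, a)
resid = [ai - (b0 + b1 * xi) for xi, ai in zip(x, a)]
sd_resid = statistics.stdev(resid)

# Numerator model: marginal exposure density (this stabilizes the weights)
mu_a, sd_a = statistics.mean(a), statistics.stdev(a)

weights = [normal_pdf(ai, mu_a, sd_a) / normal_pdf(ai, b0 + b1 * xi, sd_resid)
           for xi, ai in zip(x, a)]
print(round(statistics.mean(weights), 2))  # in expectation the stabilized weights average 1
```

The resulting weights would then enter a weighted regression of the outcome on the exposure; a multilevel version would replace `simple_ols` with a random- or fixed-effects model for the exposure, as compared in the paper.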
Project description:We introduce a general technique for making statistical inference from clustering tools applied to gene expression microarray data. The approach utilizes an analysis of variance model to achieve normalization and estimate differential expression of genes across multiple conditions. Statistical inference is based on the application of a randomization technique, bootstrapping. Bootstrapping has previously been used to obtain confidence intervals for estimates of differential expression for individual genes. Here we apply bootstrapping to assess the stability of results from a cluster analysis. We illustrate the technique with a publicly available data set and draw conclusions about the reliability of clustering results in light of variation in the data. The bootstrapping procedure relies on experimental replication. We discuss the implications of replication and good design in microarray experiments.
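The stability assessment amounts to: cluster the observed data, resample the experimental replicates to build perturbed data sets, re-cluster each, and measure how often pairs of genes stay together. A toy stdlib sketch with a tiny 1-D 2-means standing in for a full microarray pipeline (all names and sizes are illustrative):

```python
import random
import statistics

def two_means(values, iters=20):
    """Tiny 1-D 2-means: returns a cluster label (0/1) for each value."""
    c = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - c[0]) <= abs(v - c[1]) else 1 for v in values]
        for k in (0, 1):
            members = [v for v, l in zip(values, labels) if l == k]
            if members:
                c[k] = statistics.mean(members)
    return labels

def pair_agreement(a, b):
    """Fraction of point pairs on which two labelings agree about co-membership.
    Invariant to swapping the 0/1 cluster labels."""
    n = len(a)
    same = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            if (a[i] == a[j]) == (b[i] == b[j]):
                same += 1
    return same / total

rng = random.Random(0)
# 12 genes, 6 replicate measurements each; two expression groups
reps = [[(0.0 if g < 6 else 3.0) + rng.gauss(0, 1) for _ in range(6)]
        for g in range(12)]
base = two_means([statistics.mean(r) for r in reps])

scores = []
for _ in range(200):
    # Resample replicates within each gene: this is why the procedure
    # relies on experimental replication
    boot = [statistics.mean([rng.choice(r) for _ in r]) for r in reps]
    scores.append(pair_agreement(base, two_means(boot)))
print(round(statistics.mean(scores), 2))  # close to 1 when clusters are stable
```

Low average agreement would signal that the clustering is driven by measurement noise rather than real structure, which is the conclusion the bootstrap lets one draw.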
Project description:BACKGROUND:Treatment non-adherence results in treatment failure, prolonged transmission of disease, and emergence of drug resistance. Although the problem has been widely investigated, there remains an information gap on the effectiveness of different methods to improve treatment adherence and on the predictors of non-adherence in resource-limited countries based on theoretical models. This study aimed to evaluate the impact of psychological counseling and educational intervention on tuberculosis (TB) treatment adherence based on the Health Belief Model (HBM). METHODOLOGY:A cluster randomized controlled trial was conducted in Addis Ababa from May to December 2014. Patients were enrolled into the study consecutively from 30 randomly selected Health Centers (HCs) (14 intervention and 16 control HCs). A total of 698 TB patients who had been on treatment for one to two months were enrolled. A structured questionnaire was administered to both groups of patients at baseline and at the endpoint of the study. Control participants received routine directly observed anti-TB therapy, and the intervention group additionally received combined psychological counseling and adherence education. Treatment non-adherence level was the main outcome of the study, and multilevel logistic regression was employed to assess the impact of the intervention on treatment adherence. RESULTS:At enrollment, the level of non-adherence among the intervention (19.4%) and control (19.6%) groups was almost the same. After the intervention, however, the non-adherence level decreased in the intervention group from 19.4% (baseline) to 9.5% (endpoint), while it increased in the control group from 19.6% (baseline) to 25.4% (endpoint). Psychological counseling and educational interventions resulted in a significant difference in non-adherence level between the intervention and control groups (adjusted OR = 0.31, 95% confidence interval (CI) 0.18-0.53, p < 0.001).
CONCLUSION:Psychological counseling and educational interventions, which were guided by the HBM, significantly decreased the treatment non-adherence level in the intervention group. Provision of psychological counseling and health education to TB patients on regular treatment is recommended. This could be best achieved if these interventions are guided by behavioral theories and incorporated into the routine TB treatment strategy. TRIAL REGISTRATION:Pan African Clinical Trials Registry PACTR201506001175423.
Project description:Recent advances in molecular simulations allow the evaluation of previously unattainable observables, such as rate constants for protein folding. However, these calculations are usually computationally expensive, and even significant computing resources may result in a small number of independent estimates spread over many orders of magnitude. Such small-sample, high "log-variance" data are not readily amenable to analysis using the standard uncertainty (i.e., "standard error of the mean") because unphysical negative limits of confidence intervals result. Bootstrapping, a natural alternative guaranteed to yield a confidence interval within the minimum and maximum values, also exhibits a striking systematic bias of the lower confidence limit in log space. As we show, bootstrapping artifactually assigns high probability to improbably low mean values. A second alternative, the Bayesian bootstrap strategy, does not suffer from the same deficit and is more logically consistent with the type of confidence interval desired. The Bayesian bootstrap provides uncertainty intervals that are more reliable than those from the standard bootstrap method but must be used with caution nevertheless. Neither standard nor Bayesian bootstrapping can overcome the intrinsic challenge of underestimating the mean from small-size, high log-variance samples. Our conclusions are based on extensive analysis of model distributions and reanalysis of multiple independent atomistic simulations. Although we only analyze rate constants, similar considerations will apply to related calculations, potentially including highly nonlinear averages like the Jarzynski relation.
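The two resampling schemes differ only in how observations are weighted: the standard bootstrap draws integer resamples, while the Bayesian bootstrap draws continuous Dirichlet(1, ..., 1) weights, obtainable as normalized exponential variates. A sketch on a small, high log-variance sample (illustrative, not the paper's simulation code):

```python
import math
import random
import statistics

def standard_bootstrap_means(data, n_boot, rng):
    """Standard bootstrap: means of integer resamples drawn with replacement."""
    return [statistics.mean([rng.choice(data) for _ in data]) for _ in range(n_boot)]

def bayesian_bootstrap_means(data, n_boot, rng):
    """Bayesian bootstrap: weighted means with Dirichlet(1,...,1) weights,
    generated as normalized exponentials."""
    out = []
    for _ in range(n_boot):
        w = [rng.expovariate(1.0) for _ in data]
        s = sum(w)
        out.append(sum(wi * xi for wi, xi in zip(w, data)) / s)
    return out

rng = random.Random(0)
# Small sample of log-normally distributed rate estimates (high log-variance)
data = [math.exp(rng.gauss(0, 2)) for _ in range(8)]

for name, fn in [("standard", standard_bootstrap_means),
                 ("bayesian", bayesian_bootstrap_means)]:
    means = sorted(fn(data, 2000, rng))
    lo, hi = means[50], means[1949]  # 95% percentile interval
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
```

Both intervals are guaranteed to lie within the sample minimum and maximum, since every replicate mean is a convex combination of the data; the paper's point is that on such samples the standard bootstrap's lower limit is nonetheless systematically biased in log space, while neither method escapes underestimating the true mean.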
Project description:BACKGROUND:In individually randomised trials we might expect interventions delivered in groups or by care providers to result in clustering of outcomes for participants treated in the same group or by the same care provider. In partially nested randomised controlled trials (pnRCTs) this clustering occurs in only one trial arm, commonly the intervention arm. It is important to measure and account for between-cluster variability in trial design and analysis. We compare analysis approaches for pnRCTs with continuous outcomes, investigating the impact on statistical inference of cluster sizes, coding of the non-clustered arm, intracluster correlation coefficients (ICCs), and differential variance between the intervention and control arms, and provide recommendations for analysis. METHODS:We performed a simulation study assessing the performance of six analysis approaches for a two-arm pnRCT with a continuous outcome: a linear regression model; a fully clustered mixed-effects model with singleton clusters in the control arm; a fully clustered mixed-effects model with one large cluster in the control arm; a fully clustered mixed-effects model with pseudo clusters in the control arm; a partially nested homoscedastic mixed-effects model; and a partially nested heteroscedastic mixed-effects model. We varied the cluster size, number of clusters, ICC, and individual variance between the two trial arms. RESULTS:All models provided unbiased intervention effect estimates. In the partially nested mixed-effects models, the method for classifying the non-clustered control arm had negligible impact. Failure to account for even small ICCs resulted in inflated Type I error rates and over-coverage of confidence intervals. Fully clustered mixed-effects models provided poor control of Type I error rates and biased ICC estimates.
The heteroscedastic partially nested mixed-effects model maintained relatively good control of Type I error rates and unbiased ICC estimation, and did not noticeably reduce power even with homoscedastic individual variances across arms. CONCLUSIONS:In general, we recommend the use of a heteroscedastic partially nested mixed-effects model, which models the clustering in only one arm, for continuous outcomes similar to those generated under the scenarios of our simulation study. However, with few clusters (3-6), small cluster sizes (5-10), and a small ICC (≤0.05), this model underestimates Type I error rates and there is no optimal model.
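A partially nested data set has a random cluster effect in one arm only. Below is a minimal generator for such data together with a cluster-mean analysis of the intervention arm, a crude stand-in for the partially nested mixed-effects models compared above; all parameter names and values are illustrative:

```python
import random
import statistics

def simulate_pnrct(n_clusters=8, cluster_size=10, n_control=80,
                   icc=0.1, effect=0.5, het_ratio=1.0, seed=0):
    """Partially nested trial: clustering (e.g., therapy groups) in one arm only.
    Total variance in the intervention arm is fixed at 1 and split by the ICC."""
    rng = random.Random(seed)
    sd_u = icc ** 0.5        # between-cluster SD
    sd_e = (1 - icc) ** 0.5  # within-cluster SD
    intervention = []
    for _ in range(n_clusters):
        u = rng.gauss(0, sd_u)
        intervention.append([effect + u + rng.gauss(0, sd_e)
                             for _ in range(cluster_size)])
    # Control participants are independent; het_ratio scales their SD
    control = [rng.gauss(0, het_ratio) for _ in range(n_control)]
    return intervention, control

intervention, control = simulate_pnrct()
# Averaging to cluster means respects the nesting in the intervention arm
cluster_means = [statistics.mean(c) for c in intervention]
est = statistics.mean(cluster_means) - statistics.mean(control)
print(round(est, 3))
```

Varying `icc`, `n_clusters`, `cluster_size`, and `het_ratio` over a grid and repeatedly re-analyzing such data is the basic shape of the simulation study described above.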
Project description:Modifiable risk factors, including lifestyle habits and psychological variables, have been increasingly demonstrated to play an important role in influencing morbidity and mortality in cardiovascular patients, and to account for approximately 90% of the population risk for cardiac events. Acceptance and Commitment Therapy (ACT) has shown effectiveness in promoting healthy behaviors and improving psychological well-being in patients with chronic physical conditions. Moreover, a first application of an acceptance-based program in cardiac patients has revealed high treatment satisfaction and initial evidence of effectiveness in increasing heart-healthy behaviour. However, no clinical trial to date has evaluated the efficacy of an acceptance-based program for the modification of cardiovascular risk factors and the improvement of psychological well-being compared to usual secondary prevention care. Approximately 168 patients will be recruited from an outpatient cardiac rehabilitation unit and randomly assigned to receive usual care or usual care plus a brief ACT-based intervention. The ACT group will attend five group therapy sessions integrating educational topics on heart-healthy behaviours with acceptance and mindfulness skills. Participants will be assessed at baseline, six weeks later (post-treatment for the ACT condition), and at six- and twelve-month follow-ups. A partially nested design will be used to balance effects due to clustering of participants into small therapy groups. Primary outcome measures will include biological indicators of cardiovascular risk and self-reported psychological well-being.
Treatment effects will be tested via multilevel modeling, after which the mediational role of psychological flexibility will be evaluated. The ACTonHEART study is the first randomized clinical trial designed to evaluate the efficacy of a brief group-administered, ACT-based program to promote health behavior change and psychological well-being among cardiac patients. Results will address the effectiveness of a brief treatment created to simultaneously impact multiple cardiovascular risk factors. Conducted in the context of clinical practice, this trial will potentially offer empirical support for alternative interventions to improve quality of life and reduce mortality and morbidity rates among cardiac patients. Trial registration: ClinicalTrials.gov NCT01909102.
Project description:Recognizing that health outcomes are influenced by and occur within multiple social and physical contexts, researchers have used multilevel modeling techniques for decades to analyze hierarchical or nested data. Cross-Classified Multilevel Models (CCMM) are a statistical technique proposed in the 1990s that extend standard multilevel modeling and enable the simultaneous analysis of non-nested multilevel data. Though use of CCMM in empirical health studies has become increasingly popular, there has not yet been a review summarizing how CCMM are used in the health literature. To address this gap, we performed a scoping review of empirical health studies using CCMM to: (a) evaluate the extent to which this statistical approach has been adopted; (b) assess the rationale and procedures for using CCMM; and (c) provide concrete recommendations for the future use of CCMM. We identified 118 CCMM papers published in English-language literature between 1994 and 2018. Our results reveal a steady growth in empirical health studies using CCMM to address a wide variety of health outcomes in clustered non-hierarchical data. Health researchers use CCMM primarily for five reasons: (1) to statistically account for non-independence in clustered data structures; out of substantive interest in the variance explained by (2) concurrent contexts, (3) contexts over time, and (4) age-period-cohort effects; and (5) to apply CCMM alongside other techniques within a joint model. We conclude by proposing a set of recommendations for use of CCMM with the aim of improved clarity and standardization of reporting in future research using this statistical approach.
Project description:Factorial experimental designs have many potential advantages for behavioral scientists. For example, such designs may be useful in building more potent interventions by helping investigators to screen several candidate intervention components simultaneously and to decide which are likely to offer greater benefit before evaluating the intervention as a whole. However, sample size and power considerations may challenge investigators attempting to apply such designs, especially when the population of interest is multilevel (e.g., when students are nested within schools, or when employees are nested within organizations). In this article, we examine the feasibility of factorial experimental designs with multiple factors in a multilevel, clustered setting (i.e., of multilevel, multifactor experiments). We conduct Monte Carlo simulations to demonstrate how design elements, such as the number of clusters, the number of lower-level units, and the intraclass correlation, affect power. Our results suggest that multilevel, multifactor experiments are feasible for factor-screening purposes because of the economical properties of complete and fractional factorial experimental designs. We also discuss resources for sample size planning and power estimation for multilevel factorial experiments. These results are discussed from a resource management perspective, in which the goal is to choose a design that maximizes the scientific benefit using the resources available for an investigation.
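Monte Carlo power estimation for such a design follows a simple loop: simulate the multilevel data, run the planned test, and count rejections. Below is a stdlib sketch for the main effect of one factor in a 2x2 cluster-randomized layout, using a z-test on cluster means as a simple stand-in for a mixed model; all parameter values are illustrative:

```python
import random
import statistics

def simulate_power(n_clusters=32, cluster_size=20, icc=0.05,
                   effect_a=0.3, n_sims=300, seed=0):
    """Monte Carlo power for factor A's main effect in a 2x2 cluster design.
    Analysis: two-sample z-test on cluster means."""
    rng = random.Random(seed)
    sd_u, sd_e = icc ** 0.5, (1 - icc) ** 0.5  # total variance fixed at 1
    hits = 0
    for _ in range(n_sims):
        means_by_a = {0: [], 1: []}
        for c in range(n_clusters):
            a = c % 2  # factor A assignment
            # b = (c // 2) % 2 would assign factor B; omitted here (no B effect)
            u = rng.gauss(0, sd_u)
            ys = [effect_a * a + u + rng.gauss(0, sd_e)
                  for _ in range(cluster_size)]
            means_by_a[a].append(statistics.mean(ys))
        m0, m1 = means_by_a[0], means_by_a[1]
        se = (statistics.variance(m0) / len(m0)
              + statistics.variance(m1) / len(m1)) ** 0.5
        z = (statistics.mean(m1) - statistics.mean(m0)) / se
        hits += abs(z) > 1.96
    return hits / n_sims

print(simulate_power())
```

Sweeping `n_clusters`, `cluster_size`, and `icc` over a grid and repeating this calculation reproduces the kind of design-element comparison the simulations describe; a fractional factorial assignment would simply change how factor levels are mapped to clusters.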
Project description:Resting-state functional connectivity (RSFC) records enormous functional interaction information between any pair of brain nodes, which enriches individual-phenotypic prediction. To reduce the high feature dimensionality, correlation analysis is commonly used for feature selection. However, the resting-state fMRI signal typically exhibits a low signal-to-noise ratio, and correlation analysis is sensitive to outliers and to the data distribution, which may yield unstable features for prediction. To alleviate this problem, a bootstrapping-based feature selection framework was proposed and applied to connectome-based predictive modeling, support vector regression, least absolute shrinkage and selection operator, and Ridge regression to predict a series of cognitive traits based on Human Connectome Project data. To systematically investigate the influence of different parameter settings on the bootstrapping-based framework, 216 parameter combinations were evaluated, and the best performance among them was identified as the final prediction result for each cognitive trait. With the bootstrapping methods, the best prediction performances outperformed the baseline method in all four prediction models. Furthermore, the proposed framework effectively reduced the feature dimension by retaining the more stable features. The results demonstrate that the proposed framework is an easy-to-use and effective method to improve RSFC prediction of cognitive traits and is highly recommended for future RSFC-prediction studies.
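The bootstrapping-based framework keeps only features that are selected consistently across resamples. As a stdlib sketch using top-k correlation ranking as the base selector (the paper evaluates several predictive models and 216 parameter settings; here `top_k`, `keep_frac`, and all other names are illustrative):

```python
import random
import statistics

def corr(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = (sum((a - mx) ** 2 for a in xs)
           * sum((b - my) ** 2 for b in ys)) ** 0.5
    return num / den if den else 0.0

def stable_features(X, y, top_k=3, n_boot=200, keep_frac=0.8, seed=0):
    """Keep features ranking in the top-k by |correlation| in most replicates."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        scores = [abs(corr([X[i][j] for i in idx], yb)) for j in range(p)]
        for j in sorted(range(p), key=scores.__getitem__, reverse=True)[:top_k]:
            counts[j] += 1
    return [j for j in range(p) if counts[j] >= keep_frac * n_boot]

rng = random.Random(1)
# Toy "connectivity" features: only features 0 and 1 carry signal
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(100)]
y = [1.0 * r[0] + 0.8 * r[1] + rng.gauss(0, 1) for r in X]
print(stable_features(X, y))
```

The retained, more stable features would then be passed to whichever downstream regression model is being evaluated, which is the role the framework plays ahead of the four prediction models in the study.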