Variable selection for covariate-adjusted semiparametric inference in randomized clinical trials.
ABSTRACT: Extensive baseline covariate information is routinely collected on participants in randomized clinical trials, and it is well recognized that a proper covariate-adjusted analysis can improve the efficiency of inference on the treatment effect. However, such covariate adjustment has engendered considerable controversy, as post hoc selection of covariates may involve subjectivity and may lead to biased inference, whereas prior specification of the adjustment may exclude important variables from consideration. Accordingly, how to select covariates objectively to gain maximal efficiency is of broad interest. We propose and study the use of modern variable selection methods for this purpose in the context of a semiparametric framework, under which variable selection in modeling the relationship between outcome and covariates is separated from estimation of the treatment effect, circumventing the potential for selection bias associated with standard analysis of covariance methods. We demonstrate that such objective variable selection techniques combined with this framework can identify key variables and lead to unbiased and efficient inference on the treatment effect. A critical issue in finite samples is validity of estimators of uncertainty, such as standard errors and confidence intervals for the treatment effect. We propose an approach to estimating the sampling variation of the estimated treatment effect and show its superior performance relative to that of existing methods.
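The efficiency gain from covariate adjustment that leaves the treatment-effect estimate unbiased can be illustrated with a small Monte Carlo sketch. This is our own toy code, not the authors' estimator: it uses a simple pooled residual adjustment in the spirit of the semiparametric framework, with simulated data and a known true effect of 2.

```python
# Monte Carlo sketch: unadjusted vs covariate-adjusted treatment-effect
# estimates in a simulated randomized trial (illustrative only).
import random
import statistics

random.seed(1)

def one_trial(n=200, theta=2.0):
    x = [random.gauss(0, 1) for _ in range(n)]          # prognostic covariate
    a = [random.randint(0, 1) for _ in range(n)]        # randomized treatment
    y = [theta * ai + 3.0 * xi + random.gauss(0, 1) for ai, xi in zip(a, x)]

    y1 = [yi for yi, ai in zip(y, a) if ai == 1]
    y0 = [yi for yi, ai in zip(y, a) if ai == 0]
    unadj = statistics.mean(y1) - statistics.mean(y0)

    # Fit a pooled working model y ~ x (OLS slope) and difference residuals;
    # randomization makes this unbiased while removing covariate noise.
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    e = [yi - b * xi for xi, yi in zip(x, y)]
    e1 = [ei for ei, ai in zip(e, a) if ai == 1]
    e0 = [ei for ei, ai in zip(e, a) if ai == 0]
    adj = statistics.mean(e1) - statistics.mean(e0)
    return unadj, adj

ests = [one_trial() for _ in range(500)]
sd_unadj = statistics.stdev([u for u, _ in ests])
sd_adj = statistics.stdev([v for _, v in ests])
print(sd_unadj, sd_adj)   # adjustment should shrink the sampling SD
```

Both estimators center on the true effect, but the adjusted one has a much smaller sampling standard deviation; in the paper, the working model is chosen by objective variable selection rather than fixed in advance.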
Project description:In several common study designs, regression modeling is complicated by the presence of censored covariates. Examples of such covariates include maternal age of onset of dementia that may be right censored in an Alzheimer's amyloid imaging study of healthy subjects, metabolite measurements that are subject to limit of detection censoring in a case-control study of cardiovascular disease, and progressive biomarkers whose baseline values are of interest, but are measured post-baseline in longitudinal neuropsychological studies of Alzheimer's disease. We propose threshold regression approaches for linear regression models with a covariate that is subject to random censoring. Threshold regression methods allow for immediate testing of the significance of the effect of a censored covariate. In addition, they provide for unbiased estimation of the regression coefficient of the censored covariate. We derive the asymptotic properties of the resulting estimators under mild regularity conditions. Simulations demonstrate that the proposed estimators have good finite-sample performance, and often offer improved efficiency over existing methods. We also derive a principled method for selection of the threshold. We illustrate the approach in application to an Alzheimer's disease study that investigated brain amyloid levels in older individuals, as measured through positron emission tomography scans, as a function of maternal age of dementia onset, with adjustment for other covariates. We have developed an R package, censCov, for implementation of our method, available at CRAN.
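Why a censored covariate is problematic can be seen in a few lines of simulation. This is a hedged toy illustration, not the proposed threshold-regression estimator (censCov implements the actual method): naively regressing on the censored value min(X, C) biases the slope, while restricting to uncensored complete cases, a simpler baseline the paper improves upon, recovers it.

```python
# Toy illustration: right-censored covariate in linear regression.
import random

random.seed(3)
n = 8000
x = [random.gauss(0, 1) for _ in range(n)]           # true covariate
c = [random.gauss(0, 1) for _ in range(n)]           # independent censoring times
w = [min(xi, ci) for xi, ci in zip(x, c)]            # observed covariate value
delta = [xi <= ci for xi, ci in zip(x, c)]           # 1 if uncensored
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

def ols_slope(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    sxx = sum((u - mx) ** 2 for u in xs)
    return sxy / sxx

slope_naive = ols_slope(w, y)                        # treats min(X, C) as X
cc = [(wi, yi) for wi, yi, di in zip(w, y, delta) if di]
slope_cc = ols_slope([p[0] for p in cc], [p[1] for p in cc])
print(slope_naive, slope_cc)                         # truth is 0.5
```

The complete-case fit is consistent here because censoring depends only on the covariate, but it discards data; the threshold-regression approach of the paper is designed to be both unbiased and often more efficient.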
Project description:BACKGROUND:Restricted mean survival time is a measure of average survival time up to a specified time point. There has been an increased interest in using restricted mean survival time to compare treatment arms in randomized clinical trials because such comparisons do not rely on proportional hazards or other assumptions about the nature of the relationship between survival curves. METHODS:This article addresses the question of whether covariate adjustment in randomized clinical trials that compare restricted mean survival times improves precision of the estimated treatment effect (difference in restricted mean survival times between treatment arms). Although precision generally increases in linear models when prognostic covariates are added, this is not necessarily the case in non-linear models. For example, in logistic and Cox regression, the standard error of the estimated treatment effect does not decrease when prognostic covariates are added, although the situation is complicated in those settings because the estimand changes as well. Because estimation of restricted mean survival time in the manner described in this article is also based on a model that is non-linear in the covariates, we investigate whether the comparison of restricted mean survival times with adjustment for covariates leads to a reduction in the standard error of the estimated treatment effect relative to the unadjusted estimator or whether covariate adjustment provides no improvement in precision. Chen and Tsiatis suggest that precision will increase if covariates are chosen judiciously. We present results of simulation studies that compare unadjusted versus adjusted comparisons of restricted mean survival time between treatment arms in randomized clinical trials. 
RESULTS:We find that for comparison of restricted means in a randomized clinical trial, adjusting for covariates that are associated with survival increases precision and therefore statistical power, relative to the unadjusted estimator. Omitting important covariates results in less precision but estimates remain unbiased. CONCLUSION:When comparing restricted means in a randomized clinical trial, adjusting for prognostic covariates can improve precision and increase power.
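The unadjusted estimand is straightforward to compute: the restricted mean survival time is the area under the Kaplan-Meier curve up to the truncation time tau, and the treatment effect is the difference in areas between arms. A minimal self-contained sketch follows (ties handled by sequential one-at-a-time updates, which is equivalent to the usual KM formula; the covariate-adjusted comparison studied in the article models restricted mean survival time as a function of covariates and is not shown):

```python
def rmst(times, events, tau):
    """Area under the Kaplan-Meier survival curve on [0, tau].

    times  : observed follow-up times
    events : 1 if the time is an event, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0        # current value of the KM curve
    area = 0.0
    prev_t = 0.0
    for t, d in data:
        t_clipped = min(t, tau)
        if t_clipped > prev_t:
            area += s * (t_clipped - prev_t)   # flat segment before the drop
            prev_t = t_clipped
        if prev_t >= tau:
            break
        if d:
            s *= 1.0 - 1.0 / n_at_risk         # KM drop at an event time
        n_at_risk -= 1
    if prev_t < tau:
        area += s * (tau - prev_t)             # flat tail out to tau
    return area

# Difference in RMST between two arms (toy data):
treated = ([1, 2, 3, 4], [1, 1, 0, 1])
control = ([1, 1, 2, 3], [1, 1, 1, 0])
effect = rmst(*treated, tau=4.0) - rmst(*control, tau=4.0)
print(effect)   # 2.75 - 2.0 = 0.75
```

Because this comparison integrates the survival curves directly, it makes no proportional hazards assumption, which is the motivation given above.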
Project description:In epidemiological and medical studies, covariate misclassification may occur when the observed categorical variables are not perfect measurements for an unobserved categorical latent predictor. It is well known that covariate measurement error in Cox regression may lead to biased estimation. Misclassification in covariates likewise causes bias, and adjusting for it is challenging when gold-standard variables are unavailable. In general, statistical modeling for misclassification is very different from that of measurement error. In this paper, we investigate an approximate induced hazard estimator and propose an expected estimating equation estimator via an expectation-maximization algorithm to accommodate covariate misclassification when multiple surrogate variables are available. Finite sample performance is examined via simulation studies. The proposed method and other methods are applied to a human immunodeficiency virus clinical trial in which a few behavior variables from questionnaires are used as surrogates for a latent behavior variable.
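The basic bias that motivates this work is easy to demonstrate. The sketch below is our own toy linear-model example of attenuation from a misclassified binary covariate, not the paper's expected estimating equation estimator: with a 20% flip probability, the regression coefficient on the surrogate is attenuated by roughly the factor (1 - 2 * 0.2) = 0.6.

```python
# Toy sketch: a binary covariate Z observed only through a noisy surrogate S.
import random

random.seed(4)
n = 20000
flip = 0.2
z = [random.randint(0, 1) for _ in range(n)]
s = [zi if random.random() > flip else 1 - zi for zi in z]   # misclassified
y = [1.0 + 2.0 * zi + random.gauss(0, 1) for zi in z]

def ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((u - mx) * (v - my) for u, v in zip(xs, ys))
            / sum((u - mx) ** 2 for u in xs))

slope_true = ols_slope(z, y)       # near the true coefficient 2.0
slope_surrogate = ols_slope(s, y)  # attenuated toward 2.0 * 0.6 = 1.2
print(slope_true, slope_surrogate)
```

The paper's EM-based estimator recovers the latent-covariate effect from multiple surrogates rather than accepting this attenuation.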
Project description:The primary goal of randomized trials is to compare the effects of different interventions on some outcome of interest. In addition to the treatment assignment and outcome, data on baseline covariates, such as demographic characteristics or biomarker measurements, are typically collected. Incorporating such auxiliary covariates in the analysis of randomized trials can increase power, but questions remain about how to preserve type I error when incorporating such covariates in a flexible way, particularly when the number of randomized units is small. Using the Young Citizens study, a cluster randomized trial of an educational intervention to promote HIV awareness, we compare several methods to evaluate intervention effects when baseline covariates are incorporated adaptively. To ascertain the validity of the methods shown in small samples, extensive simulation studies were conducted. We demonstrate that randomization inference preserves type I error under model selection while tests based on asymptotic theory may yield invalid results. We also demonstrate that covariate adjustment generally increases power, except at extremely small sample sizes using liberal selection procedures. Although shown within the context of HIV prevention research, our conclusions have important implications for maximizing efficiency and robustness in randomized trials with small samples across disciplines.
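The randomization-inference idea is simple to sketch: re-randomize (or permute) the treatment labels many times and recompute the test statistic each time; in the adaptive-adjustment setting above, the entire covariate-selection step would be repeated inside each permutation so that model selection is part of the reference distribution. A minimal unadjusted version (our own code and function names):

```python
import random

random.seed(5)

def randomization_p_value(y, a, n_perm=2000):
    """Two-sided randomization p-value for a difference in means.

    Any adaptive covariate-adjustment step should be re-run inside the
    loop so that model selection is reflected in the null distribution.
    """
    def abs_diff(labels):
        y1 = [yi for yi, li in zip(y, labels) if li == 1]
        y0 = [yi for yi, li in zip(y, labels) if li == 0]
        return abs(sum(y1) / len(y1) - sum(y0) / len(y0))

    observed = abs_diff(a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        perm = a[:]
        random.shuffle(perm)            # preserves the number treated
        at_least_as_extreme += abs_diff(perm) >= observed
    return (at_least_as_extreme + 1) / (n_perm + 1)

a = [1] * 10 + [0] * 10
y = [random.gauss(3.0 * ai, 1.0) for ai in a]   # strong simulated effect
p = randomization_p_value(y, a)
print(p)
```

Because the p-value is computed from the randomization distribution itself, its validity does not rest on asymptotic approximations, which is why it preserves type I error even in small samples.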
Project description:Motivated by a potential-outcomes perspective, the idea of principal stratification has been widely recognized for its relevance in settings susceptible to posttreatment selection bias such as randomized clinical trials where treatment received can differ from treatment assigned. In one such setting, we address subtleties involved in inference for causal effects when using a key covariate to predict membership in latent principal strata. We show that when treatment received can differ from treatment assigned in both study arms, incorporating a stratum-predictive covariate can make estimates of the "complier average causal effect" (CACE) derive from observations in the two treatment arms with different covariate distributions. Adopting a Bayesian perspective and using Markov chain Monte Carlo for computation, we develop posterior checks that characterize the extent to which incorporating the pretreatment covariate endangers estimation of the CACE. We apply the method to analyze a clinical trial comparing two treatments for jaw fractures in which the study protocol allowed surgeons to overrule both possible randomized treatment assignments based on their clinical judgment and the data contained a key covariate (injury severity) predictive of treatment received.
Project description:The efficiency of doubly robust estimators of the average causal effect (ACE) of a treatment can be improved by including in the treatment and outcome models only those covariates which are related to both treatment and outcome (i.e., confounders) or related only to the outcome. However, it is often challenging to identify such covariates among the large number that may be measured in a given study. In this article, we propose GLiDeR (Group Lasso and Doubly Robust Estimation), a novel variable selection technique for identifying confounders and predictors of outcome using an adaptive group lasso approach that simultaneously performs coefficient selection, regularization, and estimation across the treatment and outcome models. The selected variables and corresponding coefficient estimates are used in a standard doubly robust ACE estimator. We provide asymptotic results showing that, for a broad class of data generating mechanisms, GLiDeR yields a consistent estimator of the ACE when either the outcome or treatment model is correctly specified. A comprehensive simulation study shows that GLiDeR is more efficient than doubly robust methods using standard variable selection techniques and has substantial computational advantages over a recently proposed doubly robust Bayesian model averaging method. We illustrate our method by estimating the causal treatment effect of bilateral versus single-lung transplant on forced expiratory volume in one year after transplant using an observational registry.
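The doubly robust estimator into which the selected variables are plugged is the standard augmented inverse-probability-weighted (AIPW) estimator of the ACE. Below is a self-contained toy sketch with one confounder, a logistic propensity model fit by Newton's method, and arm-specific linear outcome models; it omits GLiDeR's group-lasso selection step, and all names and data are our own.

```python
import math
import random

random.seed(6)

def fit_logistic(x, a, iters=25):
    """Newton-Raphson fit of P(A=1|x) = expit(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ai in zip(x, a):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ai - p
            g1 += (ai - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det      # Newton step via 2x2 inverse
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def ols(xs, ys):
    """Return (intercept, slope) of a simple linear regression."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((u - mx) * (v - my) for u, v in zip(xs, ys))
             / sum((u - mx) ** 2 for u in xs))
    return my - slope * mx, slope

# Confounded data: X drives both treatment and outcome; true ACE is 2.
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
a = [1 if random.random() < 1.0 / (1.0 + math.exp(-xi)) else 0 for xi in x]
y = [2.0 * ai + 2.0 * xi + random.gauss(0, 1) for ai, xi in zip(a, x)]

b0, b1 = fit_logistic(x, a)
c1, s1 = ols([xi for xi, ai in zip(x, a) if ai == 1],
             [yi for yi, ai in zip(y, a) if ai == 1])
c0, s0 = ols([xi for xi, ai in zip(x, a) if ai == 0],
             [yi for yi, ai in zip(y, a) if ai == 0])

aipw = 0.0
for xi, ai, yi in zip(x, a, y):
    e = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
    e = min(max(e, 0.01), 0.99)                 # trim extreme weights
    m1 = c1 + s1 * xi                           # predicted Y under A=1
    m0 = c0 + s0 * xi                           # predicted Y under A=0
    aipw += (ai * (yi - m1) / e + m1
             - (1 - ai) * (yi - m0) / (1 - e) - m0)
aipw /= n

naive = (sum(yi for yi, ai in zip(y, a) if ai) / sum(a)
         - sum(yi for yi, ai in zip(y, a) if not ai) / (n - sum(a)))
print(naive, aipw)   # naive is confounded; AIPW is close to the true ACE
```

The double-robustness property described above means this estimator stays consistent if either the logistic propensity model or the outcome models are correct; GLiDeR's contribution is choosing which covariates enter those models.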
Project description:This paper studies inference for the average treatment effect in randomized controlled trials with covariate-adaptive randomization. Here, by covariate-adaptive randomization, we mean randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve "balance" within each stratum. Our main requirement is that the randomization scheme assigns treatment status within each stratum so that the fraction of units being assigned to treatment within each stratum has a well-behaved distribution centered around a proportion π as the sample size tends to infinity. Such schemes include, for example, Efron's biased-coin design and stratified block randomization. When testing the null hypothesis that the average treatment effect equals a pre-specified value in such settings, we first show the usual two-sample t-test is conservative in the sense that it has limiting rejection probability under the null hypothesis no greater than and typically strictly less than the nominal level. We show, however, that a simple adjustment to the usual standard error of the two-sample t-test leads to a test that is exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. Next, we consider the usual t-test (on the coefficient on treatment assignment) in a linear regression of outcomes on treatment assignment and indicators for each of the strata. We show that this test is exact for the important special case of randomization schemes with π = 1/2, but is otherwise conservative. We again provide a simple adjustment to the standard errors that yields an exact test more generally. Finally, we study the behavior of a modified version of a permutation test, which we refer to as the covariate-adaptive permutation test, that only permutes treatment status for units within the same stratum.
When applied to the usual two-sample t-statistic, we show that this test is exact for randomization schemes with π = 1/2 that additionally achieve what we refer to as "strong balance." For randomization schemes with π ≠ 1/2, this test may have limiting rejection probability under the null hypothesis strictly greater than the nominal level. When applied to a suitably adjusted version of the two-sample t-statistic, however, we show that this test is exact for all randomization schemes that achieve "strong balance," including those with π ≠ 1/2. A simulation study confirms the practical relevance of our theoretical results. We conclude with recommendations for empirical practice and an empirical illustration.
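The covariate-adaptive permutation test described at the end can be sketched directly: treatment labels are shuffled only within each stratum, so every permuted assignment respects the stratified design. A toy version with a simple difference-in-means statistic (our own code and names, not the paper's adjusted t-statistic):

```python
import random

random.seed(7)

def permute_within_strata(a, strata):
    """Shuffle treatment labels separately inside each stratum."""
    perm = a[:]
    for s in set(strata):
        idx = [i for i, si in enumerate(strata) if si == s]
        labels = [perm[i] for i in idx]
        random.shuffle(labels)
        for i, lab in zip(idx, labels):
            perm[i] = lab
    return perm

def stratified_permutation_p(y, a, strata, n_perm=2000):
    def abs_diff(labels):
        y1 = [yi for yi, li in zip(y, labels) if li == 1]
        y0 = [yi for yi, li in zip(y, labels) if li == 0]
        return abs(sum(y1) / len(y1) - sum(y0) / len(y0))

    observed = abs_diff(a)
    hits = sum(abs_diff(permute_within_strata(a, strata)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Stratified block design: two strata, treatment balanced within each.
strata = [0] * 20 + [1] * 20
a = ([1] * 10 + [0] * 10) * 2
y = [random.gauss(2.0 * ai + 1.0 * si, 1.0) for ai, si in zip(a, strata)]
perm = permute_within_strata(a, strata)
print(sum(perm[:20]), sum(perm[20:]))   # 10 10: per-stratum counts preserved
p = stratified_permutation_p(y, a, strata)
print(p)
```

Restricting permutations to within-stratum shuffles is exactly what makes the reference distribution respect the covariate-adaptive design.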
Project description:In the development of risk prediction models, predictors are often measured with error. In this paper, we investigate the impact of covariate measurement error on risk prediction. We compare the prediction performance using a costly variable measured without error, along with error-free covariates, to that of a model based on an inexpensive surrogate along with the error-free covariates. We consider continuous error-prone covariates with homoscedastic and heteroscedastic errors, and also a discrete misclassified covariate. Prediction performance is evaluated by the area under the receiver operating characteristic curve (AUC), the Brier score (BS), and the ratio of the observed to the expected number of events (calibration). In an extensive numerical study, we show that (i) the prediction model with the error-prone covariate is very well calibrated, even when it is mis-specified; (ii) using the error-prone covariate instead of the true covariate can reduce the AUC and increase the BS dramatically; (iii) adding an auxiliary variable, which is correlated with the error-prone covariate but conditionally independent of the outcome given all covariates in the true model, can improve the AUC and BS substantially. We conclude that reducing measurement error in covariates will improve the ensuing risk prediction, unless the association between the error-free and error-prone covariates is very high. Finally, we demonstrate how a validation study can be used to assess the effect of mismeasured covariates on risk prediction. These concepts are illustrated in a breast cancer risk prediction model developed in the Nurses' Health Study.
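The discrimination loss from an error-prone covariate is easy to see with a rank-based AUC (the probability that a random case outscores a random control). The following toy simulation is our own; the paper's full evaluation also covers the Brier score and calibration, which are not shown here.

```python
import math
import random

random.seed(8)

def auc(scores, labels):
    """Rank-based AUC: P(case score > control score), ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else (0.5 if p == q else 0.0)
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

n = 1500
x = [random.gauss(0, 1) for _ in range(n)]        # true covariate
w = [xi + random.gauss(0, 1) for xi in x]         # error-prone surrogate
ylab = [1 if random.random() < 1.0 / (1.0 + math.exp(-2.0 * xi)) else 0
        for xi in x]

auc_true = auc(x, ylab)
auc_error_prone = auc(w, ylab)
print(auc_true, auc_error_prone)   # measurement error lowers the AUC
```

This reproduces finding (ii) above in miniature: substituting the error-prone covariate for the true one reduces discrimination even though, in larger models, calibration can remain good.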
Project description:BACKGROUND:It is important to estimate the treatment effect of interest accurately and precisely within the analysis of randomised controlled trials. One way to increase precision in the estimate, and thus improve the power, for randomised trials with continuous outcomes is through adjustment for pre-specified prognostic baseline covariates. Typically covariate adjustment is conducted using regression analysis; however, recently, Inverse Probability of Treatment Weighting (IPTW) using the propensity score has been proposed as an alternative method. For a continuous outcome it has been shown that the IPTW estimator has the same large sample statistical properties as that obtained via analysis of covariance. However the performance of IPTW has not been explored for smaller population trials (< 100 participants), where precise estimation of the treatment effect has potential for greater impact than in larger samples. METHODS:In this paper we explore the performance of the baseline-adjusted treatment effect estimated using IPTW in smaller population trial settings. To do so we present a simulation study including a number of different trial scenarios with sample sizes ranging from 40 to 200 and adjustment for up to 6 covariates. We also re-analyse a paediatric eczema trial that includes 60 children. RESULTS:In the simulation study the performance of the IPTW variance estimator was sub-optimal with smaller sample sizes. The coverage of 95% CIs was marginally below 95% for sample sizes < 150 and ≥ 100. For sample sizes < 100 the coverage of 95% CIs was always significantly below 95% for all covariate settings. The minimum coverage obtained with IPTW was 89% with n = 40. In comparison, regression adjustment always resulted in 95% coverage. The analysis of the eczema trial confirmed discrepancies between the IPTW and regression estimators in a real-life small population setting.
CONCLUSIONS:The IPTW variance estimator does not perform well with small samples. We therefore caution against the use of IPTW in small-sample settings when the sample size is less than 150, and particularly when it is less than 100.
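For intuition about what the IPTW point estimator is doing in a randomized trial, here is a toy Monte Carlo comparison against the unadjusted difference in means. It uses a deliberately coarse propensity model (the treated fraction within covariate-defined strata) rather than the logistic-regression propensity score used in the paper, and a moderate sample size; it illustrates the precision gain, not the small-sample variance-estimation problem reported above.

```python
import random
import statistics

random.seed(9)

def one_trial(n=100, theta=2.0):
    x = [random.gauss(0, 1) for _ in range(n)]
    a = [random.randint(0, 1) for _ in range(n)]
    y = [theta * ai + 3.0 * xi + random.gauss(0, 1) for ai, xi in zip(a, x)]

    y1 = [yi for yi, ai in zip(y, a) if ai == 1]
    y0 = [yi for yi, ai in zip(y, a) if ai == 0]
    unadj = statistics.mean(y1) - statistics.mean(y0)

    # Coarse propensity: treated fraction within each covariate stratum.
    strata = [int(xi > 0) for xi in x]
    e_hat = {}
    for s in (0, 1):
        in_s = [ai for ai, si in zip(a, strata) if si == s]
        e_hat[s] = sum(in_s) / len(in_s)

    # Normalized (Hajek) IPTW estimator.
    w1 = [1.0 / e_hat[si] for si in strata]
    w0 = [1.0 / (1.0 - e_hat[si]) for si in strata]
    num1 = sum(wi * yi for wi, yi, ai in zip(w1, y, a) if ai == 1)
    den1 = sum(wi for wi, ai in zip(w1, a) if ai == 1)
    num0 = sum(wi * yi for wi, yi, ai in zip(w0, y, a) if ai == 0)
    den0 = sum(wi for wi, ai in zip(w0, a) if ai == 0)
    return unadj, num1 / den1 - num0 / den0

reps = [one_trial() for _ in range(500)]
sd_unadj = statistics.stdev([u for u, _ in reps])
sd_iptw = statistics.stdev([v for _, v in reps])
print(sd_unadj, sd_iptw)   # weighting by the fitted propensity shrinks the SD
```

Both estimators target the same effect; the paper's point is that while IPTW adjustment can gain precision, its usual variance estimator becomes unreliable at the small sample sizes flagged above.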