Longitudinal structural mixed models for the analysis of surgical trials with noncompliance.
ABSTRACT: Patient noncompliance complicates the analysis of many randomized trials seeking to evaluate the effect of surgical intervention as compared with a nonsurgical treatment. If selection for treatment depends on intermediate patient characteristics or outcomes, then 'as-treated' analyses may be biased for the estimation of causal effects. Therefore, the selection mechanism for treatment and/or compliance should be carefully considered when conducting analysis of surgical trials. We compare the performance of alternative methods when endogenous processes lead to patient crossover. We adopt an underlying longitudinal structural mixed model that is a natural example of a structural nested model. Likelihood-based methods are not typically used in this context; however, we show that standard linear mixed models will be valid under selection mechanisms that depend only on past covariate and outcome history. If there are underlying patient characteristics that influence selection, then likelihood methods can be extended via maximization of the joint likelihood of exposure and outcomes. Semi-parametric causal estimation methods such as marginal structural models, g-estimation, and instrumental variable approaches can also be valid, and we both review and evaluate their implementation in this setting. The assumptions required for valid estimation vary across approaches; thus, the choice of methods for analysis should be driven by which outcome and selection assumptions are plausible.
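The claim that standard linear mixed models remain valid when selection depends only on past covariate and outcome history can be illustrated with a minimal simulation (all parameter values hypothetical, not from any trial): crossover to surgery is triggered by a poor observed outcome, yet likelihood-based estimation still recovers the treatment effect.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for i in range(300):
    b = rng.normal(0, 1)              # subject-level random intercept
    a, y_obs = 0, 0.0
    for t in range(5):
        # endogenous crossover: switch to surgery after a poor observed outcome
        if a == 0 and t > 0 and y_obs < -0.5:
            a = 1
        y_obs = b + 0.3 * t + 1.0 * a + rng.normal(0, 1)  # true surgery effect = 1.0
        rows.append(dict(id=i, t=t, a=a, y=y_obs))
df = pd.DataFrame(rows)

# standard random-intercept model; ML remains valid here because crossover
# depends only on the observed outcome history
fit = smf.mixedlm("y ~ t + a", df, groups=df["id"]).fit()
```

Because the crossover rule uses only observed responses, the likelihood factorizes and the coefficient on `a` stays close to the true effect; had selection depended on the latent intercept directly, a joint model of exposure and outcome would be needed instead.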
Project description:In Mendelian randomization (MR), inference about the causal relationship between a phenotype of interest and a response or disease outcome can be obtained by constructing instrumental variables from genetic variants. However, MR inference requires three assumptions, one of which is that the genetic variants influence the outcome only through the phenotype of interest. Pleiotropy, that is, the situation in which some genetic variants affect more than one phenotype, can invalidate these genetic variants for use as instrumental variables; thus a naive analysis will give biased estimates of the causal relation. Here, we present new methods (constrained instrumental variable [CIV] methods) to construct valid instrumental variables and perform adjusted causal effect estimation when pleiotropy exists and when the pleiotropic phenotypes are available. We demonstrate that a smoothed version of CIV performs approximate selection of genetic variants that are valid instruments, and provides unbiased estimates of the causal effects. We provide details on a number of existing methods, together with a comparison of their performance in a large series of simulations. CIV performs robustly across different pleiotropic violations of the MR assumptions. We also analyzed data from the Alzheimer's disease (AD) neuroimaging initiative (ADNI; Mueller et al., 2005, Alzheimer's & Dementia, 1(1), 55-66) to disentangle causal relationships of several biomarkers with AD progression.
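A minimal sketch of the naive-versus-instrumented contrast that motivates MR, using a single valid variant and a hand-rolled Wald ratio (the CIV construction itself is not reproduced here; all effect sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
g = rng.binomial(2, 0.3, n)                    # instrument: allele count at one variant
u = rng.normal(0, 1, n)                        # unmeasured confounder
x = 0.5 * g + u + rng.normal(0, 1, n)          # phenotype of interest
y = 0.8 * x + u + rng.normal(0, 1, n)          # outcome; true causal effect = 0.8

beta_ols = np.cov(x, y)[0, 1] / np.var(x)          # naive regression, confounded by u
beta_iv = np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]  # Wald ratio using g as instrument
```

The Wald ratio is unbiased here only because `g` affects `y` solely through `x`; a pleiotropic variant would break exactly this step, which is the violation CIV is designed to handle.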
Project description:Cluster randomized trials (CRTs) have been widely used in field experiments that treat a cluster of individuals as the unit of randomization. This study focused particularly on situations where CRTs are accompanied by a common complication, namely, treatment noncompliance or, more generally, intervention nonadherence. In CRTs, compliance may be related not only to individual characteristics but also to the environment of the clusters individuals belong to. Therefore, analyses ignoring the connection between compliance and clustering may not provide valid results. Although randomized field experiments often suffer from both noncompliance and clustering of the data, these features have been studied as separate rather than concurrent problems. On the basis of Monte Carlo simulations, this study demonstrated how clustering and noncompliance may affect statistical inferences and how these two complications can be accounted for simultaneously. The focus was the effect of the intervention on individuals who not only were assigned to the active intervention but also complied with that assignment (the complier average causal effect). For estimation of intervention effects accounting for noncompliance and data clustering, an ML-EM estimation method was employed.
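The complier average causal effect can be sketched with the standard moment estimator (ITT effect divided by the complier proportion) under one-sided noncompliance; this deliberately ignores the clustering and the ML-EM machinery the study actually uses, and all parameters are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
z = rng.binomial(1, 0.5, n)             # randomized assignment (clustering ignored here)
c = rng.binomial(1, 0.6, n)             # latent complier status
d = z * c                               # one-sided noncompliance: controls have no access
y = 1.0 * d + 0.5 * c + rng.normal(0, 1, n)   # compliers differ at baseline

itt = y[z == 1].mean() - y[z == 0].mean()     # intention-to-treat effect
p_c = d[z == 1].mean()                        # estimated complier proportion
cace = itt / p_c                              # complier average causal effect, ~1.0

naive = y[d == 1].mean() - y[d == 0].mean()   # as-treated contrast, biased upward
```

The as-treated contrast is inflated because compliers have better baseline outcomes; the CACE estimator removes that selection, and the paper's contribution is doing so while also respecting the cluster structure.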
Project description:OBJECTIVE:To demonstrate the application of causal inference methods to observational data in the obstetrics and gynecology field, particularly causal modeling and semi-parametric estimation. BACKGROUND:Human immunodeficiency virus (HIV)-positive women are at increased risk for cervical cancer and its treatable precursors. Determining whether potential risk factors such as hormonal contraception are true causes is critical for informing public health strategies as longevity increases among HIV-positive women in developing countries. METHODS:We developed a causal model of the factors related to combined oral contraceptive (COC) use and cervical intraepithelial neoplasia 2 or greater (CIN2+) and modified the model to fit the observed data, drawn from women in a cervical cancer screening program at HIV clinics in Kenya. Assumptions required for substantiation of a causal relationship were assessed. We estimated the population-level association using semi-parametric methods: g-computation, inverse probability of treatment weighting, and targeted maximum likelihood estimation. RESULTS:We identified 2 plausible causal paths from COC use to CIN2+: via HPV infection and via increased disease progression. Study data enabled estimation of the latter only with strong assumptions of no unmeasured confounding. Of 2,519 women under 50 screened per protocol, 219 (8.7%) were diagnosed with CIN2+. Marginal modeling suggested a 2.9% (95% confidence interval 0.1%, 6.9%) increase in prevalence of CIN2+ if all women under 50 were exposed to COC; the significance of this association was sensitive to method of estimation and exposure misclassification. CONCLUSION:Use of causal modeling enabled clear representation of the causal relationship of interest and the assumptions required to estimate that relationship from the observed data. Semi-parametric estimation methods provided flexibility and reduced reliance on correct model form. 
Although selected results suggest an increased prevalence of CIN2+ associated with COC, evidence is insufficient to conclude causality. Priority areas for future studies to better satisfy causal criteria are identified.
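The g-computation and inverse-probability-weighting steps can be sketched on stylized data with a single binary confounder (TMLE omitted; the numbers are hypothetical and not the Kenya cohort):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
w = rng.binomial(1, 0.5, n)                       # binary confounder (illustrative)
a = rng.binomial(1, np.where(w == 1, 0.6, 0.3))   # exposure (e.g., COC) depends on w
y = rng.binomial(1, 0.05 + 0.03 * a + 0.04 * w)   # outcome (e.g., CIN2+); true RD = 0.03

# g-computation: stratum-specific risks standardized to the confounder distribution
risk = {(av, wv): y[(a == av) & (w == wv)].mean() for av in (0, 1) for wv in (0, 1)}
pw = w.mean()
rd_gcomp = (risk[1, 1] * pw + risk[1, 0] * (1 - pw)) \
         - (risk[0, 1] * pw + risk[0, 0] * (1 - pw))

# IPTW: reweight by the inverse probability of the observed exposure
ps = np.where(w == 1, 0.6, 0.3)                   # propensity (estimated in practice)
wts = np.where(a == 1, 1 / ps, 1 / (1 - ps))
r1 = np.sum(y * a * wts) / np.sum(a * wts)
r0 = np.sum(y * (1 - a) * wts) / np.sum((1 - a) * wts)
rd_iptw = r1 - r0
```

Both estimators target the same marginal risk difference under no unmeasured confounding; with many confounders the saturated versions above are replaced by fitted models, which is where the semi-parametric flexibility discussed in the abstract matters.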
Project description:In this paper we propose methods for estimating heterogeneity in causal effects in experimental and observational studies and for conducting hypothesis tests about the magnitude of differences in treatment effects across subsets of the population. We provide a data-driven approach to partition the data into subpopulations that differ in the magnitude of their treatment effects. The approach enables the construction of valid confidence intervals for treatment effects, even with many covariates relative to the sample size, and without "sparsity" assumptions. We propose an "honest" approach to estimation, whereby one sample is used to construct the partition and another to estimate treatment effects for each subpopulation. Our approach builds on regression tree methods, modified to optimize for goodness of fit in treatment effects and to account for honest estimation. Our model selection criterion anticipates that bias will be eliminated by honest estimation and also accounts for the effect of making additional splits on the variance of treatment effect estimates within each subpopulation. We address the challenge that the "ground truth" for a causal effect is not observed for any individual unit, so that standard approaches to cross-validation must be modified. Through a simulation study, we show that for our preferred method honest estimation results in nominal coverage for 90% confidence intervals, whereas coverage ranges between 74% and 84% for nonhonest approaches. Honest estimation requires estimating the model with a smaller sample size; the cost in terms of mean squared error of treatment effects for our preferred method ranges from 7% to 22%.
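The honest idea can be sketched in a stripped-down, one-split version: the partition is chosen on one half of the data and the subgroup effects are estimated on the held-out half (a single threshold search stands in for the full tree; data are simulated):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8_000
x = rng.uniform(0, 1, n)
t = rng.binomial(1, 0.5, n)
y = np.where(x > 0.5, 2.0, 0.0) * t + rng.normal(0, 1, n)  # effect only when x > 0.5

idx_tr, idx_est = np.arange(n // 2), np.arange(n // 2, n)  # honest sample split

def effect(idx, mask):
    s = idx[mask[idx]]
    return y[s][t[s] == 1].mean() - y[s][t[s] == 0].mean()

# choose the split on the training half that maximizes effect heterogeneity
grid = np.linspace(0.3, 0.7, 17)
best = max(grid, key=lambda c: abs(effect(idx_tr, x > c) - effect(idx_tr, x <= c)))

# honest estimates: subgroup effects for the chosen partition use the held-out half
tau_hi = effect(idx_est, x > best)
tau_lo = effect(idx_est, x <= best)
```

Because the held-out half played no role in choosing the split, the subgroup estimates are free of the selection bias that makes nonhonest confidence intervals undercover.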
Project description:Instrumental variables are routinely used to recover a consistent estimator of an exposure causal effect in the presence of unmeasured confounding. Instrumental variable approaches to account for nonignorable missing data also exist but are less familiar to epidemiologists. Like instrumental variables for exposure causal effects, instrumental variables for missing data rely on exclusion restriction and instrumental variable relevance assumptions. Yet these two conditions alone are insufficient for point identification. For estimation, researchers have invoked a third assumption, typically involving fairly restrictive parametric constraints. Inferences can be sensitive to these parametric assumptions, which are typically not empirically testable. The purpose of our article is to discuss another approach for leveraging a valid instrumental variable. Although the approach is insufficient for nonparametric identification, it can nonetheless provide informative inferences about the presence, direction, and magnitude of selection bias, without invoking a third untestable parametric assumption. An important contribution of this article is an Excel spreadsheet tool that can be used to obtain empirical evidence of selection bias and calculate bounds and corresponding Bayesian 95% credible intervals for a nonidentifiable population proportion. For illustrative purposes, we used the spreadsheet tool to analyze HIV prevalence data collected by the 2007 Zambia Demographic and Health Survey (DHS).
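The logic of bounding a nonidentifiable proportion can be illustrated with worst-case bounds on hypothetical counts (not the Zambia DHS figures); the article's spreadsheet tool sharpens such bounds using the instrument, which this arithmetic sketch omits:

```python
# hypothetical survey counts: tested, tested positive, and refusals/missing
n_tested, n_pos, n_missing = 5000, 600, 1000
n_total = n_tested + n_missing

p_observed = n_pos / n_tested                 # 0.12 among those actually tested
lower = n_pos / n_total                       # if every untested person is negative
upper = (n_pos + n_missing) / n_total         # if every untested person is positive
```

If the true prevalence must lie in [`lower`, `upper`] = [0.10, 0.267] regardless of the missingness mechanism, then any point estimate outside this band signals a modeling error; an instrument for missingness narrows the band without a third parametric assumption.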
Project description:The 'Mendelian randomization' approach uses genotype as an instrumental variable to distinguish between causal and non-causal explanations of biomarker-disease associations. Classical methods for instrumental variable analysis are limited to linear or probit models without latent variables or missing data, rely on asymptotic approximations that are not valid for weak instruments and focus on estimation rather than hypothesis testing. We describe a Bayesian approach that overcomes these limitations, using the JAGS program to compute the log-likelihood ratio (lod score) between causal and non-causal explanations of a biomarker-disease association. To demonstrate the approach, we examined the relationship of plasma urate levels to metabolic syndrome in the ORCADES study of a Scottish population isolate, using genotype at six single-nucleotide polymorphisms in the urate transporter gene SLC2A9 as an instrumental variable. In models that allow for intra-individual variability in urate levels, the lod score favouring a non-causal over a causal explanation was 2.34. In models that do not allow for intra-individual variability, the weight of evidence against a causal explanation was weaker (lod score 1.38). We demonstrate the ability to test one of the key assumptions of instrumental variable analysis (that the effects of the instrument on outcome are mediated only through the intermediate variable) by constructing a test for residual effects of genotype on outcome, similar to the tests of 'overidentifying restrictions' developed for classical instrumental variable analysis. The Bayesian approach described here is flexible enough to deal with any instrumental variable problem, and does not rely on asymptotic approximations that may not be valid for weak instruments. The approach can easily be extended to combine information from different study designs.
Statistical power calculations show that instrumental variable analysis with genetic instruments will typically require combining information from moderately large cohort and cross-sectional studies of biomarkers with information from very large genetic case-control studies.
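The overidentifying-restrictions idea can be sketched in its simplest frequentist form: with two variants, disagreement between per-variant Wald ratios flags a violation of the exclusion restriction. This is only an analogue of the Bayesian JAGS formulation above, and all effect sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
g1 = rng.binomial(2, 0.3, n)                   # valid instrument
g2 = rng.binomial(2, 0.3, n)                   # variant with a direct (residual) effect
u = rng.normal(0, 1, n)
x = 0.4 * g1 + 0.4 * g2 + u + rng.normal(0, 1, n)   # biomarker (e.g., urate)
y = 0.5 * x + 0.3 * g2 + u + rng.normal(0, 1, n)    # g2 violates the exclusion restriction

b1 = np.cov(g1, y)[0, 1] / np.cov(g1, x)[0, 1]  # ~0.5, the true causal effect
b2 = np.cov(g2, y)[0, 1] / np.cov(g2, x)[0, 1]  # ~1.25, inflated by the direct path
```

With more instruments than exposures, agreement of the per-instrument ratios is testable even though no single exclusion restriction is; the residual-effects test described in the abstract exploits the same overidentification.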
Project description:Standard statistical practice used for determining the relative importance of competing causes of disease typically relies on ad hoc methods, often byproducts of machine learning procedures (stepwise regression, random forest, etc.). A causal inference framework and data-adaptive methods may help to tailor parameters to match the clinical question and free one from arbitrary modeling assumptions. Our focus is on implementations of such semiparametric methods for a variable importance measure (VIM). We propose a fully automated procedure for VIM based on collaborative targeted maximum likelihood estimation (cTMLE), a method that optimizes the estimate of an association in the presence of potentially numerous competing causes. We applied the approach to data collected from traumatic brain injury patients, specifically a prospective, observational study including three US Level-1 trauma centers. The primary outcome was a disability score (Glasgow Outcome Scale - Extended (GOSE)) collected three months post-injury. We identified clinically important predictors among a set of risk factors using a variable importance analysis based on targeted maximum likelihood estimators (TMLE) and on cTMLE. Via a parametric bootstrap, we demonstrate that the latter procedure has the potential for robust automated estimation of variable importance measures based upon machine-learning algorithms. The cTMLE estimator was associated with substantially less positivity bias as compared to TMLE and higher coverage of the 95% CI. This study confirms the power of an automated cTMLE procedure that can target model selection via machine learning to estimate VIMs in complicated, high-dimensional data.
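The TMLE targeting step underlying both TMLE and cTMLE can be sketched on stylized data: an initial outcome regression is fluctuated along the "clever covariate" by solving the efficient score equation. With saturated initial fits, as here, the fluctuation is essentially zero and TMLE reduces to g-computation; the targeting step matters when the initial fits come from machine learning. All parameters are hypothetical:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit, logit

rng = np.random.default_rng(6)
n = 20_000
w = rng.binomial(1, 0.5, n)
a = rng.binomial(1, 0.3 + 0.4 * w)                  # exposure depends on confounder
y = rng.binomial(1, 0.2 + 0.1 * a + 0.2 * w)        # true risk difference = 0.1

# initial outcome-regression and propensity estimates (saturated, for simplicity)
Q = np.zeros((2, n))
for av in (0, 1):
    for wv in (0, 1):
        Q[av, w == wv] = y[(a == av) & (w == wv)].mean()
g1 = np.array([a[w == 0].mean(), a[w == 1].mean()])[w]   # P(A=1 | W)

# targeting: one-dimensional logistic fluctuation along the clever covariate
Qa = np.clip(np.where(a == 1, Q[1], Q[0]), 1e-6, 1 - 1e-6)
H = a / g1 - (1 - a) / (1 - g1)
score = lambda e: np.sum(H * (y - expit(logit(Qa) + e * H)))
eps = brentq(score, -5, 5)                               # solves the efficient score

Q1s = expit(logit(np.clip(Q[1], 1e-6, 1 - 1e-6)) + eps / g1)
Q0s = expit(logit(np.clip(Q[0], 1e-6, 1 - 1e-6)) - eps / (1 - g1))
ate = np.mean(Q1s - Q0s)                                 # targeted plug-in estimate
```

cTMLE additionally chooses how much of the covariate information enters the propensity model, which is what reduces the positivity bias reported above; that collaborative selection loop is not reproduced in this sketch.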
Project description:Evidence-based personalized medicine formalizes treatment selection as an individualized treatment regime that maps up-to-date patient information into the space of possible treatments. Available patient information may include static features such as race, gender, family history, genetic and genomic information, as well as longitudinal information including the emergence of comorbidities, waxing and waning of symptoms, side-effect burden, and adherence. Dynamic information measured at multiple time points before treatment assignment should be included as input to the treatment regime. However, subject longitudinal measurements are typically sparse, irregularly spaced, noisy, and vary in number across subjects. Existing estimators for treatment regimes require equal information be measured on each subject and thus standard practice is to summarize longitudinal subject information into a scalar, ad hoc summary during data pre-processing. This reduction of the longitudinal information to a scalar feature precedes estimation of a treatment regime and is therefore not informed by subject outcomes, treatments, or covariates. Furthermore, we show that this reduction requires more stringent causal assumptions for consistent estimation than are necessary. We propose a data-driven method for constructing maximally prescriptive yet interpretable features that can be used with standard methods for estimating optimal treatment regimes. In our proposed framework, we treat the subject longitudinal information as a realization of a stochastic process observed with error at discrete time points. Functionals of this latent process are then combined with outcome models to estimate an optimal treatment regime. The proposed methodology requires weaker causal assumptions than Q-learning with an ad hoc scalar summary and is consistent for the optimal treatment regime.
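The feature-construction step can be sketched by fitting a simple per-subject basis (a line) to sparse, irregular measurements and passing the fitted functionals to a regime rule; the paper's latent-stochastic-process model is far richer than this stand-in, and the regime threshold below is purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)

def trajectory_features(times, values):
    """Fit a line to one subject's sparse, irregular measurements and
    return (baseline level, slope) as candidate regime-model features."""
    slope, intercept = np.polyfit(times, values, 1)
    return intercept, slope

feats = []
for _ in range(500):
    m = rng.integers(3, 8)                                  # 3-7 visits per subject
    t = np.sort(rng.choice(np.linspace(0, 1, 11), size=m, replace=False))
    slope_i = rng.normal(1.0, 0.5)                          # subject-specific trend
    v = 2.0 + slope_i * t + rng.normal(0, 0.1, m)           # noisy observations
    feats.append(trajectory_features(t, v))
feats = np.array(feats)

# a hypothetical regime: recommend treatment when the estimated slope exceeds 1
recommend = feats[:, 1] > 1.0
```

Unlike an ad hoc scalar (say, last observed value), the fitted functionals use the whole trajectory and can themselves be chosen to maximize the value of the resulting regime, which is the paper's data-driven refinement.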
Project description:Estimation of causal effects from observational data is a primary goal of epidemiology. The use of multiple methods with different assumptions relating to exchangeability improves causal inference by demonstrating robustness across assumptions. We estimate the effect of antiretroviral therapy (ART) on mortality in rural KwaZulu-Natal, South Africa from 2007-2011 using two methods with substantially different assumptions: the regression discontinuity design (RDD), and inverse probability weighting of marginal structural models (IPW). The RDD analysis took advantage of a CD4 count-based threshold for ART initiation (200 cells/μl). The two methods yielded consistent but non-identical results for the effect of immediate initiation of ART (RDD intention-to-treat hazard ratio (HR) 0.66, 95% CI 0.35 to 1.26; RDD HR 0.56, 95% CI 0.41 to 0.77; IPW HR 0.49, 95% CI 0.42 to 0.58). Although RDD and IPW estimates had distinct identifying assumptions, strengths, and limitations in terms of internal and external validity, results in this application were similar. The differences in modeling approaches and external validity of each method may explain the minor differences in effect estimates, but the consistency of the results lends support for causal inference about the effect of ART on mortality from these data.
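The RDD component can be sketched as local linear regression on either side of the eligibility threshold, on simulated data with a known discontinuity (parameters hypothetical, not the KwaZulu-Natal estimates, and a risk difference stands in for the hazard ratio):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20_000
cd4 = rng.uniform(100, 300, n)
immediate = (cd4 < 200).astype(float)                  # eligibility rule at 200 cells/ul
p_death = 0.30 - 0.10 * immediate - 0.0005 * (cd4 - 200)
died = rng.binomial(1, p_death)

h = 50.0                                               # bandwidth around the cutoff
def value_at_cutoff(mask):
    X = np.column_stack([np.ones(mask.sum()), cd4[mask] - 200])
    beta, *_ = np.linalg.lstsq(X, died[mask], rcond=None)
    return beta[0]                                     # fitted risk at CD4 = 200

near = np.abs(cd4 - 200) < h
effect = value_at_cutoff(near & (cd4 < 200)) - value_at_cutoff(near & (cd4 >= 200))
```

The RDD contrast identifies the effect only at the threshold and without assuming no unmeasured confounding, whereas IPW identifies a population-wide effect under exchangeability given measured covariates, which is why the two estimates need not coincide exactly.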
Project description:This paper investigates different approaches for causal estimation under multiple concurrent medications. Our parameter of interest is the marginal mean counterfactual outcome under different combinations of medications. We explore parametric and non-parametric methods to estimate the generalized propensity score. We then apply three causal estimation approaches (inverse probability of treatment weighting, propensity score adjustment, and targeted maximum likelihood estimation) to estimate the causal parameter of interest. Focusing on the estimation of the expected outcome under the most prevalent regimens, we compare the results obtained using these methods in a simulation study with four potentially concurrent medications. We perform a second simulation study in which some combinations of medications may occur rarely or not occur at all in the dataset. Finally, we apply the methods explored to contrast the probability of patient treatment success for the most prevalent regimens of antimicrobial agents for patients with multidrug-resistant pulmonary tuberculosis.
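A minimal sketch of generalized-propensity-score weighting for a regimen of two concurrent medications, with the GPS estimated by saturated empirical frequencies on a single binary covariate (real analyses would model it parametrically or non-parametrically, as compared in the paper; all parameters are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 20_000
w = rng.binomial(1, 0.5, n)                      # baseline covariate
a1 = rng.binomial(1, 0.2 + 0.5 * w)              # uptake of drug 1 depends on w
a2 = rng.binomial(1, 0.6 - 0.3 * w)              # uptake of drug 2 depends on w
y = rng.binomial(1, 0.1 + 0.15 * a1 + 0.1 * a2 + 0.2 * w)

combo = 2 * a1 + a2                              # 4 possible regimens
# generalized propensity score P(combo | w), here just empirical frequencies
gps = np.zeros(n)
for wv in (0, 1):
    in_w = w == wv
    for c in range(4):
        gps[in_w & (combo == c)] = (combo[in_w] == c).mean()

# IPW (Hajek) estimate of the counterfactual mean under regimen a1 = 1, a2 = 1
sel = combo == 3
mu_11 = np.sum(y[sel] / gps[sel]) / np.sum(1.0 / gps[sel])
```

When some combinations are rare or absent, `gps` approaches zero and the weights explode, which is exactly the positivity problem the paper's second simulation study probes.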