Semiparametric Regression Estimation for Recurrent Event Data with Errors in Covariates under Informative Censoring
ABSTRACT: Recurrent event data arise frequently in many longitudinal follow-up studies, so evaluating covariate effects on the rates of occurrence of such events is commonly of interest. Examples include repeated hospitalizations, recurrent infections in HIV patients, and tumor recurrences. In this article, we consider semiparametric regression methods for the occurrence rate function of recurrent events when the covariates may be measured with errors. In contrast to existing work, the conventional assumption of independent censoring is violated in our setting because the recurrent event process is interrupted by some correlated events, a situation known as informative drop-out. Further, some covariates may be measured with errors. To accommodate both informative censoring and measurement error, the occurrence of recurrent events is modelled through an unspecified frailty distribution coupled with a classical measurement error model. We propose two corrected approaches based on different ideas and show that they are numerically identical when estimating the regression parameters. The asymptotic properties of the proposed estimators are established, and the finite sample performance is examined via simulations. The proposed methods are applied to the Nutritional Prevention of Cancer trial to assess the effect of selenium treatment on the recurrence of squamous cell carcinoma.
Project description: In multivariate recurrent event data regression, observation of recurrent events is usually terminated by other events that are associated with the recurrent event processes, resulting in informative censoring. Additionally, some covariates may be measured with errors. In some applications, an instrumental variable is observed in a subsample, namely a calibration sample, which can be used for bias correction. In this article, we develop two non-parametric correction approaches that simultaneously correct for informative censoring and measurement error in the analysis of multivariate recurrent event data. A shared frailty model is adopted to characterize the informative censoring and the dependence among different types of recurrent events. To adjust for measurement error, a non-parametric correction method using only the calibration sample is proposed. In the second approach, information from the whole cohort is incorporated via the generalized method of moments. The proposed methods require neither a Poisson-type assumption for the multivariate recurrent event process nor a distributional assumption for the frailty; moreover, no distributional assumption is imposed on the underlying covariates or the measurement error. Both methods perform well, but the second approach improves efficiency. The proposed methods are applied to the Nutritional Prevention of Cancer trial to assess the effect of selenium treatment on the recurrences of basal cell carcinoma and squamous cell carcinoma.
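To make the shared-frailty mechanism concrete, the sketch below simulates recurrent event counts whose occurrence rate and follow-up time both depend on a common frailty, producing informative censoring. All parameter values, variable names, and the gamma frailty choice are illustrative assumptions, not the specification used in the article:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.gamma(2.0, 0.5, size=n)        # shared frailty, mean 1
x = rng.binomial(1, 0.5, size=n)       # treatment indicator
rate = 0.5 * z * np.exp(0.8 * x)       # recurrent event rate given the frailty

# censoring depends on the SAME frailty: high-frailty subjects drop out sooner
c = rng.exponential(5.0 / z)           # exponential censoring, rate 0.2 * z
tau = np.minimum(c, 5.0)               # follow-up capped by administrative end

n_events = rng.poisson(rate * tau)     # event counts over observed follow-up
```

Because the frailty shortens follow-up and raises the event rate at the same time, follow-up length is not independent of the recurrent event process, which is exactly the informative censoring the shared frailty model is designed to absorb.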
Project description: In this manuscript, we propose a novel approach for the analysis of longitudinal data with informative dropout. We jointly model the slopes of covariates of interest and the censoring process, for which we assume a survival model with a logistic non-constant dropout hazard, in a likelihood function that is integrated over the random effects. Maximizing the marginal likelihood yields maximum likelihood estimates for the population slopes and empirical Bayes estimates for the individual slopes, with the integration carried out by Gaussian quadrature. Our simulation study indicates that this model outperforms, in terms of accuracy and validity of the estimates, several alternatives: a logistic non-constant-hazard censoring model without covariates, a logistic constant-hazard censoring model with covariates, a bootstrapping approach, and mixed models. Sensitivity analyses for the dropout hazard and for non-Gaussian errors were also undertaken to assess the robustness of the proposed approach to such violations. The model is illustrated using a cohort of renal transplant patients with estimated glomerular filtration rate as the outcome of interest.
Project description: We propose a novel, general class of joint models for recurrent events with a wide variety of applications. The focus of this article is modelling the bleeding and transfusion events in myelodysplastic syndrome (MDS) studies, where patients may die or withdraw from the study early because of adverse events or other reasons, such as consent withdrawal or a required alternative therapy during the study. The proposed model accommodates multiple recurrent events and multivariate informative censoring through a shared random-effects model, which captures both within-subject and within-event dependence simultaneously. We construct the likelihood function for the semiparametric joint model and develop an expectation-maximization (EM) algorithm for inference; the computational burden does not increase with the number of types of recurrent events. We use the MDS clinical trial data to illustrate the proposed methodology and conduct a number of simulations to examine the performance of the proposed model.
Project description: In colorectal polyp prevention trials, estimation of the rate of adenoma recurrence at the end of the trial may be complicated by dependent censoring; that is, time to follow-up colonoscopy and dropout may depend on time to recurrence. Assuming that auxiliary variables capture the dependence between the recurrence and censoring times, we propose fitting two working models with the auxiliary variables as covariates to define risk groups, and then extending an existing weighted logistic regression method for independent censoring to each risk group to accommodate potential dependent censoring. In a simulation study, we show that the proposed method yields both a gain in efficiency and a reduction in bias when estimating the recurrence rate. We illustrate the methodology by analyzing a recurrent adenoma dataset from a colorectal polyp prevention trial.
Project description: Inverse probability-weighted estimators are widely used in applications where data are missing due to nonresponse or censoring, and in the estimation of causal effects from observational studies. Current estimators rely on the assumption that response indicators or treatment assignment are ignorable conditional on observed covariates, and those covariates are assumed to be measured without error. However, measurement error is common for the variables collected in many applications. For example, in studies of educational interventions, student achievement as measured by standardized tests is almost always used as the key covariate for removing hidden biases, yet standardized test scores may have substantial measurement error. We provide several expressions for a weighting function that yields a consistent estimator of population means from incomplete data with covariates measured with error, and we propose a method to estimate the weighting function from data. A simulation study shows that the resulting estimator is consistent, with negligible bias and small variance.
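The basic weighting idea can be sketched in the error-free case: observations that respond despite a low response probability are upweighted so that the complete cases again represent the full population. This is a minimal Horvitz-Thompson-style illustration with the response probability treated as known; the variable names and the simple logistic response model are assumptions, not the weighting function proposed in the article:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                    # covariate driving nonresponse
y = 2.0 + 1.5 * x + rng.normal(size=n)    # outcome; true population mean is 2.0
p = 1.0 / (1.0 + np.exp(-(0.5 + x)))      # response probability (known here)
r = rng.random(n) < p                     # response indicator

naive = y[r].mean()                       # complete-case mean: biased upward,
                                          # since large-x subjects respond more
ipw = np.sum(y[r] / p[r]) / np.sum(1.0 / p[r])  # inverse-probability-weighted mean
```

When the covariate entering the weights is itself mismeasured, plugging its noisy version into the logistic model above no longer removes the bias, which is the problem the article's corrected weighting function addresses.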
Project description: Propensity score methods are an important tool to help reduce confounding in non-experimental studies and produce more accurate causal effect estimates. Most propensity score methods assume that covariates are measured without error. However, covariates are often measured with error. Recent work has shown that ignoring such error could lead to bias in treatment effect estimates. In this paper, we consider an additional complication: that of differential measurement error across treatment groups, such as can occur if a covariate is measured differently in the treatment and control groups. We propose two flexible Bayesian approaches for handling differential measurement error when estimating average causal effects using propensity score methods. We consider three scenarios: systematic (i.e., a location shift), heteroscedastic (i.e., different variances), and mixed (both systematic and heteroscedastic) measurement errors. We also explore various prior choices (i.e., weakly informative or point mass) on the sensitivity parameters related to the differential measurement error. We present results from simulation studies evaluating the performance of the proposed methods and apply these approaches to an example estimating the effect of neighborhood disadvantage on adolescent drug use disorders.
Project description: The treatment effect in a colorectal polyp prevention trial is often evaluated from the colorectal adenoma recurrence status at the end of the trial. However, early colonoscopy by some participants complicates estimation of the recurrence rate at the study end: the early colonoscopy may be informative of recurrence status and introduce informative differential follow-up into the data. In this article, we use midpoint imputation to handle interval-censored observations and then apply a weighted Kaplan-Meier method to the imputed data to adjust for potential informative differential follow-up while estimating the recurrence rate at the end of the trial. In addition, we modify the weighted Kaplan-Meier method to handle multiple prognostic covariates by deriving a risk score of recurrence from a working logistic regression model and then using the risk score to define risk groups for the weighted Kaplan-Meier estimation. We argue that midpoint imputation produces an unbiased estimate of the recurrence rate at the end of the trial under the assumption that censoring depends only on the status of early colonoscopy. The method is illustrated with an example from a colon polyp prevention study.
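As a minimal illustration of the imputation step, the sketch below applies midpoint imputation to interval-censored recurrence times and then computes an ordinary (unweighted) Kaplan-Meier estimate of the recurrence rate at the end of follow-up. The risk-group weighting described above is deliberately not implemented, and all data values are hypothetical:

```python
import numpy as np

def km_survival(time, event, t):
    """Unweighted Kaplan-Meier survival estimate at time t."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    s, n = 1.0, len(time)
    for i in range(n):
        if time[i] > t:
            break
        if event[i]:               # each event multiplies in (1 - 1/at_risk)
            s *= 1.0 - 1.0 / (n - i)
    return s

# adenoma detected at colonoscopy in visit interval (left, right];
# undetected subjects are censored at their last colonoscopy (right)
left     = np.array([0.0, 0.0, 1.0])
right    = np.array([2.0, 3.0, 3.0])
detected = np.array([True, True, False])

t_imp = np.where(detected, (left + right) / 2.0, right)  # midpoint imputation
recurrence_rate = 1.0 - km_survival(t_imp, detected, 3.0)
```

With these three hypothetical subjects, the imputed event times are 1.0 and 1.5 with one censoring at 3.0, giving an end-of-trial recurrence rate of 2/3.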
Project description: Recurrent event data are frequently encountered in clinical trials. This article develops robust covariate-adjusted log-rank statistics for recurrent event data with arbitrary numbers of events under independent censoring, together with the corresponding sample size formula. The proposed log-rank tests are robust with respect to different data-generating processes and are adjusted for predictive covariates; in the case of a single event, the setting reduces to that of Kong and Slud (1997, Biometrika 84, 847-862). The sample size formula is derived from the asymptotic normality of the covariate-adjusted log-rank statistics under certain local alternatives and a working model for baseline covariates in the recurrent event context. When the effect size is small and the baseline covariates carry little information about event times, the formula reduces to the same form as that of Schoenfeld (1983, Biometrics 39, 499-503) for a single event or independent event times within a subject. We carry out simulations to study the control of type I error and to compare the power of several methods in finite samples. The proposed sample size formula is illustrated using data from an rhDNase study.
Project description: BACKGROUND: Tacrolimus (TAC) is an immunosuppressant drug given to kidney transplant recipients post-transplant to prevent antibody formation and kidney rejection. The optimal therapeutic dose of TAC is poorly defined, and therapy requires frequent monitoring of drug trough levels. Analyzing the association between TAC levels over time and the development of potentially harmful de novo donor-specific antibodies (dnDSA) is complex because TAC levels are subject to measurement error and dnDSA is assessed at discrete times, making it an interval-censored time-to-event outcome. METHODS: Using data from the University of Colorado Transplant Center, we investigated the association between TAC and dnDSA using a shared random effects (intercept and slope) model with longitudinal and interval-censored survival sub-models (JM) and compared it with the more traditional interval-censored survival model with a time-varying covariate (TVC). We carried out simulations to compare bias, type I error level, and power for the association parameter under the TVC and JM approaches across varying degrees of measurement error and interval censoring. In addition, Markov chain Monte Carlo (MCMC) methods allowed us to calculate clinically relevant quantities along with credible intervals (CrI). RESULTS: The shared random effects model was a better fit and showed that both the average TAC level and the slope of TAC were associated with the risk of dnDSA. The simulation studies demonstrated that, in the presence of heavy interval censoring and high measurement error, the TVC survival model underestimates the association between the survival and longitudinal measurements, has inflated type I error, and has considerably less power to detect associations. CONCLUSIONS: To avoid underestimating associations, shared random effects models should be used in analyses of data with interval censoring and measurement error.
Project description: This paper proposes a novel paradigm for building regression trees and ensemble learners in survival analysis. Generalizations of the CART and Random Forests algorithms for general loss functions, and in the latter case more general bootstrap procedures, are both introduced. These results, in combination with an extension of the theory of censoring unbiased transformations applicable to loss functions, underpin the development of two new classes of algorithms for constructing survival trees and survival forests: Censoring Unbiased Regression Trees and Censoring Unbiased Regression Ensembles. For a certain "doubly robust" censoring unbiased transformation of squared error loss, we further show how these new algorithms can be implemented using existing software (e.g., CART, random forests). Comparisons of these methods with existing ensemble procedures for predicting survival probabilities are provided in both simulated settings and applications to four datasets. The new methods either improve upon, or remain competitive with, existing implementations of random survival forests, conditional inference forests, and recursively imputed survival trees.
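As background for the loss transformations above, the simplest censoring unbiased transformation of squared error loss is the inverse-probability-of-censoring-weighted (IPCW) version sketched below: uncensored losses are upweighted by the probability of remaining uncensored. This is a sketch only; the doubly robust transformation the paper favors is not implemented, and the function name and the assumption of a known censoring distribution are illustrative:

```python
import numpy as np

def ipcw_squared_error(time, event, pred, G):
    """IPCW squared-error loss for right-censored data.

    time  : observed times (event or censoring)
    event : True if the event time was observed
    pred  : model predictions of the event time
    G     : survival function of the censoring time (assumed known here)
    """
    g = np.array([G(t) for t in time])
    w = np.where(event, 1.0 / g, 0.0)   # censored observations get weight 0
    return np.mean(w * (time - pred) ** 2)

# sanity check: with no censoring (G == 1) this is ordinary mean squared error
t = np.array([1.0, 2.0, 4.0])
d = np.array([True, True, True])
m = np.array([1.5, 1.5, 3.0])
loss = ipcw_squared_error(t, d, m, lambda s: 1.0)
```

A loss of this form can drive standard CART-style splitting in place of the fully observed squared error, which is the sense in which the transformed algorithms run on existing software.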