Improving efficiency in clinical trials using auxiliary information: Application of a multi-state cure model.
ABSTRACT: In clinical trials, an intermediate marker measured after randomization can often provide early information about the treatment effect on the final outcome of interest. We explore the use of recurrence time as an auxiliary variable for estimating the treatment effect on overall survival in phase three randomized trials of colon cancer. A multi-state model with an incorporated cured fraction for recurrence is used to jointly model time to recurrence and time to death. We explore different ways in which the information about recurrence time and the assumptions in the model can lead to improved efficiency. Estimates of overall survival and disease-free survival can be derived directly from the model with efficiency gains obtained as compared to Kaplan-Meier estimates. Alternatively, efficiency gains can be achieved by using the model in a weaker way in a multiple imputation procedure, which imputes death times for censored subjects. By using the joint model, recurrence is used as an auxiliary variable in predicting survival times. We demonstrate the potential use of the proposed methods in shortening the length of a trial and reducing sample sizes.
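The abstract above uses the Kaplan-Meier estimator as the benchmark for efficiency comparisons. For readers who want that baseline in concrete form, here is a minimal sketch of the product-limit estimate; the function name `kaplan_meier` and its argument names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimate of survival at each distinct event time.

    times  : follow-up time for each subject (event or censoring time)
    events : 1/True if the event was observed, 0/False if censored
    (Hypothetical helper for illustration; not the authors' code.)
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])  # sorted distinct event times
    surv = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(times >= t)             # subjects still under observation
        deaths = np.sum((times == t) & events)   # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv[i] = s
    return event_times, surv
```

Subjects censored at a given time are counted as at risk at that time, which is the usual convention when events and censorings tie.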
Project description:The two-stage design has long been recognized as a cost-effective way of conducting biomedical studies. In many trials, auxiliary covariate information may also be available, and it is of interest to exploit these auxiliary data to improve the efficiency of inferences. In this paper, we propose a two-stage design with a continuous outcome where the second-stage data are sampled with an "outcome-auxiliary-dependent sampling" (OADS) scheme. We propose an estimator that maximizes an estimated likelihood function. We show that the proposed estimator is consistent and asymptotically normally distributed. The simulation study indicates that greater study efficiency gains can be achieved under the proposed two-stage OADS design by utilizing the auxiliary covariate information when compared with alternative sampling schemes. We illustrate the proposed method by analyzing a data set from an environmental epidemiologic study.
Project description:Clinical studies aimed at identifying effective treatments to reduce the risk of disease or death often require long-term follow-up of participants in order to observe a sufficient number of events to precisely estimate the treatment effect. In such studies, observing the outcome of interest during follow-up may be difficult and high rates of censoring may be observed, which often leads to reduced power when applying straightforward statistical methods developed for time-to-event data. Alternative methods have been proposed to take advantage of auxiliary information that may potentially improve efficiency when estimating marginal survival and improve power when testing for a treatment effect. Recently, Parast et al. (J Am Stat Assoc 109(505):384-394, 2014) proposed a landmark estimation procedure for the estimation of survival and treatment effects in a randomized clinical trial setting and demonstrated that significant gains in efficiency and power could be obtained by incorporating intermediate event information as well as baseline covariates. However, the procedure requires the assumption that the potential outcomes for each individual under treatment and control are independent of treatment group assignment, which is unlikely to hold in an observational study setting. In this paper we develop the landmark estimation procedure for use in an observational setting. In particular, we incorporate inverse probability of treatment weights (IPTW) in the landmark estimation procedure to account for selection bias on observed baseline (pretreatment) covariates. We demonstrate that consistent estimates of survival and treatment effects can be obtained by using IPTW and that there is improved efficiency by using auxiliary intermediate event and baseline information. We compare our proposed estimates to those obtained using the Kaplan-Meier estimator, the original landmark estimation procedure, and the IPTW Kaplan-Meier estimator. We illustrate the resulting reduction in bias and gains in efficiency through a simulation study and apply our procedure to an AIDS dataset to examine the effect of previous antiretroviral therapy on survival.
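One of the comparators named above is the IPTW Kaplan-Meier estimator. The following is a rough sketch of how such a weighted product-limit estimate could be computed, assuming the propensity scores have already been estimated elsewhere (for example by logistic regression on baseline covariates). The helper names `iptw_weights` and `weighted_km` are hypothetical, and this is not the landmark procedure itself.

```python
import numpy as np

def iptw_weights(treatment, propensity):
    """Inverse probability of treatment weights.

    treatment  : 1 if treated, 0 if control
    propensity : estimated P(treated | baseline covariates), assumed given
    """
    treatment = np.asarray(treatment, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    return treatment / propensity + (1.0 - treatment) / (1.0 - propensity)

def weighted_km(times, events, weights):
    """Kaplan-Meier estimate with subject-level weights.

    At-risk and event counts are replaced by sums of weights.
    (Illustrative sketch only; variance estimation is not covered.)
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    weights = np.asarray(weights, dtype=float)
    event_times = np.unique(times[events])  # sorted distinct event times
    surv = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = weights[times >= t].sum()
        deaths = weights[(times == t) & events].sum()
        s *= 1.0 - deaths / at_risk
        surv[i] = s
    return event_times, surv
```

In use, `weighted_km` would be applied separately within each treatment group, with that group's weights, so that each weighted curve targets survival in the full study population under that treatment.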
Project description:Forest inventories require estimates and measures of uncertainty for subpopulations such as management units. These units often have a small sample size, so they should be regarded as small areas. When auxiliary information is available, different small area estimation methods have been proposed to obtain reliable estimates for small areas. Unit level empirical best linear unbiased predictors (EBLUP) based on plot or grid unit level models have been studied more thoroughly than area level EBLUPs, where the modelling occurs at the management unit scale. Area level EBLUPs do not require precise plot positioning and allow the use of variable radius plots, thus reducing fieldwork costs. However, their performance has not been examined thoroughly. We compared unit level and area level EBLUPs, using LiDAR auxiliary information collected for inventorying 98,104 ha of coastal coniferous forest. Unit level models were consistently more accurate than area level EBLUPs, and area level EBLUPs were consistently more accurate than field estimates except for large management units that held a large sample. For stand density, volume, basal area, quadratic mean diameter, mean height and Lorey's height, root mean squared errors (RMSEs) of estimates obtained using area level EBLUPs were, on average, 1.43, 2.83, 2.09, 1.40, 1.32 and 1.64 times larger than those based on unit level estimates, respectively. Similarly, direct field estimates had RMSEs that were, on average, 1.37, 1.45, 1.17, 1.17, 1.26, and 1.38 times larger than RMSEs of area level EBLUPs. Therefore, area level models can lead to substantial gains in accuracy compared to direct estimates, and unit level models lead to very important gains in accuracy compared to area level models, potentially justifying the additional costs of obtaining accurate field plot coordinates.
Project description:With the rapidly increasing availability of data in the public domain, combining information from different sources to make inferences about associations or differences of interest has become an emerging challenge for researchers. This paper presents a novel approach to improve efficiency in estimating the survival time distribution by synthesizing information from the individual-level data with t-year survival probabilities from external sources such as disease registries. While disease registries provide accurate and reliable overall survival statistics for the disease population, critical pieces of information that influence both choice of treatment and clinical outcomes usually are not available in the registry database. To combine with the published information, we propose to summarize the external survival information via a system of nonlinear population moments and estimate the survival time model using empirical likelihood methods. The proposed approach is more flexible than conventional meta-analysis in the sense that it can automatically combine survival information for different subgroups and the information may be derived from different studies. Moreover, an extended estimator that allows for a different baseline risk in the aggregate data is also studied. Empirical likelihood ratio tests are proposed to examine whether the auxiliary survival information is consistent with the individual-level data. Simulation studies show that the proposed estimators yield a substantial gain in efficiency over the conventional partial likelihood approach. Two data analyses are conducted to illustrate the methods and theory.
Project description:The two-stage design is a well-known cost-effective way of conducting biomedical studies when the exposure variable is expensive or difficult to measure. Recent research has further allowed one or both stages of the two-stage design to depend on a continuous outcome variable. This outcome-dependent sampling feature enables further efficiency gain in parameter estimation and overall cost reduction of the study (e.g. Wang, X. and Zhou, H., 2010. Design and inference for cancer biomarker study with an outcome and auxiliary-dependent subsampling. Biometrics 66, 502-511; Zhou, H., Song, R., Wu, Y. and Qin, J., 2011. Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics 67, 194-202). In this paper, we develop a semiparametric mixed effect regression model for data from a two-stage design where the second-stage data are sampled with an outcome-auxiliary-dependent sampling (OADS) scheme. Our method allows the cluster- or center-effects of the study subjects to be accounted for. We propose an estimated likelihood function to estimate the regression parameters. A simulation study indicates that greater study efficiency gains can be achieved under the proposed two-stage OADS design with center-effects when compared with alternative sampling schemes. We illustrate the proposed method by analyzing a dataset from the Collaborative Perinatal Project.
Project description:As biological studies become more expensive to conduct, statistical methods that take advantage of existing auxiliary information about an expensive exposure variable are desirable in practice. Such methods should improve the study efficiency and increase the statistical power for a given number of assays. In this article, we consider an inference procedure for multivariate failure time with auxiliary covariate information. We propose an estimated pseudopartial likelihood estimator under the marginal hazard model framework and develop the asymptotic properties for the proposed estimator. We conduct simulation studies to evaluate the performance of the proposed method in practical situations and demonstrate the proposed method with a data set from the studies of left ventricular dysfunction (SOLVD Investigators, 1991, New England Journal of Medicine 325, 293-302).
Project description:10 biopsies from one patient undergoing an auxiliary liver and combined kidney transplantation, where one liver lobe is replaced by an auxiliary liver lobe. Thereafter the kidney is transplanted. Keywords: Time course study 10 samples, no replicates.
Project description:To address the objective in a clinical trial to estimate the mean or mean difference of an expensive endpoint Y, one approach employs a two-phase sampling design, wherein inexpensive auxiliary variables W predictive of Y are measured in everyone, Y is measured in a random sample, and the semiparametric efficient estimator is applied. This approach is made efficient by specifying the phase two selection probabilities as optimal functions of the auxiliary variables and measurement costs. While this approach is familiar to survey samplers, it apparently has seldom been used in clinical trials, and several novel results practicable for clinical trials are developed. We perform simulations to identify settings where the optimal approach significantly improves efficiency compared to approaches in current practice. We provide proofs and R code. The optimality results are developed to design an HIV vaccine trial, with the objective of comparing the mean 'importance-weighted' breadth (Y) of the T-cell response between randomized vaccine groups. The trial collects an auxiliary response (W) highly predictive of Y and measures Y in the optimal subset. We show that the optimal design-estimation approach can confer anywhere from no efficiency gain to a large one (up to 24% in the examples) compared to the approach with the same efficient estimator but simple random sampling, where greater variability in the cost-standardized conditional variance of Y given W yields greater efficiency gains. Accurate estimation of E[Y | W] is important for realizing the efficiency gain, which is aided by an ample phase two sample and by using a robust fitting method.
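To make the two-phase idea above more concrete, here is a minimal sketch of an augmented inverse-probability-weighted estimate of the mean of Y when Y is measured only in a phase two subsample. It assumes the selection probabilities are known by design and that a prediction of E[Y | W] has been fitted separately. The function `two_phase_mean` and its arguments are hypothetical; the paper provides its own R code, which this does not reproduce, and the optimal choice of selection probabilities is not shown here.

```python
import numpy as np

def two_phase_mean(y, selected, pi, m_hat):
    """Augmented IPW estimate of E[Y] under two-phase sampling.

    y        : endpoint values; entries for unselected subjects are ignored
               (they may be NaN)
    selected : True if Y was measured for the subject in phase two
    pi       : known phase two selection probability for each subject
    m_hat    : fitted prediction of E[Y | W] for each subject (assumed given)
    """
    y = np.asarray(y, dtype=float)
    selected = np.asarray(selected, dtype=bool)
    pi = np.asarray(pi, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    # Residuals contribute only where Y was actually measured.
    resid = np.where(selected, y - m_hat, 0.0)
    return float(np.mean(m_hat + resid / pi))
```

Setting `m_hat` to zero reduces this to the plain inverse-probability-weighted mean, which shows where the auxiliary variables enter: a better prediction of Y from W shrinks the weighted residual term.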
Project description:In this paper we use Cox's regression model to fit failure time data with continuous informative auxiliary variables in the presence of a validation subsample. We first estimate the induced relative risk function by kernel smoothing based on the validation subsample, and then improve the estimation by utilizing the information on the incomplete observations from the non-validation subsample and the auxiliary observations from the primary sample. Asymptotic normality of the proposed estimator is derived. The proposed method allows one to robustly model the failure time data with an informative multivariate auxiliary covariate. Comparison of the proposed approach with several existing methods is made via simulations. Two real datasets are analyzed to illustrate the proposed method.
Project description:In many studies with a survival outcome, it is often not feasible to fully observe the primary event of interest. This often leads to heavy censoring and thus, difficulty in efficiently estimating survival or comparing survival rates between two groups. In certain diseases, baseline covariates and the event time of non-fatal intermediate events may be associated with overall survival. In these settings, incorporating such additional information may lead to gains in efficiency in estimation of survival and testing for a difference in survival between two treatment groups. If gains in efficiency can be achieved, it may then be possible to decrease the sample size of patients required for a study to achieve a particular power level or decrease the duration of the study. Most existing methods for incorporating intermediate events and covariates to predict survival focus on estimation of relative risk parameters and/or the joint distribution of events under semiparametric models. However, in practice, these model assumptions may not hold and hence may lead to biased estimates of the marginal survival. In this paper, we propose a semi-nonparametric two-stage procedure to estimate and compare t-year survival rates by incorporating intermediate event information observed before some landmark time, which serves as a useful approach to overcome semi-competing risks issues. In a randomized clinical trial setting, we further improve efficiency through an additional calibration step. Simulation studies demonstrate substantial potential gains in efficiency in terms of estimation and power. We illustrate our proposed procedures using an AIDS Clinical Trial Protocol 175 dataset by estimating survival and examining the difference in survival between two treatment groups: zidovudine and zidovudine plus zalcitabine.
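The landmark idea described above rests on a simple decomposition of the t-year survival probability around a landmark time. As a sketch in notation that is assumed here rather than taken from the paper, let T be the survival time, t_0 < t the landmark time, and H_{t_0} the baseline covariates together with the intermediate event information observed up to t_0:

```latex
S(t) = P(T > t)
     = P(T > t_0)\, P(T > t \mid T > t_0)
     = S(t_0)\, E\!\left[\, P(T > t \mid T > t_0,\, H_{t_0}) \;\middle|\; T > t_0 \right].
```

The first factor involves only follow-up to the landmark time, while the second averages a conditional survival probability over subjects still at risk at t_0, which is where the auxiliary information can contribute.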