The impact of covariate measurement error on risk prediction.
ABSTRACT: In the development of risk prediction models, predictors are often measured with error. In this paper, we investigate the impact of covariate measurement error on risk prediction. We compare the prediction performance using a costly variable measured without error, along with error-free covariates, to that of a model based on an inexpensive surrogate along with the error-free covariates. We consider continuous error-prone covariates with homoscedastic and heteroscedastic errors, and also a discrete misclassified covariate. Prediction performance is evaluated by the area under the receiver operating characteristic curve (AUC), the Brier score (BS), and the ratio of the observed to the expected number of events (calibration). In an extensive numerical study, we show that (i) the prediction model with the error-prone covariate is very well calibrated, even when it is mis-specified; (ii) using the error-prone covariate instead of the true covariate can reduce the AUC and increase the BS dramatically; (iii) adding an auxiliary variable, which is correlated with the error-prone covariate but conditionally independent of the outcome given all covariates in the true model, can improve the AUC and BS substantially. We conclude that reducing measurement error in covariates will improve the ensuing risk prediction, unless the association between the error-free and error-prone covariates is very high. Finally, we demonstrate how a validation study can be used to assess the effect of mismeasured covariates on risk prediction. These concepts are illustrated in a breast cancer risk prediction model developed in the Nurses' Health Study.
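The three performance measures used in this abstract are straightforward to compute. The sketch below (plain NumPy, toy data only, not from the study) evaluates the AUC via the rank/pair formulation, the Brier score, and the observed-to-expected event ratio:

```python
import numpy as np

def auc(y, p):
    """Rank-based AUC: P(risk of a random event exceeds risk of a random non-event)."""
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]          # all event/non-event pairs
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def brier(y, p):
    """Brier score: mean squared difference between predicted risk and outcome."""
    return np.mean((p - y) ** 2)

def oe_ratio(y, p):
    """Observed over expected number of events (calibration-in-the-large)."""
    return y.sum() / p.sum()

y = np.array([0, 0, 1, 1])                      # binary outcomes (toy data)
p = np.array([0.1, 0.4, 0.35, 0.8])             # predicted risks
print(auc(y, p), round(brier(y, p), 3), round(oe_ratio(y, p), 3))  # → 0.75 0.158 1.212
```

An O/E ratio near 1 indicates good calibration-in-the-large; AUC and BS capture discrimination and overall accuracy, respectively.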
Project description:Covariate measurement error and missing responses are typical features in longitudinal data analysis. There has been extensive research on either covariate measurement error or missing responses, but relatively little work has been done to address both simultaneously. In this paper, we propose a simple method for the marginal analysis of longitudinal data with time-varying covariates, some of which are measured with error, while the response is subject to missingness. Our method has a number of appealing properties: assumptions on the model are minimal, with none needed about the distribution of the mismeasured covariate; implementation is straightforward and its applicability is broad. We provide both theoretical justification and numerical results.
Project description:In clinical studies, covariates are often measured with error due to biological fluctuations, device error and other sources. Summary statistics and regression models that are based on mismeasured data will differ from the corresponding analysis based on the "true" covariate. Statistical analysis can be adjusted for measurement error; however, various methods exhibit a tradeoff between convenience and performance. Moment Adjusted Imputation (MAI) is a method for measurement error in a scalar latent variable that is easy to implement and performs well in a variety of settings. In practice, multiple covariates may be similarly influenced by biological fluctuations, inducing correlated multivariate measurement error. The extension of MAI to the setting of multivariate latent variables involves unique challenges. Alternative strategies are described, including a computationally feasible option that is shown to perform well.
Project description:Mismeasured time to event data used as a predictor in risk prediction models will lead to inaccurate predictions. This arises in the context of self-reported family history, a time to event predictor often measured with error, used in Mendelian risk prediction models. Using validation data, we propose a method to adjust for this type of error. We estimate the measurement error process using a nonparametric smoothed Kaplan-Meier estimator, and use Monte Carlo integration to implement the adjustment. We apply our method to simulated data in the context of both Mendelian and multivariate survival prediction models. Simulations are evaluated using measures of mean squared error of prediction (MSEP), area under the receiver operating characteristic curve (ROC-AUC), and the ratio of observed to expected number of events. These results show that our method mitigates the effects of measurement error mainly by improving calibration and total accuracy. We illustrate our method in the context of Mendelian risk prediction models focusing on misreporting of breast cancer, fitting the measurement error model on data from the University of California at Irvine, and applying our method to counselees from the Cancer Genetics Network. We show that our method improves overall calibration, especially in low risk deciles.
Project description:Studies of clinical characteristics frequently measure covariates with a single observation. This may be a mismeasured version of the "true" phenomenon due to sources of variability like biological fluctuations and device error. Descriptive analyses and outcome models that are based on mismeasured data generally will not reflect the corresponding analyses based on the "true" covariate. Many statistical methods are available to adjust for measurement error. Imputation methods like regression calibration and moment reconstruction are easily implemented but are not always adequate. Sophisticated methods have been proposed for specific applications like density estimation, logistic regression, and survival analysis. However, it is frequently infeasible for an analyst to adjust each analysis separately, especially in preliminary studies where resources are limited. We propose an imputation approach called moment-adjusted imputation that is flexible and relatively automatic. Like other imputation methods, it can be used to adjust a variety of analyses quickly, and it performs well under a broad range of circumstances. We illustrate the method via simulation and apply it to a study of systolic blood pressure and health outcomes in patients hospitalized with acute heart failure.
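As a rough illustration of the imputation idea (not the full moment-adjusted imputation method, which matches higher-order moments as well), the sketch below rescales a single mismeasured observation per subject so that its first two moments match those of the true covariate, under a classical additive error model with known error variance:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
sigma_x2, sigma_u2 = 4.0, 1.0                  # true-covariate and error variances (assumed known)
x = rng.normal(10, np.sqrt(sigma_x2), n)       # "true" covariate, unobserved in practice
w = x + rng.normal(0, np.sqrt(sigma_u2), n)    # single mismeasured observation per subject

# Two-moment adjustment: shrink W toward its mean so the imputed values
# reproduce the mean and variance of X (var(W) = var(X) + var(U) under
# classical additive error).
mu_hat = w.mean()
scale = np.sqrt(sigma_x2 / (sigma_x2 + sigma_u2))
x_imp = mu_hat + scale * (w - mu_hat)

# w's variance is inflated by the error variance (≈ 5.0); x_imp's matches var(X) (≈ 4.0)
print(round(w.var(), 1), round(x_imp.var(), 1))
```

Descriptive summaries of `x_imp` then approximate those of the true covariate, which is the sense in which such imputations can feed a variety of downstream analyses.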
Project description:An important use of measurement error models is to correct regression models for bias due to covariate measurement error. Most measurement error models assume that the observed error-prone covariate (W) is a linear function of the unobserved true covariate (X) plus other covariates (Z) in the regression model. In this paper, we consider models for W that include interactions between X and Z. We derive the conditional distribution of X given W and Z and use it to extend the method of regression calibration to this class of measurement error models. We apply the model to dietary data and test whether self-reported dietary intake includes an interaction between true intake and body mass index. We also perform simulations to compare the model to simpler approximate calibration models.
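Regression calibration in the simplest case, without the X-by-Z interactions this paper adds, replaces the error-prone covariate by its conditional expectation given the observed data. A minimal sketch under a classical additive error model with known variances (illustrative, not the paper's extended model):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
sigma_x2, sigma_u2 = 1.0, 1.0                  # variances of X and of the error U (assumed known)
x = rng.normal(0.0, np.sqrt(sigma_x2), n)
w = x + rng.normal(0.0, np.sqrt(sigma_u2), n)  # classical model: W = X + U
y = 2.0 * x + rng.normal(0.0, 1.0, n)          # outcome model with true slope 2

# Naive fit of y on W is attenuated by lambda = var(X) / (var(X) + var(U)).
naive = np.polyfit(w, y, 1)[0]

# Regression calibration: replace W by E[X | W] = lambda * W (means are zero here),
# then fit the outcome model as usual.
lam = sigma_x2 / (sigma_x2 + sigma_u2)
calibrated = np.polyfit(lam * w, y, 1)[0]
print(round(naive, 2), round(calibrated, 2))   # ≈ 1.0 (attenuated) vs ≈ 2.0
```

The paper's contribution is deriving E[X | W, Z] when W depends on an X-by-Z interaction, where this simple linear shrinkage no longer applies.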
Project description:One of the main limitations of causal inference methods is that they rely on the assumption that all variables are measured without error. A popular approach for handling measurement error is simulation-extrapolation (SIMEX). However, its use for estimating causal effects has been examined only in the context of an additive, non-differential, and homoscedastic classical measurement error structure. In this article we extend the SIMEX methodology, in the context of a mean-reverting measurement error structure, to a doubly robust estimator of the average treatment effect when a single covariate is measured with error but the outcome and treatment indicator are not. Throughout this article we assume that an independent validation sample is available. Simulation studies suggest that our method performs better than a naive approach that simply uses the covariate measured with error.
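The basic SIMEX recipe, in the textbook classical-error setting with ordinary least squares (not the mean-reverting, doubly robust extension developed here), can be sketched as follows: repeatedly add extra noise of known variance, record how the estimate degrades, and extrapolate back to the no-error case.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
sigma_u = 1.0                                   # measurement-error SD (assumed known)
x = rng.normal(size=n)
w = x + rng.normal(0.0, sigma_u, n)             # observed error-prone covariate
y = 1.5 * x + rng.normal(0.0, 1.0, n)           # true slope 1.5; naive slope ≈ 0.75

lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
slopes = []
for lam in lambdas:
    # Simulation step: add noise so the total error variance is (1 + lam) * sigma_u^2.
    sims = [np.polyfit(w + rng.normal(0.0, np.sqrt(lam) * sigma_u, n), y, 1)[0]
            for _ in range(20)]
    slopes.append(np.mean(sims))

# Extrapolation step: fit a quadratic in lam and evaluate at lam = -1
# (the "no measurement error" point). The quadratic extrapolant reduces,
# but does not fully remove, the attenuation bias.
simex = np.polyval(np.polyfit(lambdas, slopes, 2), -1.0)
naive = slopes[0]
print(round(naive, 2), round(simex, 2))
```

Here the SIMEX estimate moves from the attenuated naive value (about 0.75) substantially back toward the true slope of 1.5, illustrating the bias reduction the extrapolation step buys.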
Project description:This paper considers identification and estimation of a general nonlinear Errors-in-Variables (EIV) model using two samples. Both samples consist of a dependent variable, some error-free covariates, and an error-prone covariate, for which the measurement error has unknown distribution and could be arbitrarily correlated with the latent true values; and neither sample contains an accurate measurement of the corresponding true variable. We assume that the regression model of interest - the conditional distribution of the dependent variable given the latent true covariate and the error-free covariates - is the same in both samples, but the distributions of the latent true covariates vary with observed error-free discrete covariates. We first show that the general latent nonlinear model is nonparametrically identified using the two samples when both could have nonclassical errors, without either instrumental variables or independence between the two samples. When the two samples are independent and the nonlinear regression model is parameterized, we propose sieve Quasi Maximum Likelihood Estimation (Q-MLE) for the parameter of interest, and establish its root-n consistency and asymptotic normality under possible misspecification, and its semiparametric efficiency under correct specification, with easily estimated standard errors. A Monte Carlo simulation and a data application are presented to show the power of the approach.
Project description:Sensitivity analysis results are given for differential measurement error of either the exposure or outcome. In the case of differential measurement error of the outcome, it is shown that the true effect of the exposure on the outcome on the risk ratio scale must be at least as large as the observed association between the exposure and the mismeasured outcome divided by the maximum strength of differential measurement error. This maximum strength of differential measurement error is itself assessed as the risk ratio of the controlled direct effect of the exposure on the mismeasured outcome not through the true outcome. In the case of differential measurement error of the exposure, under certain assumptions concerning classification probabilities, the true effect on the odds ratio scale of the exposure on the outcome must be at least as large as the observed odds ratio between the mismeasured exposure and the outcome divided by the maximum odds ratio of the effect of the outcome on the mismeasured exposure conditional on the true exposure. The results can be immediately used to indicate the minimum strength of differential measurement error that would be needed to explain away an observed association between an exposure measurement and an outcome measurement, i.e., for the association to be due solely to measurement error.
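The outcome-misclassification bound reduces to simple arithmetic once the two quantities are specified; the numbers below are hypothetical, chosen only to show the calculation:

```python
# Observed exposure-outcome risk ratio and the assumed maximum strength of
# differential outcome misclassification (both numbers are hypothetical).
rr_observed = 2.4
rr_misclassification_max = 1.5

# Lower bound on the true risk ratio implied by the sensitivity analysis:
# true RR >= observed RR / maximum misclassification RR.
rr_true_lower = rr_observed / rr_misclassification_max
print(round(rr_true_lower, 2))  # → 1.6
```

Equivalently, an observed risk ratio of 2.4 could be explained away entirely only if differential misclassification were at least as strong as a risk ratio of 2.4.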
Project description:Prediction precision is arguably the most relevant criterion of a model in practice and is often a sought-after property. A common difficulty with covariates measured with error is the impossibility of performing prediction evaluation on the data, even if a model is completely given without any unknown parameters. We bypass this inherent difficulty by using special properties of moment relations in linear regression models with measurement errors. The end product is a model selection procedure that achieves the same optimality properties that are achieved in classical linear regression models without covariate measurement error. Asymptotically, the procedure selects the model with the minimum prediction error in general, and selects the smallest correct model if the regression relation is indeed linear. Our model selection procedure is useful in prediction when future covariates without measurement error become available, e.g., due to improved technology or better management and design of data collection procedures.
Project description:To identify sources of exposure variability for the tumor growth inhibitor 17-dimethylaminoethylamino-17-demethoxygeldanamycin (17-DMAG) using a population pharmacokinetic analysis. A total of 67 solid tumor patients at 2 centers were given 1-h infusions of 17-DMAG either as a single dose, daily for 3 days, or daily for 5 days. Blood samples were collected extensively and 17-DMAG plasma concentrations were measured by liquid chromatography/mass spectrometry. Population pharmacokinetic analysis of the 17-DMAG plasma concentration over time was performed using nonlinear mixed effect modeling to evaluate the effects of covariates, inter-individual variability, and between-occasion variability on model parameters, using a stepwise forward addition then backward elimination modeling approach. The inter-individual exposure variability and the effects of between-occasion variability on exposure were assessed by simulating the 95% prediction interval of the AUC per dose, AUC(0-24 h), using the final model and a model with no between-occasion variability, respectively, under the five-day 17-DMAG infusion protocol with administrations of the median observed dose. A 3-compartment model with first-order elimination (ADVAN11, TRANS4), a proportional residual error, and exponential inter-individual and between-occasion variability on Q2 and V1 best described the 17-DMAG concentration data. No covariates were statistically significant. The simulated 95% prediction interval of the AUC(0-24 h) for the median dose of 36 mg/m(2) was 1,059-9,007 mg/L h, and the simulated 95% prediction interval of the AUC(0-24 h) considering the impact of between-occasion variability alone was 2,910-4,077 mg/L h. Population pharmacokinetic analysis of 17-DMAG found no significant covariate effects and considerable inter-individual variability, implying a wide range of exposures in the population, which may affect treatment outcome.
Patients treated with 17-DMAG may require therapeutic drug monitoring which could help achieve more uniform exposure leading to safer and more effective therapy.