A model for adjusting for nonignorable verification bias in estimation of the ROC curve and its area with likelihood-based approach.
ABSTRACT: In estimation of the ROC curve, when the true disease status is subject to nonignorable missingness, the observed likelihood involves the missing mechanism given by a selection model. In this article, we proposed a likelihood-based approach to estimate the ROC curve and the area under the ROC curve when the verification bias is nonignorable. We specified a parametric disease model in order to make the nonignorable selection model identifiable. With the estimated verification and disease probabilities, we constructed four types of empirical estimates of the ROC curve and its area based on imputation and reweighting methods. In practice, a reasonably large sample size is required to estimate the nonignorable selection model in our settings. Simulation studies showed that all four estimators of ROC area performed well, and imputation estimators were generally more efficient than the other estimators proposed. We applied the proposed method to a data set from research in Alzheimer's disease.
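As a minimal sketch of how imputation and reweighting estimates of the ROC area can be built from estimated verification and disease probabilities, the Python below computes a weighted Mann-Whitney statistic. It is an illustration under stated assumptions, not the estimators defined in the article: the function names (`weighted_auc`, `fi_auc`, `ipw_auc`) are hypothetical, and the inputs `rho` (estimated disease probability) and `pi` (estimated verification probability) are assumed to come from already-fitted disease and selection models.

```python
import numpy as np

def weighted_auc(t, w_pos, w_neg):
    """Weighted Mann-Whitney estimate of P(T_diseased > T_nondiseased)."""
    t = np.asarray(t, dtype=float)
    w_pos = np.asarray(w_pos, dtype=float)
    w_neg = np.asarray(w_neg, dtype=float)
    # kernel[i, j] = 1 if t_i > t_j, 0.5 if tied, 0 otherwise
    kernel = (t[:, None] > t[None, :]) + 0.5 * (t[:, None] == t[None, :])
    # pair_w[i, j] = weight of subject i as diseased times subject j as non-diseased
    pair_w = np.outer(w_pos, w_neg)
    np.fill_diagonal(pair_w, 0.0)  # a subject is never paired with itself
    return float((pair_w * kernel).sum() / pair_w.sum())

def fi_auc(t, rho):
    """Imputation-type estimate: every disease status is replaced by rho (assumed given)."""
    rho = np.asarray(rho, dtype=float)
    return weighted_auc(t, rho, 1.0 - rho)

def ipw_auc(t, d, v, pi):
    """Reweighting-type estimate: verified subjects are weighted by 1 / pi (assumed given)."""
    v = np.asarray(v, dtype=float)
    pi = np.asarray(pi, dtype=float)
    # Disease status of unverified subjects is never used, so mask it out (it may be NaN).
    d_obs = np.where(v == 1, np.asarray(d, dtype=float), 0.0)
    return weighted_auc(t, v * d_obs / pi, v * (1.0 - d_obs) / pi)
```

When every subject is verified and `pi` is identically 1, `ipw_auc` reduces to the ordinary empirical AUC, which gives a quick sanity check on any fitted weights.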
Project description:The Area Under the Receiver Operating Characteristic Curve (AUC) is frequently used for assessing the overall accuracy of a diagnostic marker. However, estimation of AUC relies on knowledge of the true outcomes of subjects: diseased or non-diseased. Because disease verification based on a gold standard is often expensive and/or invasive, only a limited number of patients are sent to verification at doctors' discretion. Estimation of AUC is generally biased if only small verified samples are used, and it is thus necessary to correct for this lack of information. However, correction based on the ignorable missingness assumption (or missing at random) is also biased if the missing mechanism indeed depends on the unknown disease outcome, which is called nonignorable missingness. In this paper, we propose a propensity-score-adjustment method for estimating AUC based on the instrumental variable assumption when the missingness of disease status is nonignorable. The new method makes parametric assumptions on the verification probability, and on the probability of being diseased for verified samples rather than for the whole sample. The proposed parametric assumption on the observed sample is easier to verify than the parametric assumption on the full sample. We establish the asymptotic properties of the proposed estimators. A simulation study is performed to compare the proposed method with existing methods. The proposed method is also applied to an Alzheimer's disease data set collected by the National Alzheimer's Coordinating Center.
Project description:In ROC analysis, covariate adjustment is advocated when the covariates impact the magnitude or accuracy of the test under study. Meanwhile, for many large scale screening tests, the true condition status may be subject to missingness because it is expensive and/or invasive to ascertain the disease status. The complete-case analysis may end up with a biased inference, also known as "verification bias." To address the issue of covariate adjustment with verification bias in ROC analysis, we propose several estimators for the area under the covariate-specific and covariate-adjusted ROC curves (AUCx and AAUC). The AUCx is directly modeled in the form of binary regression, and the estimating equations are based on the U statistics. The AAUC is estimated from the weighted average of AUCx over the covariate distribution of the diseased subjects. We employ reweighting and imputation techniques to overcome the verification bias problem. Our proposed estimators are initially derived assuming that the true disease status is missing at random (MAR), and then with some modification, the estimators can be extended to the not missing at random (NMAR) situation. The asymptotic distributions are derived for the proposed estimators. The finite sample performance is evaluated by a series of simulation studies. Our method is applied to a data set in Alzheimer's disease research.
Project description:Covariate-specific receiver operating characteristic (ROC) curves are often used to evaluate the classification accuracy of a medical diagnostic test or a biomarker, when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness due to high cost or harmfulness to the patient. In this article, we propose a semiparametric estimation of the covariate-specific ROC curves with a partially missing gold standard. A location-scale model is constructed for the test result to model the covariates' effect, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. With the gold standard missing at random (MAR) assumption, we consider weighted estimating equations for the location-scale parameters, and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared, namely, imputation-based, inverse probability weighted, and doubly robust estimators. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form of the standard error estimator. The proposed method is motivated by, and applied to, data from an Alzheimer's disease study.
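To make the distinction between imputation-based, inverse probability weighted, and doubly robust estimators concrete, the sketch below computes one point of an ROC curve under each scheme. It is a simplified, covariate-free illustration and not the semiparametric location-scale estimator of the article: the helper names (`pseudo_status`, `roc_point`) are hypothetical, and the estimated verification probability `pi` and disease probability `rho` are assumed to be supplied by separately fitted models under MAR.

```python
import numpy as np

def pseudo_status(d, v, pi, rho, method):
    """Return (diseased, non-diseased) pseudo-indicators for each subject."""
    v = np.asarray(v, dtype=float)
    pi = np.asarray(pi, dtype=float)
    rho = np.asarray(rho, dtype=float)
    # Disease status is only used where it was verified (it may be NaN elsewhere).
    d_obs = np.where(v == 1, np.asarray(d, dtype=float), 0.0)
    if method == "imputation":
        pos = rho  # every status replaced by its model-based probability
        neg = 1.0 - rho
    elif method == "ipw":
        pos = v * d_obs / pi  # verified subjects up-weighted by 1 / pi
        neg = v * (1.0 - d_obs) / pi
    elif method == "dr":
        # IPW term plus an augmentation term built from rho; the estimator stays
        # consistent if either the pi model or the rho model is correctly specified.
        pos = v * d_obs / pi - (v - pi) / pi * rho
        neg = 1.0 - pos
    else:
        raise ValueError(f"unknown method: {method}")
    return pos, neg

def roc_point(t, pos, neg, c):
    """(TPR, FPR) at threshold c, calling a test result positive when t >= c."""
    above = np.asarray(t, dtype=float) >= c
    tpr = np.sum(pos * above) / np.sum(pos)
    fpr = np.sum(neg * above) / np.sum(neg)
    return float(tpr), float(fpr)
```

Sweeping `c` over the observed test results traces out the whole curve. In small samples the doubly robust pseudo-indicators can fall outside [0, 1], so the resulting estimates may need truncation.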
Project description:In this article, we first study parameter identifiability in randomized clinical trials with noncompliance and missing outcomes. We show that under certain conditions the parameters of interest are identifiable even under different types of completely nonignorable missing data: that is, the missing mechanism depends on the outcome. We then derive their maximum likelihood and moment estimators and evaluate their finite-sample properties in simulation studies in terms of bias, efficiency, and robustness. Our sensitivity analysis shows that the assumed nonignorable missing-data model has an important impact on the estimated complier average causal effect (CACE) parameter. Our new method provides some new and useful alternative nonignorable missing-data models over the existing latent ignorable model, which guarantees parameter identifiability, for estimating the CACE in a randomized clinical trial with noncompliance and missing data.
Project description:We explore a Bayesian approach to selection of variables that represent fixed and random effects in modeling of longitudinal binary outcomes with missing data caused by dropouts. We show via analytic results for a simple example that nonignorable missing data lead to biased parameter estimates. This bias results in selection of wrong effects asymptotically, which we can confirm via simulations for more complex settings. By jointly modeling the longitudinal binary data with the dropout process that possibly leads to nonignorable missing data, we are able to correct the bias in estimation and selection. Mixture priors with a point mass at zero are used to facilitate variable selection. We illustrate the proposed approach using a clinical trial for acute ischemic stroke.
Project description:Tang et al. (2003) considered a regression model with missing response, where the missingness mechanism depends on the value of the response variable and hence is nonignorable. They proposed three pseudolikelihood estimators, based on different treatments of the probability distribution of the completely observed covariates. The first assumes the distribution of the covariate to be known, the second estimates this distribution parametrically, and the third estimates the distribution nonparametrically. While it is not hard to show that the second estimator is more efficient than the first, Tang et al. (2003) only conjectured that the third estimator is more efficient than the first two. In this paper, we investigate the asymptotic behaviour of the third estimator by deriving a closed-form representation of its asymptotic variance. We then prove that the third estimator is more efficient than the other two. Our result can be straightforwardly applied to missingness mechanisms that are more general than that in Tang et al. (2003).
Project description:Biomarkers are playing an increasingly important role in disease screening, early detection, and risk prediction. The two-phase case-control sampling study design is widely used for the evaluation of candidate biomarkers. The sampling probabilities for cases and controls in the second phase can often depend on other covariates (sampling strata). This biased sampling can lead to invalid inference on a biomarker's classification accuracy if not properly accounted for. In this paper, we adopt the idea of inverse probability weighting and develop inverse probability weighting-based estimators for various measures of a biomarker's classification performance, including the points on the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), and the partial area under the curve. In particular, we consider classification accuracy estimators using sampling weights estimated conditionally on sampling strata and further improve their efficiency through the use of estimated weights that additionally take into account the auxiliary variables available from the phase-one cohort. We develop asymptotic properties of the proposed estimators and provide analytical variances for inference. Extensive simulation studies demonstrate excellent performance of the proposed weighted estimators, while the traditional empirical estimator can be severely biased. We also investigate the efficiency gains in estimating various classification accuracy measures through the use of auxiliary variables in addition to sampling strata, and apply the proposed method to examples from a renal artery stenosis study and a prostate cancer study.
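The idea of sampling weights estimated conditionally on sampling strata can be sketched as follows: within each cell defined by case status and stratum, the weight is the phase-one count divided by the phase-two count, and the weighted true and false positive rates then use only the sampled subjects. This is an illustrative sketch, not the authors' implementation; the function names and the array layout (one entry per phase-one subject, with the marker missing for unsampled subjects) are assumptions, and the auxiliary-variable refinement described in the abstract is not included.

```python
import numpy as np

def estimated_sampling_weights(case, stratum, sampled):
    """Inverse of the observed sampling fraction in each (case status, stratum) cell."""
    case = np.asarray(case)
    stratum = np.asarray(stratum)
    sampled = np.asarray(sampled, dtype=bool)
    weights = np.zeros(case.shape[0], dtype=float)  # unsampled subjects keep weight 0
    for d in np.unique(case):
        for s in np.unique(stratum):
            cell = (case == d) & (stratum == s)
            n_sampled = np.sum(cell & sampled)
            if n_sampled > 0:
                weights[cell & sampled] = cell.sum() / n_sampled
    return weights

def ipw_roc_point(marker, case, weights, threshold):
    """Weighted (TPR, FPR) at a threshold, using phase-two subjects only."""
    weights = np.asarray(weights, dtype=float)
    keep = weights > 0  # subjects with a measured marker
    m = np.asarray(marker, dtype=float)[keep]
    d = np.asarray(case, dtype=float)[keep]
    w = weights[keep]
    above = m >= threshold
    tpr = np.sum(w * d * above) / np.sum(w * d)
    fpr = np.sum(w * (1.0 - d) * above) / np.sum(w * (1.0 - d))
    return float(tpr), float(fpr)
```

Estimating the weights from the observed cell counts, rather than plugging in the design probabilities, is the choice the abstract describes; the sketch assumes every cell has at least one sampled subject, otherwise that cell contributes nothing to the estimate.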
Project description:As an important part of modern health care, medical imaging data, which can be regarded as densely sampled functional data, have been widely used for diagnosis, screening, treatment, and prognosis, such as finding breast cancer through mammograms. The aim of this paper is to propose a functional linear regression model for using functional (or imaging) predictors to predict clinical outcomes (e.g., disease status), while addressing missing clinical outcomes. We introduce an exponential tilting semiparametric model to account for the nonignorable missing data mechanism. We develop a set of estimating equations and its associated computational methods for both parameter estimation and the selection of the tuning parameters. We also propose a bootstrap resampling procedure for carrying out statistical inference. Under some regularity conditions, we systematically establish the asymptotic properties (e.g., consistency and convergence rate) of the estimates calculated from the proposed estimating equations. Simulation studies and a real data analysis are used to illustrate the finite sample performance of the proposed methods.
Project description:For censored survival outcomes, it can be of great interest to evaluate the predictive power of individual markers or their functions. Compared with alternative evaluation approaches, approaches based on the time-dependent receiver operating characteristic (ROC) curve rely on much weaker assumptions, can be more robust, and hence are preferred. In this article, we examine evaluation of markers' predictive power using the time-dependent ROC curve and a concordance measure that can be viewed as a weighted area under the time-dependent AUC profile. This study advances significantly beyond existing time-dependent ROC studies by developing nonparametric estimators of the summary indexes and, more importantly, rigorously establishing their asymptotic properties. It reinforces the statistical foundation of the time-dependent ROC-based evaluation approaches for censored survival outcomes. Numerical studies, including simulations and application to an HIV clinical trial, demonstrate the satisfactory finite-sample performance of the proposed approaches.