Time-dependent classification accuracy curve under marker-dependent sampling.
ABSTRACT: Evaluating the classification accuracy of a candidate biomarker signaling the onset of disease or disease status is essential for medical decision making. A good biomarker would accurately identify the patients who are likely to progress or die at a particular time in the future or who are in urgent need of active treatment. To assess the performance of a candidate biomarker, the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are commonly used. In many cases, the standard simple random sampling (SRS) design used for biomarker validation studies is costly and inefficient. In order to improve the efficiency and reduce the cost of biomarker validation, marker-dependent sampling (MDS) may be used. In an MDS design, the selection of patients to assess true survival time is dependent on the result of a biomarker assay. In this article, we introduce a nonparametric estimator for time-dependent AUC under an MDS design. The consistency and the asymptotic normality of the proposed estimator are established. Simulation shows the unbiasedness of the proposed estimator and a significant efficiency gain of the MDS design over the SRS design.
Project description:Two-phase sampling design, where biomarkers are subsampled from a phase-one cohort sample representative of the target population, has become the gold standard in biomarker evaluation. Many two-phase case-control studies involve biased sampling of cases and/or controls in the second phase. For example, controls are often frequency-matched to cases with respect to other covariates. Ignoring biased sampling of cases and/or controls can lead to biased inference regarding biomarkers' classification accuracy. For the problems of estimating and comparing the area under the receiver operating characteristic curve (AUC) for a binary disease outcome, the impact of biased sampling of cases and/or controls on inference and the strategy to efficiently account for the sampling scheme have not been well studied. In this project, we investigate the inverse-probability-weighted method to adjust for biased sampling in estimating and comparing AUC. Asymptotic properties of the estimator and its inference procedure are developed for both Bernoulli sampling and finite-population stratified sampling. In simulation studies, the weighted estimators provide valid inference for estimation and hypothesis testing, while the standard empirical estimators can generate invalid inference. We demonstrate the use of the analytical variance formula for optimizing sampling schemes in biomarker study design and the application of the proposed AUC estimators to examples in HIV vaccine research and prostate cancer research.
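As an illustration of the inverse-probability-weighting idea described above, the following is a minimal sketch of a weighted Mann-Whitney AUC. It assumes each phase-two subject carries a weight equal to the inverse of its known sampling probability; the function name and arguments are hypothetical, and the sketch does not reproduce the project's variance formula or inference procedure.

```python
def weighted_auc(case_scores, case_weights, control_scores, control_weights):
    """Weighted Mann-Whitney AUC.

    Each weight is assumed to be 1 / (phase-two sampling probability) of the
    subject. With all weights equal this reduces to the empirical AUC.
    """
    numerator = 0.0
    denominator = 0.0
    for x, wx in zip(case_scores, case_weights):
        for y, wy in zip(control_scores, control_weights):
            pair_weight = wx * wy
            denominator += pair_weight
            if x > y:
                numerator += pair_weight        # case ranked above control
            elif x == y:
                numerator += 0.5 * pair_weight  # ties count one half
    return numerator / denominator


# Toy data (illustrative only): the second case was sampled with probability 1/3.
print(weighted_auc([3.0, 4.0], [1.0, 3.0], [1.0, 3.5], [1.0, 1.0]))
```

Setting every weight to 1 recovers the standard empirical estimator, which is the quantity the abstract warns can be biased when second-phase sampling depends on case or control characteristics.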
Project description:In vaccine research, immune biomarkers that can reliably predict a vaccine's effect on the clinical endpoint (i.e., surrogate markers) are important tools for guiding vaccine development. This article addresses issues on optimizing two-phase sampling study design for evaluating surrogate markers in a principal surrogate framework, motivated by the design of a future HIV vaccine trial. To address the problem of missing potential outcomes in a standard trial design, novel trial designs have been proposed that utilize baseline predictors of the immune response biomarker(s) and/or augment the trial by vaccinating uninfected placebo recipients at the end of the trial and measuring their immune biomarkers. However, inefficient use of the augmented information can lead to counter-intuitive results on the precision of estimation. To remedy this problem, we propose a pseudo-score type estimator suitable for the augmented design and characterize its asymptotic properties. This estimator has superior performance compared with existing estimators and allows calculation of analytical variances useful for guiding study design. Based on the new estimator we investigate in detail the problem of optimizing the sampling scheme of a biomarker in a vaccine efficacy trial for efficiently estimating its surrogate effect, as characterized by the vaccine efficacy curve (a causal effect predictiveness curve) and by the predicted overall vaccine efficacy using the biomarker.
Project description:This study presents a novel methodology for the nonparametric estimation of a survival probability under random censoring using the ranked observations from a Partially Rank-Ordered Set (PROS) sampling design and applies it in a hematological disorder study. The PROS sampling design has numerous applications in medicine, social sciences and ecology where the exact measurement of the sampling units is costly; however, sampling units can be ordered by using judgment ranking or available concomitant information. The general estimation methods are not directly applicable to the case where samples are from rank-based sampling designs, because the sampling units do not meet the identically distributed assumption. We derive the asymptotic distribution of a Kaplan-Meier (KM) estimator under the PROS sampling design. Finally, we compare the performance of the suggested estimators via several simulation studies and apply the proposed methods to a real data set. The results show that the proposed estimator under rank-based sampling designs outperforms its counterpart in a simple random sample (SRS).
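For reference, the SRS counterpart mentioned above is the ordinary Kaplan-Meier product-limit estimator. The sketch below implements only that standard estimator for right-censored data; it is not the PROS-based estimator of the study, and the function name and toy data are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Standard Kaplan-Meier estimator for a simple random sample.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if the time is censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censored subjects leave the risk set
    return curve


# Toy data (illustrative only): the subject at time 2 is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```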
Project description:In this article we propose a separation curve method to identify the range of false positive rates for which two ROC curves differ or one ROC curve is superior to the other. Our method is based on a general multivariate ROC curve model, including interaction terms between discrete covariates and false positive rates. It is applicable with most existing ROC curve models. Furthermore, we introduce a semiparametric least squares ROC estimator and apply the estimator to the separation curve method. We derive a sandwich estimator for the covariance matrix of the semiparametric estimator. We illustrate the application of our separation curve method through two real life examples.
Project description:OBJECTIVES: There has been renewed interest in lactate as a risk biomarker in sepsis and septic shock. However, the ability of the odds ratio (OR) and change in the area under the receiver operator characteristic curve (AUC-ROC) to assess biomarker added-value has been questioned. DESIGN, SETTING AND PARTICIPANTS: A sepsis cohort was identified from the ICU database of an Australian tertiary referral hospital using APACHE III diagnostic codes. Demographic information, APACHE III scores, 24-hour post-admission patient lactate levels, and hospital mortality were accessed. MEASUREMENTS AND MAIN RESULTS: Hospital mortality was modelled using a base predictive logistic regression model and sequential addition of admission lactate, lactate clearance ([lactate_admission - lactate_final]/lactate_admission), and area under the lactate-time curve (LTC). Added-value was assessed using lactate index OR; AUC-ROC difference (base-model versus lactate index addition); net (mortality) reclassification index (NRI; range -2 to +2); and net benefit (NB), the number of true positives per patient adjusted for the number of false positives. The data set comprised 717 patients with mean (SD) age and APACHE III score 61.1 (16.5) years and 68.3 (28.2) respectively; 59.2% were male. Admission lactate was 2.3 (2.5) mmol/L; lactate was ≥ 4 mmol/L in 17% of patients (37% hospital mortality), and patients with lactate < 4 mmol/L had 18% hospital mortality. The admission base-model had an AUC-ROC = 0.81 with admission lactate OR = 1.127 (95%CI: 1.038, 1.224), AUC-ROC difference of 0.0032 (-0.0037, 0.01615; P = 0.61), and NRI 0.240 (0.030, 0.464). The over-time model had an AUC-ROC = 0.86 with (i) clearance OR = 0.771, 95%CI: 0.578, 1.030; P = 0.08; AUC-ROC difference 0.001 (-0.003, 0.014; P = 0.78), and NRI 0.109 (-0.193, 0.425) and (ii) LTC OR = 0.997, 95%CI: 0.989, 1.005, P = 0.49; AUC-ROC difference 0.004 (-0.002, 0.004; P = 0.34), and NRI 0.111 (-0.222, 0.403).
NB was not incremented by any lactate index. CONCLUSIONS: Lactate added-value assessment is dependent upon the performance of the underlying predictive model and should incorporate risk reclassification and net benefit measures.
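Two of the quantities in this abstract can be written down compactly. The sketch below computes lactate clearance exactly as defined in the text and a net benefit value using the usual decision-curve weighting of false positives by threshold/(1 - threshold); that weighting is an assumption, since the abstract does not state the formula used. Function names and the toy data are hypothetical.

```python
def lactate_clearance(admission, final):
    """(lactate_admission - lactate_final) / lactate_admission, as in the text."""
    return (admission - final) / admission


def net_benefit(predicted_risks, outcomes, threshold):
    """True positives per patient, penalized for false positives.

    Assumes the standard decision-curve form
        NB = TP/n - (FP/n) * threshold / (1 - threshold),
    where a patient is classified positive if predicted risk >= threshold.
    """
    n = len(predicted_risks)
    pairs = list(zip(predicted_risks, outcomes))
    true_pos = sum(1 for r, y in pairs if r >= threshold and y == 1)
    false_pos = sum(1 for r, y in pairs if r >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Toy values (illustrative only, not taken from the study).
print(lactate_clearance(4.0, 2.0))
print(net_benefit([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0], threshold=0.5))
```

Comparing the net benefit of a base model with that of the base model plus a lactate index, at the same threshold, gives the kind of increment the abstract reports as absent.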
Project description:This work seeks to develop exact confidence interval estimators for figures of merit that describe the performance of linear observers, and to demonstrate how these estimators can be used in the context of x-ray computed tomography (CT). The figures of merit are the receiver operating characteristic (ROC) curve and associated summary measures, such as the area under the ROC curve. Linear computerized observers are valuable for optimization of parameters associated with image reconstruction algorithms and data acquisition geometries. They provide a means to perform assessment of image quality with metrics that account not only for shift-variant resolution and nonstationary noise but that are also task-based. We suppose that a linear observer with fixed template has been defined and focus on the problem of assessing the performance of this observer for the task of deciding if an unknown lesion is present at a specific location. We introduce a point estimator for the observer signal-to-noise ratio (SNR) and identify its sampling distribution. Then, we show that exact confidence intervals can be constructed from this distribution. The sampling distribution of our SNR estimator is identified under the following hypotheses: (i) the observer ratings are normally distributed for each class of images and (ii) the variance of the observer ratings is the same for each class of images. These assumptions are, for example, appropriate in CT for ratings produced by linear observers applied to low-contrast lesion detection tasks. Unlike existing approaches to the estimation of ROC confidence intervals, the new confidence intervals presented here have exactly known coverage probabilities when our data assumptions are satisfied. Furthermore, they are applicable to the most commonly used ROC summary measures, and they may be easily computed (a computer routine is supplied along with this article on the Medical Physics Website).
The utility of our exact interval estimators is demonstrated through an image quality evaluation example using real x-ray CT images. Also, strong robustness is shown to potential deviations from the assumption that the ratings for the two classes of images have equal variance. Another aspect of our interval estimators is the fact that we can calculate their mean length exactly for fixed parameter values, which enables precise investigations of sampling effects. We demonstrate this aspect by exploring the potential reduction in statistical variability that can be gained by using additional images from one class, if such images are readily available. We find that when additional images from one class are used for an ROC study, the mean AUC confidence interval length for our estimator can decrease by as much as 35%. We have shown that exact confidence intervals can be constructed for ROC curves and for ROC summary measures associated with fixed linear computerized observers applied to binary discrimination tasks at a known location. Although our intervals only apply under specific conditions, we believe that they form a valuable tool for the important problem of optimizing parameters associated with image reconstruction algorithms and data acquisition geometries, particularly in x-ray CT.
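Under the two stated hypotheses (normal ratings with equal variance in both classes), the AUC is a known function of the SNR: AUC = Phi(SNR / sqrt(2)), where Phi is the standard normal CDF. The sketch below uses a simple plug-in SNR with a pooled standard deviation; this is an illustrative assumption and not necessarily the point estimator introduced in the work, and it does not compute the exact confidence intervals.

```python
import math
from statistics import NormalDist, mean


def pooled_snr(ratings_present, ratings_absent):
    """Plug-in SNR: difference of class means over the pooled standard deviation."""
    m1 = mean(ratings_present)
    m0 = mean(ratings_absent)
    sum_sq = (sum((x - m1) ** 2 for x in ratings_present)
              + sum((x - m0) ** 2 for x in ratings_absent))
    dof = len(ratings_present) + len(ratings_absent) - 2
    return (m1 - m0) / math.sqrt(sum_sq / dof)


def auc_from_snr(snr):
    """AUC for the equal-variance binormal model: Phi(SNR / sqrt(2))."""
    return NormalDist().cdf(snr / math.sqrt(2.0))


# Toy ratings (illustrative only).
snr = pooled_snr([2.0, 4.0], [0.0, 2.0])
print(snr, auc_from_snr(snr))
```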
Project description:The discriminatory ability of a marker for censored survival data is routinely assessed by the time-dependent ROC curve and the c-index. The time-dependent ROC curve evaluates the ability of a biomarker to predict whether a patient lives past a particular time t. The c-index measures the global concordance of the marker and the survival time regardless of the time point. We propose a Bayesian semiparametric approach to estimate these two measures. The proposed estimators are based on the conditional distribution of the survival time given the biomarker and the empirical biomarker distribution. The conditional distribution is estimated by a linear-dependent Dirichlet process mixture model. The resulting ROC curve is smooth as it is estimated by a mixture of parametric functions. The proposed c-index estimator is shown to be more efficient than the commonly used Harrell's c-index since it uses all pairs of data rather than only informative pairs. The proposed estimators are evaluated through simulations and illustrated using a lung cancer dataset.
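For comparison with the proposed Bayesian estimator, the commonly used Harrell's c-index can be sketched as follows. The sketch assumes that a higher marker value indicates higher risk (shorter survival) and that a pair is usable only when the shorter observed time is an event; function and variable names are hypothetical.

```python
def harrell_c(times, events, markers):
    """Harrell's c-index using only usable (informative) pairs.

    A pair (i, j) is usable when times[i] < times[j] and subject i had an
    observed event. It is concordant when the subject with the shorter time
    has the higher marker value; marker ties count one half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                usable += 1
                if markers[i] > markers[j]:
                    concordant += 1.0
                elif markers[i] == markers[j]:
                    concordant += 0.5
    return concordant / usable


# Toy data (illustrative only): the subject at time 2 is censored.
print(harrell_c([1, 2, 3], [1, 0, 1], [2.0, 3.0, 1.0]))
```

Pairs whose shorter time is censored are discarded here, which is the loss of information the abstract contrasts with an estimator that uses all pairs.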
Project description:INTRODUCTION: Since the majority of patients are diagnosed at an advanced stage, ovarian cancer remains the most lethal gynecologic malignancy. There is no single biomarker with the sensitivity and specificity required for effective cancer screening; therefore, we investigated a panel of novel biomarkers for the early detection of high-grade serous ovarian carcinoma. METHODS: Twelve serum biomarkers with high differential gene expression and validated antibodies were selected: IL-1Ra, IL-6, Dkk-1, uPA, E-CAD, ErbB2, SLPI, HE4, CA125, LCN2, MSLN, and OPN. They were tested using Simple Plex™, a multi-analyte immunoassay platform, in samples collected from 172 patients who were either healthy, had benign gynecologic pathologies, or had high-grade serous ovarian adenocarcinomas. The receiver operating characteristic (ROC) curve, ROC area under the curve (AUC), and standard error (SE) of the AUC were obtained. Univariate ROC analyses and multivariate ROC analyses with the combination of multiple biomarkers were performed. RESULTS: The 4-marker panel consisting of CA125, HE4, E-CAD, and IL-6 had the highest ROC AUC. When evaluated for the ability to distinguish early stage ovarian cancer from a non-cancer control, not only did this 4-marker panel (AUC=0.961) perform better than CA125 alone (AUC=0.851; P=0.0150) and HE4 alone (AUC=0.870; P=0.0220), but it also performed significantly better than the 2-marker combination of CA125+HE4 (AUC=0.922; P=0.0278). The 4-marker panel had the highest average sensitivity under the region of its ROC curve corresponding to specificity ranging from 100% down to ~95%. CONCLUSION: The four-marker panel, CA125, HE4, E-CAD, and IL-6, shows potential in detecting serous ovarian cancer at earlier stages. Additional validation studies using the biomarker combination in ovarian cancer patients are warranted.
Project description:Metabolomics is increasingly being applied towards the identification of biomarkers for disease diagnosis, prognosis and risk prediction. Unfortunately, among the many published metabolomic studies focusing on biomarker discovery, there is very little consistency and relatively little rigor in how researchers select, assess or report their candidate biomarkers. In particular, few studies report any measure of sensitivity or specificity, or provide receiver operator characteristic (ROC) curves with associated confidence intervals. Even fewer studies explicitly describe or release the biomarker model used to generate their ROC curves. This is surprising given that for biomarker studies in most other biomedical fields, ROC curve analysis is generally considered the standard method for performance assessment. Because the ultimate goal of biomarker discovery is the translation of those biomarkers to clinical practice, it is clear that the metabolomics community needs to start "speaking the same language" in terms of biomarker analysis and reporting, especially if it wants to see metabolite markers being routinely used in the clinic. In this tutorial, we will first introduce the concept of ROC curves and describe their use in single biomarker analysis for clinical chemistry. This includes the construction of ROC curves, understanding the meaning of area under ROC curves (AUC) and partial AUC, as well as the calculation of confidence intervals. The second part of the tutorial focuses on biomarker analyses within the context of metabolomics. This section describes different statistical and machine learning strategies that can be used to create multi-metabolite biomarker models and explains how these models can be assessed using ROC curves. In the third part of the tutorial we discuss common issues and potential pitfalls associated with different analysis methods and provide readers with a list of nine recommendations for biomarker analysis and reporting.
To help readers test, visualize and explore the concepts presented in this tutorial, we also introduce a web-based tool called ROCCET (ROC Curve Explorer & Tester, http://www.roccet.ca). ROCCET was originally developed as a teaching aid but it can also serve as a training and testing resource to help metabolomics researchers build biomarker models and conduct a range of common ROC curve analyses for biomarker studies.
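The construction of an empirical ROC curve and its AUC for a single biomarker, as covered in the first part of the tutorial, can be sketched as follows. This is a generic illustration with hypothetical function names and toy data; it is not code from ROCCET and it omits partial AUC and confidence intervals.

```python
def roc_points(scores, labels):
    """Empirical ROC curve as (false positive rate, true positive rate) points.

    A sample is called positive when its score is >= the threshold; thresholds
    are the distinct observed scores, taken from highest to lowest.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (illustrative only): label 1 = diseased, 0 = healthy.
pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts, trapezoid_auc(pts))
```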