LOGISTIC REGRESSION ANALYSIS WITH STANDARDIZED MARKERS.
ABSTRACT: Two different approaches to analysis of data from diagnostic biomarker studies are commonly employed. Logistic regression is used to fit models for probability of disease given marker values while ROC curves and risk distributions are used to evaluate classification performance. In this paper we present a method that simultaneously accomplishes both tasks. The key step is to standardize markers relative to the non-diseased population before including them in the logistic regression model. Among the advantages of this method are: (i) ensuring that results from regression and performance assessments are consistent with each other; (ii) allowing covariate adjustment and covariate effects on ROC curves to be handled in a familiar way, and (iii) providing a mechanism to incorporate important assumptions about structure in the ROC curve into the fitted risk model. We develop the method in detail for the problem of combining biomarker datasets derived from multiple studies, populations or biomarker measurement platforms, when ROC curves are similar across data sources. The methods are applicable to both cohort and case-control sampling designs. The dataset motivating this application concerns Prostate Cancer Antigen 3 (PCA3) for diagnosis of prostate cancer in patients with or without previous negative biopsy where the ROC curves for PCA3 are found to be the same in the two populations. Estimated constrained maximum likelihood and empirical likelihood estimators are derived. The estimators are compared in simulation studies and the methods are illustrated with the PCA3 dataset.
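The key step named in the abstract above, standardizing a marker relative to the non-diseased population before fitting the logistic model, can be illustrated with a minimal sketch. It assumes the standardization is the empirical percentile (placement value) of each marker value among controls; the paper's exact transformation and its constrained or empirical likelihood estimators are not reproduced here. The function names, the simulated normal data, and the plain Newton-Raphson fit are all hypothetical choices made for illustration.

```python
import numpy as np

def standardize_to_controls(marker, control_values):
    """Fraction of control values <= each marker value (empirical percentile)."""
    controls_sorted = np.sort(np.asarray(control_values, dtype=float))
    ranks = np.searchsorted(controls_sorted, np.asarray(marker, dtype=float), side="right")
    return ranks / controls_sorted.size

def fit_logistic(x, y, n_iter=25):
    """Newton-Raphson fit of logit P(D=1 | x) = b0 + b1 * x (unconstrained)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated data (assumption): controls ~ N(0, 1), cases ~ N(1, 1).
rng = np.random.default_rng(0)
controls = rng.normal(0.0, 1.0, size=500)
cases = rng.normal(1.0, 1.0, size=500)
marker = np.concatenate([controls, cases])
disease = np.concatenate([np.zeros(500), np.ones(500)])

u = standardize_to_controls(marker, controls)  # standardized marker in [0, 1]
beta = fit_logistic(u, disease)                # risk model on the standardized scale

# Empirical ROC at false positive rate 0.2: share of cases whose
# standardized value exceeds the 80th control percentile.
roc_at_20 = np.mean(u[disease == 1] > 0.8)
```

Because the standardized marker is approximately uniform among controls, the same quantity that enters the risk model also determines the ROC curve, which is the consistency between regression and performance assessment that the abstract emphasizes.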
Project description:The receiver operating characteristic (ROC) curve is often used to evaluate the performance of a biomarker measured on a continuous scale in predicting disease status or a clinical condition. Motivated by the need for novel study designs with better estimation efficiency and reduced study cost, we consider a biased sampling scheme that consists of a simple random component (SRC) and a supplemental test-result-dependent component (TDC). Using this approach, investigators can oversample or undersample subjects falling into certain regions of the biomarker measure, yielding improved precision for the estimation of the ROC curve with a fixed sample size. Test-result-dependent sampling will introduce bias in estimating the predictive accuracy of the biomarker if standard ROC estimation methods are used. In this article, we discuss three approaches for analyzing data of a test-result-dependent structure, with a special focus on the empirical likelihood method. We establish asymptotic properties of the empirical likelihood estimators for covariate-specific ROC curves and covariate-independent ROC curves and give their corresponding variance estimators. Simulation studies show that the empirical likelihood method yields good properties and is more efficient than alternative methods. Recommendations on the number of regions, cutoff points, and subject allocation are made based on the simulation results. The proposed methods are illustrated with a data example based on an ongoing lung cancer clinical trial.
Project description:Receiver operating characteristic (ROC) curves are widely used to measure the discriminating power of medical tests and other classification procedures. In many practical applications, the performance of these procedures can depend on covariates such as age, naturally leading to a collection of curves associated with different covariate levels. This paper develops a Bayesian heteroscedastic semiparametric regression model and applies it to the estimation of covariate-dependent ROC curves. More specifically, our approach uses Gaussian process priors to model the conditional mean and conditional variance of the biomarker of interest for each of the populations under study. The model is illustrated through an application to the evaluation of prostate-specific antigen for the diagnosis of prostate cancer, which contrasts the performance of our model against alternative models.
Project description:Covariate-specific receiver operating characteristic (ROC) curves are often used to evaluate the classification accuracy of a medical diagnostic test or a biomarker when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness due to high cost or harmfulness to the patient. In this article, we propose a semiparametric estimation of the covariate-specific ROC curves with a partially missing gold standard. A location-scale model is constructed for the test result to model the covariates' effect, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. Under the assumption that the gold standard is missing at random (MAR), we consider weighted estimating equations for the location-scale parameters, and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared, namely, imputation-based, inverse probability weighted, and doubly robust estimators. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form of the standard error estimator. The proposed method is motivated by, and applied to, data from an Alzheimer's disease study.
Project description:In ROC analysis, covariate adjustment is advocated when the covariates impact the magnitude or accuracy of the test under study. Meanwhile, for many large scale screening tests, the true condition status may be subject to missingness because it is expensive and/or invasive to ascertain the disease status. The complete-case analysis may end up with a biased inference, also known as "verification bias." To address the issue of covariate adjustment with verification bias in ROC analysis, we propose several estimators for the area under the covariate-specific and covariate-adjusted ROC curves (AUCx and AAUC). The AUCx is directly modeled in the form of binary regression, and the estimating equations are based on the U statistics. The AAUC is estimated from the weighted average of AUCx over the covariate distribution of the diseased subjects. We employ reweighting and imputation techniques to overcome the verification bias problem. Our proposed estimators are initially derived assuming that the true disease status is missing at random (MAR), and then with some modification, the estimators can be extended to the not missing at random (NMAR) situation. The asymptotic distributions are derived for the proposed estimators. The finite sample performance is evaluated by a series of simulation studies. Our method is applied to a data set in Alzheimer's disease research.
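The combination of a U-statistic form of the AUC with reweighting for verification bias, as described above, can be sketched in a few lines. This is a generic inverse-probability-weighted Mann-Whitney estimator under the MAR assumption, not the covariate-specific estimators (AUCx, AAUC) proposed in the abstract. The function name `ipw_auc` and its arguments are hypothetical, and the verification probabilities are assumed to be known or estimated beforehand.

```python
import numpy as np

def ipw_auc(test, disease, verified, prob_verified):
    """Inverse-probability-weighted AUC using verified subjects only.

    test          : test results for all subjects
    disease       : true status (1/0); values for unverified subjects are ignored
    verified      : boolean indicator that disease status was ascertained
    prob_verified : P(verified | observed data), assumed known or pre-estimated
    """
    v = np.asarray(verified, dtype=bool)
    t = np.asarray(test, dtype=float)[v]
    d = np.asarray(disease)[v]
    w = 1.0 / np.asarray(prob_verified, dtype=float)[v]

    t1, w1 = t[d == 1], w[d == 1]  # verified diseased
    t0, w0 = t[d == 0], w[d == 0]  # verified non-diseased

    # Mann-Whitney kernel over all diseased/non-diseased pairs; ties count 1/2.
    kernel = (t1[:, None] > t0[None, :]) + 0.5 * (t1[:, None] == t0[None, :])
    pair_weights = w1[:, None] * w0[None, :]
    return float((pair_weights * kernel).sum() / pair_weights.sum())

# With full verification the estimator reduces to the ordinary empirical AUC.
auc_full = ipw_auc([1, 3, 2, 4], [0, 0, 1, 1], [True] * 4, [1.0] * 4)
```

When every subject is verified with probability one, the weights cancel and the result is the usual empirical AUC; unequal weights up-weight verified subjects who resemble those unlikely to be verified.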
Project description:BACKGROUND: For men on active surveillance for prostate cancer, biomarkers may improve prediction of reclassification to higher grade or volume cancer. This study examined the association of urinary PCA3 and TMPRSS2:ERG (T2:ERG) with biopsy-based reclassification. METHODS: Urine was collected at baseline, 6, 12, and 24 months in the multi-institutional Canary Prostate Active Surveillance Study (PASS), and PCA3 and T2:ERG levels were quantitated. Reclassification was an increase in Gleason score or ratio of biopsy cores with cancer to ≥34%. The association of biomarker scores, adjusted for common clinical variables, with short- and long-term reclassification was evaluated. Discriminatory capacity of models with clinical variables alone or with biomarkers was assessed using receiver operating characteristic (ROC) curves and decision curve analysis (DCA). RESULTS: Seven hundred and eighty-two men contributed 2069 urine specimens. After adjusting for PSA, prostate size, and ratio of biopsy cores with cancer, PCA3 but not T2:ERG was associated with short-term reclassification at the first surveillance biopsy (OR = 1.3; 95% CI 1.0-1.7, p = 0.02). The addition of PCA3 to a model with clinical variables improved area under the curve from 0.743 to 0.753 and increased net benefit minimally. After adjusting for clinical variables, neither marker nor marker kinetics was associated with time to reclassification in subsequent biopsies. CONCLUSIONS: PCA3 but not T2:ERG was associated with cancer reclassification in the first surveillance biopsy but has negligible improvement over clinical variables alone in ROC or DCA analyses. Neither marker was associated with reclassification in subsequent biopsies.
Project description:Prostate cancer antigen 3 (PCA3) is a biomarker for diagnosing prostate cancer (PCa) identified in the Caucasian population. We evaluated the effectiveness of urinary PCA3 in predicting the biopsy result in 500 men undergoing initial prostate biopsy. The predictive power of the PCA3 score was evaluated by the area under receiver operating characteristic (ROC) curve (AUC) and by decision curve analysis. PCA3 score sufficed to discriminate positive from negative prostate biopsy results but was not correlated with the aggressiveness of PCa. The ROC analysis showed a higher AUC for the PCA3 score than %fPSA (0.750 vs 0.622, P = 0.046) in patients with a PSA of 4.0-10.0 ng ml⁻¹, but the PCA3-based model is not significantly better than the base model. Decision curve analysis indicates the PCA3-based model was superior to the base model with a higher net benefit for almost all threshold probabilities, especially the threshold probabilities of 25%-40% in patients with a PSA of 4.0-10.0 ng ml⁻¹. However, the AUC of the PCA3 score (0.712) is not superior to %fPSA (0.698) or PSAD (0.773) in patients with a PSA >10.0 ng ml⁻¹. Our results confirmed that the RT-PCR-based PCA3 test moderately improved diagnostic accuracy in Chinese patients undergoing first prostate biopsy with a PSA of 4.0-10.0 ng ml⁻¹.
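The net benefit reported by decision curve analysis in the abstracts above is commonly computed as the true positive rate per subject minus the false positive rate per subject, weighted by the odds of the threshold probability. The sketch below follows that standard definition; it is not taken from either study, and the function name and the four-subject toy data are hypothetical.

```python
import numpy as np

def net_benefit(outcome, predicted_risk, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    y = np.asarray(outcome)
    risk = np.asarray(predicted_risk, dtype=float)
    treat = risk >= threshold
    n = y.size
    true_pos = np.sum(treat & (y == 1))
    false_pos = np.sum(treat & (y == 0))
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)

# Toy data (assumption): two events, two non-events.
outcome = [1, 1, 0, 0]
risk = [0.9, 0.6, 0.4, 0.1]
nb_at_50 = net_benefit(outcome, risk, 0.5)  # 2 true positives, 0 false positives
nb_at_30 = net_benefit(outcome, risk, 0.3)  # 2 true positives, 1 false positive
```

A decision curve is obtained by evaluating this function over a grid of thresholds for each model and comparing the curves with the "treat all" and "treat none" strategies.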
Project description:Direct regression modeling of the subdistribution has become popular for analyzing data with multiple, competing event types. All general approaches so far are based on non-likelihood based procedures and target covariate effects on the subdistribution. We introduce a novel weighted likelihood function that allows for a direct extension of the Fine-Gray model to a broad class of semiparametric regression models. The model accommodates time-dependent covariate effects on the subdistribution hazard. To motivate the proposed likelihood method, we derive standard nonparametric estimators and discuss a new interpretation based on pseudo risk sets. We establish consistency and asymptotic normality of the estimators and propose a sandwich estimator of the variance. In comprehensive simulation studies we demonstrate the solid performance of the weighted NPMLE in the presence of independent right censoring. We provide an application to a very large bone marrow transplant dataset, thereby illustrating its practical utility.
Project description:Prostate cancer gene 3 (PCA3) is a non-coding gene specifically overexpressed in prostate cancer (PCa) that has great potential as a clinical biomarker for predicting prostate biopsy outcome. However, genetic determinants of PCA3 expression level remain unknown. To investigate the association between genetic variants and PCA3 mRNA level, a genome-wide association study was conducted in 1371 men of European descent in the REduction by DUtasteride of prostate Cancer Events trial. First-voided urine specimens containing prostate cells were obtained after digital rectal examination. The PROGENSA PCA3 assay was used to determine PCA3 score in the urinary samples. A linear regression model was used to detect the associations between single nucleotide polymorphisms (SNPs) and PCA3 score under an additive genetic model, adjusting for age and population stratification. Two SNPs, rs10993994 in β-microseminoprotein at 10q11.23 and rs10424878 in kallikrein-related peptidase 2 at 19q13.33, were associated with PCA3 score at genome-wide significance level (P = 1.22 × 10⁻⁹ and 1.06 × 10⁻⁸, respectively). Men carrying the rs10993994 "T" allele or rs10424878 "A" allele had higher PCA3 score compared with men carrying rs10993994 "C" allele or rs10424878 "G" allele (β = 1.25 and 1.24, respectively). This is the first comprehensive search for genetic determinants of PCA3 score. The novel loci identified may provide insight into the molecular mechanisms of PCA3 expression as a potential marker of PCa.
Project description:Although prostate-specific antigen (PSA) serum level is currently the standard of care for prostate cancer screening in the United States, it lacks ideal specificity and additional biomarkers are needed to supplement or potentially replace serum PSA testing. Emerging evidence suggests that monitoring the noncoding RNA transcript PCA3 in urine may be useful in detecting prostate cancer in patients with elevated PSA levels. Here, we show that a multiplex panel of urine transcripts outperforms PCA3 transcript alone for the detection of prostate cancer. We measured the expression of seven putative prostate cancer biomarkers, including PCA3, in sedimented urine using quantitative PCR on a cohort of 234 patients presenting for biopsy or radical prostatectomy. By univariate analysis, we found that increased GOLPH2, SPINK1, and PCA3 transcript expression and TMPRSS2:ERG fusion status were significant predictors of prostate cancer. Multivariate regression analysis showed that a multiplexed model, including these biomarkers, outperformed serum PSA or PCA3 alone in detecting prostate cancer. The area under the receiver-operating characteristic curve was 0.758 for the multiplexed model versus 0.662 for PCA3 alone (P = 0.003). The sensitivity and specificity for the multiplexed model were 65.9% and 76.0%, respectively, and the positive and negative predictive values were 79.8% and 60.8%, respectively. Taken together, these results provide the framework for the development of highly optimized, multiplex urine biomarker tests for more accurate detection of prostate cancer.
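The relationship between the sensitivity, specificity, and predictive values quoted above follows from Bayes' rule once a disease prevalence is fixed. The sketch below shows that relationship. The prevalence of 0.59 is an assumption chosen for illustration and is not reported in the abstract; the function name is likewise hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Sensitivity and specificity as quoted for the multiplexed model;
# prevalence = 0.59 is an illustrative assumption, not a reported value.
ppv, npv = predictive_values(0.659, 0.760, 0.59)
```

Under this assumed prevalence the computed values are close to the quoted 79.8% and 60.8%, which illustrates how strongly predictive values depend on the case mix of the cohort in which a test is evaluated.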