Confidence intervals for performance assessment of linear observers.
ABSTRACT: This work seeks to develop exact confidence interval estimators for figures of merit that describe the performance of linear observers, and to demonstrate how these estimators can be used in the context of x-ray computed tomography (CT). The figures of merit are the receiver operating characteristic (ROC) curve and associated summary measures, such as the area under the ROC curve. Linear computerized observers are valuable for optimization of parameters associated with image reconstruction algorithms and data acquisition geometries. They provide a means to perform assessment of image quality with metrics that account not only for shift-variant resolution and nonstationary noise but that are also task-based. We suppose that a linear observer with fixed template has been defined and focus on the problem of assessing the performance of this observer for the task of deciding if an unknown lesion is present at a specific location. We introduce a point estimator for the observer signal-to-noise ratio (SNR) and identify its sampling distribution. Then, we show that exact confidence intervals can be constructed from this distribution. The sampling distribution of our SNR estimator is identified under the following hypotheses: (i) the observer ratings are normally distributed for each class of images and (ii) the variance of the observer ratings is the same for each class of images. These assumptions are, for example, appropriate in CT for ratings produced by linear observers applied to low-contrast lesion detection tasks. Unlike existing approaches to the estimation of ROC confidence intervals, the new confidence intervals presented here have exactly known coverage probabilities when our data assumptions are satisfied. Furthermore, they are applicable to the most commonly used ROC summary measures, and they may be easily computed (a computer routine is supplied along with this article on the Medical Physics Website).
The utility of our exact interval estimators is demonstrated through an image quality evaluation example using real x-ray CT images. Also, strong robustness is shown to potential deviations from the assumption that the ratings for the two classes of images have equal variance. Another aspect of our interval estimators is the fact that we can calculate their mean length exactly for fixed parameter values, which enables precise investigations of sampling effects. We demonstrate this aspect by exploring the potential reduction in statistical variability that can be gained by using additional images from one class, if such images are readily available. We find that when additional images from one class are used for an ROC study, the mean AUC confidence interval length for our estimator can decrease by as much as 35%. We have shown that exact confidence intervals can be constructed for ROC curves and for ROC summary measures associated with fixed linear computerized observers applied to binary discrimination tasks at a known location. Although our intervals only apply under specific conditions, we believe that they form a valuable tool for the important problem of optimizing parameters associated with image reconstruction algorithms and data acquisition geometries, particularly in x-ray CT.
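As a rough illustration of the two data assumptions stated above (normal ratings, equal variance in both classes), the sketch below computes a plug-in SNR from a pooled variance and maps it to an AUC through the standard equal-variance binormal relation AUC = Φ(SNR/√2). The function name `snr_and_auc` and the ratings are hypothetical. This is not the routine supplied with the article and it does not reproduce the exact interval construction.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def snr_and_auc(ratings_absent, ratings_present):
    """Plug-in SNR and AUC, assuming normal ratings with equal class variance."""
    m, n = len(ratings_absent), len(ratings_present)
    # Pooled sample variance across both classes (equal-variance assumption).
    pooled_var = ((m - 1) * variance(ratings_absent)
                  + (n - 1) * variance(ratings_present)) / (m + n - 2)
    snr = (mean(ratings_present) - mean(ratings_absent)) / sqrt(pooled_var)
    # Equal-variance binormal relation: AUC = Phi(SNR / sqrt(2)).
    auc = NormalDist().cdf(snr / sqrt(2))
    return snr, auc


# Hypothetical observer ratings, for illustration only.
absent = [0.1, -0.4, 0.3, -0.2, 0.2]
present = [1.1, 0.6, 1.3, 0.8, 1.2]
snr, auc = snr_and_auc(absent, present)
print(f"SNR = {snr:.3f}, AUC = {auc:.4f}")
```

Because the AUC is a monotone function of the SNR under these assumptions, any interval for the SNR can be mapped to an interval for the AUC by applying the same transformation to both endpoints.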
Project description:The purpose of this study was to investigate the correlation between model observer and human observer performance in CT imaging for the task of lesion detection and localization when the lesion location is uncertain. Two cylindrical rods (3-mm and 5-mm diameters) were placed in a 35×26 cm torso-shaped water phantom to simulate lesions with -15 HU contrast at 120 kV. The phantom was scanned 100 times on a 128-slice CT scanner at each of four dose levels (CTDIvol=5.7, 11.4, 17.1, and 22.8 mGy). Regions of interest (ROIs) around each lesion were extracted to generate signal-present images, with each ROI containing 128×128 pixels. Corresponding ROIs of signal-absent images were generated from images without lesion-mimicking rods. The location of the lesion (rod) in each ROI was randomly distributed by moving the ROIs around each lesion. Human observer studies were performed by having three trained observers identify the presence or absence of lesions, indicating the lesion location in each image and scoring confidence for the detection task on a 6-point scale. The same image data were analyzed using a channelized Hotelling model observer (CHO) with Gabor channels. Internal noise was added to the decision variables for the model observer study. Area under the curve (AUC) of ROC and localization ROC (LROC) curves were calculated using a nonparametric approach. The Spearman's rank order correlation between the average performance of the human observers and the model observer performance was calculated for the AUC of both ROC and LROC curves for both the 3- and 5-mm diameter lesions. In both ROC and LROC analyses, AUC values for the model observer agreed well with the average values across the three human observers.
The Spearman's rank order correlation values for both ROC and LROC analyses for both the 3- and 5-mm diameter lesions were all 1.0, indicating perfect rank ordering agreement of the figures of merit (AUC) between the average performance of the human observers and the model observer performance. In CT imaging of different sizes of low-contrast lesions (-15 HU), the performance of CHO with Gabor channels was highly correlated with human observer performance for the detection and localization tasks with uncertain lesion location at four clinically relevant dose levels. This suggests the ability of Gabor CHO model observers to meaningfully assess CT image quality for the purpose of optimizing scan protocols and radiation dose levels in detection and localization tasks for low-contrast lesions.
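The two statistics named in this abstract can be sketched in a few lines. The code below is a minimal illustration, assuming the usual Mann-Whitney form of the nonparametric AUC and the usual rank-based definition of Spearman's correlation. All function names and all numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def auc_mann_whitney(absent, present):
    """Nonparametric AUC: P(present > absent) + 0.5 * P(tie)."""
    wins = sum((p > a) + 0.5 * (p == a) for p in present for a in absent)
    return wins / (len(present) * len(absent))


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(sxx * syy)


# Hypothetical confidence scores for one condition.
example_auc = auc_mann_whitney([1, 2, 3], [2, 4, 5])

# Hypothetical AUC values at four dose levels (lowest to highest dose).
human_auc = [0.71, 0.80, 0.86, 0.91]
model_auc = [0.69, 0.83, 0.88, 0.90]
rho = spearman(human_auc, model_auc)
print(example_auc, rho)
```

A Spearman correlation of 1.0 only says that both observers order the conditions identically. It does not require the AUC values themselves to match, which is why the abstract reports agreement of the AUC values separately.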
Project description:The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is widely used by researchers, its statistical properties are largely unexplored. First, we developed statistical theory that allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was done by distinguishing whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half-normal distribution variance estimator has the best coverage probability. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator.
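For orientation, the point estimator itself is short. The sketch below assumes the commonly quoted form of Rosenthal's fail-safe number, N = (ΣZ)² / z_α² − k with a one-tailed α of 0.05, which the abstract does not restate. The variance estimators and confidence intervals developed in the paper are not reproduced here, and the z-scores are hypothetical.

```python
from statistics import NormalDist


def rosenthal_fail_safe_n(z_scores, alpha=0.05):
    """Fail-safe N in its commonly quoted form (assumed, not from the abstract).

    Returns the number of unpublished null-result studies that would be
    needed to pull the combined one-tailed test above `alpha`.
    """
    k = len(z_scores)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return sum(z_scores) ** 2 / z_alpha ** 2 - k


# Hypothetical z-scores from five studies in a meta-analysis.
z = [2.0, 1.5, 2.5, 1.0, 3.0]
n_fs = rosenthal_fail_safe_n(z)
print(f"fail-safe N = {n_fs:.2f}")
```

The result depends on the z-scores only through their sum and count, which is one reason the distinction between a fixed and a random number of studies matters for its variance.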
Project description:Previous studies have proposed a simple product-based estimator for calculating exposure-specific risks (ESR), but the methodology has not been rigorously evaluated. The goal of our study was to evaluate the existing methodology for calculating the ESR, propose an improved point estimator, and propose variance estimates that will allow the calculation of confidence intervals (CIs). We conducted a simulation study to test the performance of two estimators and their associated confidence intervals: 1) current (simple product-based estimator) and 2) proposed revision (revised product-based estimator). The first method for ESR estimation was based on multiplying a relative risk (RR) of disease given a certain exposure by an overall risk of disease. The second method, which is proposed in this paper, was based on estimates of the risk of disease in the unexposed. The updated risk was then multiplied by the RR to obtain the revised product-based estimator. A log-based variance was calculated for both estimators. Also, a binomial-based variance was calculated for the revised product-based estimator. 95% CIs were calculated based on these variance estimates. Accuracy of point estimators was evaluated by comparing observed relative bias (percent deviation from the true estimate). Interval estimators were evaluated by coverage probabilities and expected length of the 95% CI, given coverage. We evaluated these estimators across a wide range of exposure probabilities, disease probabilities, relative risks, and sample sizes. We observed more bias and lower coverage probability when using the existing methodology. The revised product-based point estimator exhibited little observed relative bias (max: 4.0%) compared to the simple product-based estimator (max: 93.9%). Because the simple product-based estimator was biased, 95% CIs around this estimate exhibited small coverage probabilities.
The 95% CI around the revised product-based estimator from the log-based variance provided better coverage in most situations. The currently accepted simple product-based method was only a reasonable approach when the exposure probability is small (< 0.05) and the RR is ≤ 3.0. The revised product-based estimator provides much improved accuracy.
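The source of the bias can be seen with a small calculation. The sketch below assumes that the risk in the unexposed is recovered from the overall risk through the law of total probability, overall = p0 × (1 − pe + pe × RR). The abstract does not say how the authors estimate that quantity, so this step, the function names, and the input values are illustrative assumptions. The variance estimators are not reproduced.

```python
def simple_product_esr(rr, overall_risk):
    """Existing approach: relative risk times the overall risk of disease."""
    return rr * overall_risk


def unexposed_risk(overall_risk, exposure_prob, rr):
    """Risk in the unexposed, assuming overall = p0 * (1 - pe + pe * rr)."""
    return overall_risk / (1 - exposure_prob + exposure_prob * rr)


def revised_product_esr(rr, overall_risk, exposure_prob):
    """Revised approach: relative risk times the risk in the unexposed."""
    return rr * unexposed_risk(overall_risk, exposure_prob, rr)


# Hypothetical population: true unexposed risk 2%, RR = 3, 5% exposed.
p0, rr, pe = 0.02, 3.0, 0.05
overall = p0 * (1 - pe + pe * rr)
true_esr = rr * p0

simple = simple_product_esr(rr, overall)
revised = revised_product_esr(rr, overall, pe)
relative_bias_simple = (simple - true_esr) / true_esr
relative_bias_revised = (revised - true_esr) / true_esr
print(relative_bias_simple, relative_bias_revised)
```

Under this assumption the relative bias of the simple product equals pe × (RR − 1), so it stays small only when both the exposure probability and the relative risk are small, which is consistent with the conditions quoted in the abstract.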
Project description:BACKGROUND: To assess the agreement of continuous measurements between a number of observers, Jones et al. introduced limits of agreement with the mean (LOAM) for multiple observers, representing how much an individual observer can deviate from the mean measurement of all observers. Besides the graphical visualisation of LOAM, suggested by Jones et al., it is desirable to supply LOAM with confidence intervals and to extend the method to the case of multiple measurements per observer. METHODS: We reformulate LOAM under the assumption that the measurements follow an additive two-way random effects model. Assuming this model, we provide estimates and confidence intervals for the proposed LOAM. Further, this approach is easily extended to the case of multiple measurements per observer. RESULTS: The proposed method is applied to two data sets to illustrate its use. Specifically, we consider agreement between measurements regarding tumour size and aortic diameter. For the latter study, three measurement methods are considered. CONCLUSIONS: The proposed LOAM and the associated confidence intervals are useful for assessing agreement between continuous measurements.
Project description:Knowledge of contagion among economies is a relevant issue in economics. The canonical model of contagion is an alternative in this case. Given the existence of endogenous variables in the model, instrumental variables can be used to decrease the bias of the OLS estimator. In the presence of heteroskedastic disturbances this paper proposes the use of conditional volatilities as instruments. Simulation is used to show that the homoscedastic and heteroskedastic estimators which use them as instruments have small bias. These estimators are preferable in comparison with the OLS estimator and their asymptotic distribution can be used to construct confidence intervals.
Project description:BACKGROUND: Measurement of cognitive behavioural therapy (CBT) competency is often resource intensive. A popular emerging alternative to independent observers' ratings is using other perspectives for rating competency. AIMS: This pilot study compared ratings of CBT competency from four perspectives - patient, therapist, supervisor and independent observer using the Cognitive Therapy Scale (CTS). METHOD: Patients (n = 12, 75% female, mean age 30.5 years) and therapists (n = 5, female, mean age 26.6 years) completed the CTS after therapy sessions, and clinical supervisor and independent observers rated recordings of the same session. RESULTS: Analyses of variance revealed that therapist average CTS competency ratings were not different from supervisor ratings, and supervisor ratings were not different from independent observer ratings; however, therapist ratings were higher than independent observer ratings and patient ratings were higher than all other raters. CONCLUSIONS: Raters differed in competency ratings. Implications for potential use and adaptation of CBT competency measurement methods to enhance training and implementation are discussed.
Project description:This study examines the effect of amantadine on irritability in persons in the post-acute period after traumatic brain injury (TBI). There were 168 persons ≥6 months post-TBI with irritability who were enrolled in a parallel-group, randomized, double-blind, placebo-controlled trial receiving either amantadine 100 mg twice daily or equivalent placebo for 60 days. Subjects were assessed at baseline and days 28 (primary end-point) and 60 of treatment using observer-rated and participant-rated Neuropsychiatric Inventory (NPI-I) Most Problematic item (primary outcome), NPI Most Aberrant item, and NPI-I Distress Scores, as well as physician-rated Clinical Global Impressions (CGI) scale. Observer ratings between the two groups were not statistically significantly different at day 28 or 60; however, observers rated the majority in both groups as having improved at both intervals. Participant ratings for day 60 demonstrated improvements in both groups with greater improvement in the amantadine group on NPI-I Most Problematic (p<0.04) and NPI-I Distress (p<0.04). These results were not significant with correction for multiple comparisons. CGI demonstrated greater improvement for amantadine than the placebo group (p<0.04). Adverse event occurrence did not differ between the two groups. While observers in both groups reported large improvements, significant group differences were not found for the primary outcome (observer ratings) at either day 28 or 60. This large placebo or nonspecific effect may have masked detection of a treatment effect. The result of this study of amantadine 100 mg every morning and noon to reduce irritability was not positive from the observer perspective, although there are indications of improvement at day 60 from the perspective of persons with TBI and clinicians that may warrant further investigation.
Project description:Efficient optimization of CT protocols demands a quantitative approach to predicting human observer performance on specific tasks at various scan and reconstruction settings. The goal of this work was to investigate how well a channelized Hotelling observer (CHO) can predict human observer performance on 2-alternative forced choice (2AFC) lesion-detection tasks at various dose levels and two different reconstruction algorithms: a filtered-backprojection (FBP) and an iterative reconstruction (IR) method. A 35 × 26 cm² torso-shaped phantom filled with water was used to simulate an average-sized patient. Three rods with different diameters (small: 3 mm; medium: 5 mm; large: 9 mm) were placed in the center region of the phantom to simulate small, medium, and large lesions. The contrast relative to background was -15 HU at 120 kV. The phantom was scanned 100 times using automatic exposure control each at 60, 120, 240, 360, and 480 quality reference mAs on a 128-slice scanner. After removing the three rods, the water phantom was again scanned 100 times to provide signal-absent background images at the exact same locations. By extracting regions of interest around the three rods and on the signal-absent images, the authors generated 21 2AFC studies. Each 2AFC study had 100 trials, with each trial consisting of a signal-present image and a signal-absent image side-by-side in randomized order. In total, 2100 trials were presented to both the model and human observers. Four medical physicists acted as human observers. For the model observer, the authors used a CHO with Gabor channels, which involves six channel passbands, five orientations, and two phases, leading to a total of 60 channels. The performance predicted by the CHO was compared with that obtained by four medical physicists at each 2AFC study. The human and model observers were highly correlated at each dose level for each lesion size for both FBP and IR.
The Pearson's product-moment correlation coefficients were 0.986 [95% confidence interval (CI): 0.958-0.996] for FBP and 0.985 (95% CI: 0.863-0.998) for IR. Bland-Altman plots showed excellent agreement for all dose levels and lesion sizes with a mean absolute difference of 1.0% ± 1.1% for FBP and 2.1% ± 3.3% for IR. Human observer performance on a 2AFC lesion detection task in CT with a uniform background can be accurately predicted by a CHO model observer at different radiation dose levels and for both FBP and IR methods.
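Confidence intervals for a Pearson correlation are often obtained with the Fisher z-transform, and the sketch below shows that calculation. The abstract does not state which method was used or how many data points entered each correlation. The sample sizes here (15 for FBP, 6 for IR) are an assumption based on splitting the 21 studies as 5 dose levels × 3 lesion sizes plus the remainder, and `pearson_ci` is a hypothetical helper name.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Fisher z-transform confidence interval for a Pearson correlation."""
    z = atanh(r)                  # variance-stabilising transform
    se = 1 / sqrt(n - 3)          # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - crit * se), tanh(z + crit * se)


# Correlations from the abstract; sample sizes are assumed, not reported.
fbp_low, fbp_high = pearson_ci(0.986, n=15)
ir_low, ir_high = pearson_ci(0.985, n=6)
print(f"FBP: {fbp_low:.3f} to {fbp_high:.3f}")
print(f"IR:  {ir_low:.3f} to {ir_high:.3f}")
```

The interval is asymmetric around r because the transform is nonlinear near 1, and it widens quickly as n shrinks. With the assumed sample sizes the output should land close to the reported intervals, but that agreement does not confirm the method the authors used.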
Project description:In cardiac computed tomography (CT), important clinical indices, such as the coronary calcium score and the percentage of coronary artery stenosis, are often adversely affected by motion artifacts. As a result, the expert observer must decide whether or not to use these indices during image interpretation. Computerized methods potentially can be used to assist in these decisions. In a previous study, an artificial neural network (ANN) regression model provided assessability (image quality) indices of calcified plaque images from the software NCAT phantom that were in close agreement with those provided by expert observers. The method predicted assessability indices based on computer-extracted features of the plaque. In the current study, the ANN-predicted assessability indices were used to identify calcified plaque images with diagnostic calcium scores (based on mass) from a physical dynamic cardiac phantom. The basic assumption was that better quality images were associated with more accurate calcium scores. A 64-channel CT scanner was used to obtain 500 calcified plaque images from a physical dynamic cardiac phantom at different heart rates, cardiac phases, and plaque locations. Two expert observers independently provided separate sets of assessability indices for each of these images. Separate sets of ANN-predicted assessability indices tailored to each observer were then generated within the framework of a bootstrap resampling scheme. For each resampling iteration, the absolute calcium score error between the calcium scores of the motion-contaminated plaque image and its corresponding stationary image served as the ground truth in terms of indicating images with diagnostic calcium scores.
The performances of the ANN-predicted and observer-assigned indices in identifying images with diagnostic calcium scores were then evaluated using ROC analysis. Assessability indices provided by the first observer and the corresponding ANN performed similarly (AUC(OBS1) = 0.80 [0.73, 0.86] vs. AUC(ANN1) = 0.88 [0.82, 0.92]), as did those of the second observer and the corresponding ANN (AUC(OBS2) = 0.87 [0.83, 0.91] vs. AUC(ANN2) = 0.90 [0.85, 0.94]). Moreover, the ANN-predicted indices were generated in a fraction of the time required to obtain the observer-assigned indices. ANN-predicted assessability indices performed similarly to observer-assigned assessability indices in identifying images with diagnostic calcium scores from the physical dynamic cardiac phantom. The results of this study demonstrate the potential of using computerized methods for identifying images with diagnostic clinical indices in cardiac CT images.
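One common way to attach an interval to an AUC is a percentile bootstrap over the two groups of cases, sketched below. The abstract mentions bootstrap resampling for generating the ANN-predicted indices but does not say how the bracketed AUC intervals were computed, so this is a generic illustration under that assumption. The function names and the index values are hypothetical.

```python
import random


def auc(negatives, positives):
    """Nonparametric AUC: P(positive > negative) + 0.5 * P(tie)."""
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_ci(negatives, positives, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling each class with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    stats = []
    for _ in range(n_boot):
        neg_b = rng.choices(negatives, k=len(negatives))
        pos_b = rng.choices(positives, k=len(positives))
        stats.append(auc(neg_b, pos_b))
    stats.sort()
    low_idx = int((1 - level) / 2 * n_boot)
    high_idx = int((1 + level) / 2 * n_boot) - 1
    return stats[low_idx], stats[high_idx]


# Hypothetical assessability indices for non-diagnostic and diagnostic images.
non_diagnostic = [0.10, 0.30, 0.35, 0.40, 0.50, 0.55, 0.60, 0.70]
diagnostic = [0.45, 0.50, 0.65, 0.70, 0.80, 0.85, 0.90, 0.95]

point = auc(non_diagnostic, diagnostic)
low, high = bootstrap_auc_ci(non_diagnostic, diagnostic)
print(f"AUC = {point:.2f} [{low:.2f}, {high:.2f}]")
```

Resampling the two classes separately keeps the class sizes fixed in every replicate, which matches a design where the number of diagnostic and non-diagnostic images is set in advance.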
Project description:Truncation is a well-known phenomenon that may be present in observational studies of time-to-event data. While many methods exist to adjust for either left or right truncation, there are very few methods that adjust for simultaneous left and right truncation, also known as double truncation. We propose a Cox regression model to adjust for this double truncation using a weighted estimating equation approach, where the weights are estimated from the data both parametrically and nonparametrically, and are inversely proportional to the probability that a subject is observed. The resulting weighted estimators of the hazard ratio are consistent. The parametric weighted estimator is asymptotically normal and a consistent estimator of the asymptotic variance is provided. For the nonparametric weighted estimator, we apply the bootstrap technique to estimate the variance and confidence intervals. We demonstrate through extensive simulations that the proposed estimators greatly reduce the bias compared to the unweighted Cox regression estimator which ignores truncation. We illustrate our approach in an analysis of autopsy-confirmed Alzheimer's disease patients to assess the effect of education on survival.