Computerized Adaptive Testing Provides Reliable and Efficient Depression Measurement Using the CES-D Scale.
ABSTRACT: BACKGROUND:The Center for Epidemiologic Studies Depression Scale (CES-D) is a widely used international measure of depressive symptomatology. Although previous attempts have been made to shorten the CES-D, few have developed a Computerized Adaptive Test (CAT) version of it. OBJECTIVE:The aim of this study was to provide evidence on the efficiency and accuracy of the CES-D when administered as a CAT, using a US sample. METHODS:We obtained 2060 responses to the CES-D from US participants of the myPersonality application. The average age of participants was 26 years (range 19-77). We randomly split the sample into two groups to evaluate and validate the psychometric models. We used evaluation group data (n=1018) to assess dimensionality with both confirmatory factor analysis (CFA) and Mokken analysis. We conducted further psychometric assessments using item response theory (IRT), including assessments of item and scale fit to Samejima's graded response model (GRM), local dependency, and differential item functioning (DIF). We subsequently conducted two CAT simulations to evaluate the CES-D CAT using the validation group (n=1042). RESULTS:Initial CFA results indicated a poor fit to the model, and Mokken analysis revealed 3 items that did not conform to the same dimension as the rest of the items. We removed the 3 items and fit the remaining 17 items to the GRM. We found no evidence of DIF between age and gender groups. Trait estimates produced by the simulated CAT algorithm correlated highly with trait estimates derived from the original full-length CES-D. The second CAT simulation, conducted using real participant data, demonstrated higher precision at the higher end of the depression spectrum. CONCLUSIONS:Depression assessments using the CES-D CAT can be more accurate and efficient than those made using the fixed-length assessment.
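The core of a GRM-based CAT like the one simulated above is a loop that, at each step, administers the unanswered item carrying the most Fisher information at the current trait estimate. A minimal sketch with hypothetical item parameters (discrimination `a`, ordered thresholds `bs`), not the calibrated CES-D values:

```python
import math

def grm_cat_probs(theta, a, bs):
    """Category response probabilities under Samejima's graded response model.
    bs: ordered threshold parameters b_1 < ... < b_{K-1}."""
    # Cumulative curves P*(X >= k); P*(X >= 0) = 1 and P*(X >= K) = 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def grm_item_info(theta, a, bs):
    """Fisher information of one GRM item at trait level theta."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    dcum = [a * p * (1.0 - p) for p in cum]  # derivative of each cumulative curve
    probs = [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]
    return sum((dcum[k] - dcum[k + 1]) ** 2 / p
               for k, p in enumerate(probs) if p > 1e-12)

def pick_next_item(theta, bank, administered):
    """Maximum-information item selection: the core step of a CAT."""
    candidates = [(i, grm_item_info(theta, a, bs))
                  for i, (a, bs) in enumerate(bank) if i not in administered]
    return max(candidates, key=lambda t: t[1])[0]
```

With this selection rule, a highly discriminating item whose thresholds bracket the current estimate is chosen first, which is why CATs reach a target precision in fewer items than fixed-length forms.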
Project description:BACKGROUND:The Dutch-Flemish PROMIS® Upper Extremity (DF-PROMIS-UE) V2.0 item bank was recently developed using Item Response Theory (IRT). Unknown for this bank are: (1) whether it is legitimate to calculate IRT-based scores for short forms and Computerized Adaptive Tests (CATs), which requires that the items meet the assumptions of and fit the IRT model (Graded Response Model [GRM]); (2) whether it is legitimate to compare (sub)groups of patients using this measure, which requires measurement invariance; and (3) the precision of the estimated patients' scores for patients with different levels of functioning and compared to legacy measures. Aims were to evaluate (1) the assumptions of and fit to the GRM, (2) measurement invariance, and (3) (comparative) precision of the DF-PROMIS-UE v2.0. METHODS:Cross-sectional data were collected in Dutch patients with upper extremity disorders. Assessed were IRT assumptions (unidimensionality [bi-factor analysis], local independence [residual correlations], monotonicity [coefficient H]), GRM item fit, measurement invariance (absence of Differential Item Functioning [DIF] due to age, gender, center, duration, and location of complaints), and precision (standard error of IRT-based scores across levels of functioning). To study measurement invariance for language [Dutch vs. English], additional US data were used. Legacy instruments were the Disabilities of the Arm, Shoulder and Hand (DASH), the QuickDASH, and the Michigan Hand Questionnaire (MHQ). RESULTS:In total, 521 Dutch (mean age ± SD = 51 ± 17 years, 49% female) and 246 US patients (mean age ± SD = 48 ± 14 years, 69% female) participated. The DF-PROMIS-UE v2.0 item bank was sufficiently unidimensional (Omega-H = 0.80, Explained Common Variance = 0.68), had negligible local dependence (four out of 1035 residual correlations > 0.20), good monotonicity (H = 0.63), good GRM fit (no misfitting items), and demonstrated sufficient measurement invariance.
Precise estimates (Standard Error < 3.2) were obtained for most patients (7-item short form, 88.5%; standard CAT, 91.3%; and fixed 7-item CAT, 87.6%). The DASH displayed better reliability than the DF-PROMIS-UE short form and standard CAT, and the QuickDASH displayed comparable reliability. The MHQ-ADL displayed better reliability than the DF-PROMIS-UE short form and standard CAT for T-scores between 28 and 50. For patients with low function, the DF-PROMIS-UE measures performed better. CONCLUSIONS:The DF-PROMIS-UE v2.0 item bank showed sufficient psychometric properties in Dutch patients with UE disorders.
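The Standard Error < 3.2 criterion above has a direct interpretation on the PROMIS T-score metric (mean 50, SD 10): reliability is 1 − SE²/SD², so SE = 3.2 corresponds to reliability of roughly 0.90. A small sketch of that conversion:

```python
def tscore_reliability(se, sd=10.0):
    """Reliability implied by an IRT standard error on the T-score metric
    (population SD = 10): rel = 1 - SE^2 / SD^2."""
    return 1.0 - (se / sd) ** 2

def se_for_reliability(rel, sd=10.0):
    """Inverse: the largest standard error consistent with a target reliability."""
    return sd * (1.0 - rel) ** 0.5
```

This is why SE-based stopping rules in CATs are often described interchangeably as reliability thresholds.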
Project description:<h4>Background</h4>To develop a web-based computer adaptive testing (CAT) application for efficiently collecting data on workers' perceptions of job satisfaction, we examined whether a 37-item Job Content Questionnaire (JCQ-37) could evaluate the job satisfaction of individual employees as a single construct.<h4>Methods</h4>The JCQ-37 makes data collection via CAT on the internet easy, viable, and fast. A Rasch rating scale model was applied to analyze data from 300 randomly selected hospital employees who participated in job-satisfaction surveys in 2008 and 2009 via non-adaptive and computer-adaptive testing, respectively.<h4>Results</h4>Of the 37 items on the questionnaire, 24 fit the model fairly well. Person-separation reliability for the 2008 survey was 0.88. Measures from both years, and item-8 job satisfaction for groups, were successfully evaluated through item-by-item analyses using t tests. Workers aged 26-35 felt that job satisfaction was significantly worse in 2009 than in 2008.<h4>Conclusions</h4>The Web-CAT developed in the present paper was shown to be more efficient than traditional computer-based or pen-and-paper assessments at collecting data on workers' perceptions of job content.
Project description:Background:The stigma associated with neurologic disorders plays a part in poor health-related quality of life. The eight-item Stigma Scale for Chronic Illness (SSCI-8) is a brief self-assessment tool for measuring perceived level of stigma. The psychometric performance of the SSCI-8 in people with multiple sclerosis (MS) was assessed. Methods:A multicenter, cross-sectional study in adults with relapsing-remitting or primary progressive MS was performed. A nonparametric item response theory procedure, Mokken analysis, was done to preliminarily study the dimensional structure of the SSCI-8. A confirmatory factor analysis (CFA) model was then fit, and the behavior and information covered by the eight items were assessed by parametric item response theory analysis. Results:A total of 201 patients (mean ± SD age, 43.9 ± 10.5 years; 60.2% female; 86.1% with relapsing-remitting MS) were studied. The Mokken analysis found that the SSCI-8 is a unidimensional, strong scale (scalability index H = 0.56) with high reliability (Cronbach α = 0.88). The CFA model confirmed the unidimensionality (comparative fit index = 0.975, root mean square error of approximation = 0.077). The information covered by the SSCI-8 items ranges from 3.79 to 13.52, for a total of 66.56. About two-thirds (66%) of the SSCI-8 overall information is conveyed by four items: 1 ("Some people avoided me"), 2 ("I felt left out of things"), 3 ("People avoided looking at me"), and 7 ("People were unkind to me"). Conclusions:The SSCI-8 shows appropriate psychometric characteristics and is, therefore, a useful instrument for assessing stigma in people with MS.
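The scalability index H reported above (Loevinger's coefficient) compares observed Guttman errors with those expected under independence. A minimal sketch for dichotomous items, illustrating the coefficient's logic rather than reimplementing the study's polytomous Mokken analysis:

```python
from itertools import combinations

def scalability_H(data):
    """Loevinger's H for dichotomous 0/1 data (rows = respondents, cols = items):
    H = 1 - (observed Guttman errors) / (expected errors under independence)."""
    n = len(data)
    k = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    observed = expected = 0.0
    for i, j in combinations(range(k), 2):
        # Order the pair so `easy` is the more popular (higher-mean) item.
        easy, hard = (i, j) if means[i] >= means[j] else (j, i)
        # A Guttman error: endorsing the hard item while rejecting the easy one.
        observed += sum(1 for row in data if row[easy] == 0 and row[hard] == 1)
        expected += n * (1 - means[easy]) * means[hard]
    return 1.0 - observed / expected
```

A perfect Guttman pattern yields H = 1; values above 0.5, as here (H = 0.56), are conventionally read as a strong scale.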
Project description:BACKGROUND:This study evaluated the psychometric properties of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) instrument administered through computerized adaptive testing (CAT) compared with the traditional full-length Disabilities of the Arm, Shoulder and Hand (DASH). METHODS:The PROMIS PF CAT and the DASH were administered to 1759 patients seeking care for elbow conditions. This study used Rasch partial credit modeling to analyze the instruments with respect to item fit, internal reliability, response category thresholds, dimensionality, local independence, gender differential item functioning, and floor and ceiling effects. RESULTS:The PROMIS PF CAT and DASH had satisfactory item fit for all but 1 item on each measure. Internal reliabilities were high for both measures. Two items on the DASH and 4 items on the PF CAT showed nonordered category thresholds. Unidimensionality was adequate, and local independence was supported for both instruments. Gender bias was found for 4 items on the PF CAT and 12 items on the DASH. Both measures had adequate instrument targeting and satisfactory floor and ceiling effects. CONCLUSION:The PROMIS PF CAT and the DASH both showed sufficient unidimensionality, good item fit, and good local independence, with the exception of high levels of gender item bias, particularly for the DASH. Further scale evaluation should address item bias and item response categories for these instruments. Overall, the PROMIS PF CAT is an effective outcome instrument for measuring function in patients with elbow disorders and requires significantly fewer questions than the DASH.
Project description:BACKGROUND:As the worldwide prevalence of chronic illness increases, so too does the demand for novel treatments to improve chronic illness care. Quantifying improvement in chronic illness care from the patient perspective relies on the use of validated patient-reported outcome measures. In this analysis we examine the psychometric and scaling properties of the Patient Assessment of Chronic Illness Care (PACIC) questionnaire for use in the United Kingdom by applying scale data to the non-parametric Mokken double monotonicity model. METHODS:Data from 1849 patients with long-term conditions in the UK who completed the 20-item PACIC were analysed using Mokken analysis. A three-stage analysis examined the questionnaire's scalability, monotonicity, and item ordering. An automated item selection procedure was used to assess the factor structure of the scale. Analysis was conducted in an 'evaluation' dataset (n = 956) and results were confirmed using an independent 'validation' dataset (n = 890). RESULTS:Automated item selection procedures suggested that the 20 items represented a single underlying trait representing "patient assessment of chronic illness care": this contrasts with the multiple domains originally proposed. Six items violated invariant item ordering and were removed. The final 13-item scale had no further issues in either the evaluation or validation samples, including excellent scalability (Ho = .50) and reliability (Rho = .88). CONCLUSIONS:Following some modification, the 13 items of the PACIC were successfully fitted to the non-parametric Mokken model. These items are psychometrically robust and produce a single ordinal summary score. This score will be useful for clinicians or researchers to assess the quality of chronic illness care from the patient's perspective.
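The monotonicity stage of the Mokken analysis above checks that each item's expected score is nondecreasing as the rest score (total score minus the item itself) increases. A minimal sketch of that manifest-monotonicity check, assuming dichotomous 0/1 data for simplicity:

```python
def violates_monotonicity(data, item):
    """Manifest monotonicity check for one item: its mean score should be
    nondecreasing across rest-score groups (total score minus the item).
    data: rows = respondents, cols = items; returns True on a violation."""
    groups = {}
    for row in data:
        rest = sum(row) - row[item]
        groups.setdefault(rest, []).append(row[item])
    # Item means ordered by increasing rest score.
    means = [sum(v) / len(v) for _, v in sorted(groups.items())]
    return any(means[i + 1] < means[i] - 1e-12 for i in range(len(means) - 1))
```

In practice Mokken software pools small rest-score groups and tests violations for significance; this sketch only shows the underlying comparison.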
Project description:OBJECTIVE:The Tinnitus Handicap Inventory (THI) is widely used in clinical practice and research as a three-dimensional measure of tinnitus severity. Despite extensive use, its factor structure remains unclear. Furthermore, the THI can be considered a reliable measure only if Cronbach's alpha coefficient and Classical Test Theory are used. The more modern and robust Item Response Theory (IRT) has so far not been used to psychometrically evaluate the THI. In theory, IRT allows a more precise evaluation of the THI's factor structure, reliability, and the quality of individual items. METHOD:There were 1115 patients with tinnitus (556 women and 559 men), aged 19-84 years (M = 51.55; SD = 13.28). The dimensionality of the THI was evaluated using several models of Confirmatory Factor Analysis and an Item Response Theory approach. Exploratory non-parametric Mokken scaling was applied to determine a unidimensional and robust scale. Several IRT polytomous models were used to assess the overall quality of the THI. RESULTS:The bifactor model had the best fit (RMSEA = 0.055; CFI = 0.976; SRMR = 0.040) and revealed one strong general factor and several weak specific factors. Mokken scaling generated a reliable unidimensional scale (Loevinger's H = 0.463). To refine the THI, we propose that five items be removed. The IRT Generalized Partial Credit Model generated good parameters in terms of item location (difficulty), discrimination, and information content of items. CONCLUSION:Our findings support the use of the THI to evaluate tinnitus severity, in that it is a reliable unidimensional scale. However, clinicians and researchers should rely only on its overall score, which reflects global tinnitus severity. To improve its psychometric quality, several refinements of the THI are proposed.
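The generalized partial credit model used above assigns each response category a probability via cumulative sums of a(θ − d_v). A sketch with illustrative parameters, not the THI calibration:

```python
import math

def gpcm_probs(theta, a, deltas):
    """Category probabilities under the generalized partial credit model.
    deltas: step difficulties d_1..d_m; categories are scored 0..m."""
    # Numerators are cumulative sums of a*(theta - d_v); score 0 is fixed at 0.
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + a * (theta - d))
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With symmetric step difficulties around θ, the extreme categories receive equal probability, which makes the model's "step" interpretation easy to verify.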
Project description:OBJECTIVES:The assessment of health care professionals' attitudes and beliefs towards musculoskeletal pain is essential because they are key determinants of their clinical practice behaviour. The Pain Attitudes and Beliefs Scale (PABS) biomedical scale evaluates the degree of health professionals' biomedical orientation towards musculoskeletal pain and has never been assessed using item response theory (IRT). This study aimed at assessing the psychometric performance of the 10-item biomedical scale of the PABS using IRT. METHODS:Two cross-sectional samples (BeBack, n = 1016; DABS, n = 958) of health care professionals working in the UK were analysed. Mokken scale analysis (nonparametric IRT) and common factor analysis were used to assess dimensionality of the instrument. Parametric IRT was used to assess model fit, item parameters, and local reliability (measurement precision). RESULTS:Results were largely similar in the two samples and the scale was found to be unidimensional. The graded response model showed adequate fit, covering a broad range of the measured construct in terms of item difficulty. Item 3 showed some misfit, but only in the DABS sample. Some items (i.e., items 7, 8, and 9) displayed remarkably higher discrimination parameters than others (items 4, 5, and 10). The scale showed satisfactory measurement precision (reliability > 0.70) between theta values -2 and +3. DISCUSSION:The 10-item biomedical scale of the PABS displayed adequate psychometric performance in two large samples of health care professionals, and it is suggested for assessing health care professionals' degree of biomedical orientation towards musculoskeletal pain at the group level.
Project description:International and national health policy seeks to increase service user and carer involvement in mental health care planning, but suitable user-centred tools to assess the success of these initiatives are not yet available. The current study describes the development of a new reliable and valid, interval-scaled service-user and carer reported outcome measure for quantifying user/carer involvement in mental health care planning. Psychometric development reduced a 70-item item bank to a short form questionnaire using a combination of Classical Test, Mokken and Rasch Analyses. Test-retest reliability was calculated using t-tests of interval level scores between baseline and 2-4 week follow-up. Items were worded to be relevant to both service users and carers. Nine items were removed following cognitive debriefing with a service user and carer advisory group. An iterative process of item removal reduced the remaining 61 items to a final 14-item scale. The final scale has acceptable scalability (Ho = .69), reliability (alpha = .92), fit to the Rasch model (χ2(70) = 97.25, p = .02), and no differential item functioning or locally dependent items. Scores remained stable over the 4 week follow-up period, indicating good test-retest reliability. The 'Evaluating the Quality of User and Carer Involvement in Care Planning (EQUIP)' scale displays excellent psychometric properties and is capable of unidimensional linear measurement. The scale is short, user and carer-centred and will be of direct benefit to clinicians, services, auditors and researchers wishing to quantify levels of user and carer involvement in care planning.
Project description:Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of the SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with SF-36 physical functioning (38.1%). CAT also discriminated better than SF-36 physical functioning between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.
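The floor and ceiling effects reported above are simply the share of respondents scoring at the scale's minimum and maximum, with values above roughly 15% conventionally taken as problematic. A minimal sketch:

```python
def floor_ceiling_pct(scores, minimum, maximum):
    """Percentage of respondents at the scale floor and at the ceiling.
    Returns (floor_pct, ceiling_pct)."""
    n = len(scores)
    floor_pct = 100.0 * sum(s == minimum for s in scores) / n
    ceiling_pct = 100.0 * sum(s == maximum for s in scores) / n
    return floor_pct, ceiling_pct
```

Large ceiling percentages, like the 38.1% reported for the PF-10, mean the instrument cannot distinguish among the healthiest respondents, which is exactly the differentiation problem a CAT drawing on a wider item bank avoids.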
Project description:BACKGROUND:People living with serious mental health conditions experience increased morbidity due to physical health issues driven by medication side-effects and lifestyle factors. Coordinated mental and physical healthcare delivered in accordance with a care plan could help to reduce morbidity and mortality in this population. Efforts to develop new models of care are hampered by a lack of validated instruments to accurately assess the extent to which mental health service users and carers are involved in care planning for physical health. OBJECTIVE:To develop a brief and accurate patient-reported experience measure (PREM) capable of assessing involvement in physical health care planning for mental health service users and their carers. METHODS:We employed psychometric and statistical techniques to refine a bank of candidate questionnaire items, derived from qualitative interviews, into a valid and reliable measure of involvement in physical health care planning. We assessed the psychometric performance of the item bank using modern psychometric analyses. We assessed unidimensionality, scalability, fit to the partial credit Rasch model, category threshold ordering, local dependency, differential item functioning, and test-retest reliability. Once purified of poorly performing and erroneous items, we simulated computerized adaptive testing (CAT) with 15, 10 and 5 items using the calibrated item bank. RESULTS:Issues with category threshold ordering, local dependency and differential item functioning were evident for a number of items in the nascent item bank and were resolved by removing problematic items. The final 19-item PREM had excellent fit to the Rasch model (χ2 = 192.94, df = 1515, P = .02; RMSEA = .03, 95% CI = .01-.04). The 19-item bank had excellent reliability (marginal r = 0.87).
The correlation between questionnaire scores at baseline and 2-week follow-up was high (r = .70, P < .01), and 94.9% of assessment pairs were within the Bland-Altman limits of agreement. Simulated CAT demonstrated that assessments could be made using as few as 10 items (mean SE = .43). DISCUSSION:We developed a flexible patient-reported outcome measure to quantify service user and carer involvement in physical health care planning. We demonstrate the potential to substantially reduce assessment length whilst maintaining reliability by utilizing CAT.
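The Bland-Altman agreement check used above compares paired baseline and follow-up scores against 95% limits of agreement: mean difference ± 1.96 SD of the differences. A minimal sketch with made-up paired scores, not the study's data:

```python
import statistics

def bland_altman_limits(x, y):
    """95% limits of agreement for paired measurements:
    mean difference +/- 1.96 * SD of the differences."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias - 1.96 * sd, bias + 1.96 * sd

def pct_within_limits(x, y):
    """Percentage of assessment pairs whose difference lies within the limits."""
    lo, hi = bland_altman_limits(x, y)
    diffs = [a - b for a, b in zip(x, y)]
    return 100.0 * sum(lo <= d <= hi for d in diffs) / len(diffs)
```

A high percentage within the limits, as the 94.9% reported here, indicates that test and retest scores agree closely enough for the measure to be considered stable.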