Psychometric Properties and Performance of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Depression Short Forms in Ethnically Diverse Groups.
ABSTRACT: Short form measures from the Patient Reported Outcomes Measurement Information System® (PROMIS®) are used widely. The present study was among the first to examine differential item functioning (DIF) in the PROMIS Depression short form scales in a sample of over 5000 racially/ethnically diverse patients with cancer. DIF analyses were conducted across different racial/ethnic, educational, age, gender and language groups. METHODS:DIF hypotheses, generated by content experts, informed the evaluation of the DIF analyses. The graded item response theory (IRT) model was used to evaluate the five-level ordinal items. The primary tests of DIF were Wald tests; sensitivity analyses were conducted using the IRT ordinal logistic regression procedure. Magnitude was evaluated using expected item score functions, and the non-compensatory differential item functioning (NCDIF) and T1 indexes, both based on group differences in the item curves. Aggregate impact was evaluated with expected scale score (test) response functions; individual impact was assessed through examination of differences in DIF adjusted and unadjusted depression estimates. RESULTS:Many items evidenced DIF; however, only a few had slightly elevated magnitude. No items evidenced salient DIF with respect to NCDIF and the scale-level impact was minimal for all group comparisons. The following short form items might be targeted for further study because they were also hypothesized to evidence DIF. One item showed slightly higher magnitude of DIF for age: nothing to look forward to; conditional on depression, this item was more likely to be endorsed in the depressed direction by individuals in older groups as contrasted with the cohort aged 21 to 49. This item was also hypothesized to show age DIF. Only one item (failure) showed DIF of slightly higher magnitude (just above threshold) for Whites vs. Asians/Pacific Islanders in the direction of higher likelihood of endorsement for Asians/Pacific Islanders. 
This item was also hypothesized to show DIF for minority groups. The impact of DIF was negligible. Conditional on depression, the items worthless and hopeless were more likely to be endorsed in the depressed direction by respondents with less than high school education vs. those with a graduate degree; the magnitude of DIF was slightly above the T1 threshold, but not that of NCDIF. These items were also hypothesized to show DIF in the direction of more feelings of worthlessness by groups with lower education. While the magnitude and aggregate impact of DIF were small, individual impact was observed in a few instances. Information provided was relatively high, particularly in the middle upper (depressed) tail of the distribution. Reliability estimates were high (> 0.90) across all studied groups, regardless of estimation method. CONCLUSIONS:This was the first study to evaluate measurement equivalence of the PROMIS Depression short forms across large samples of ethnically diverse groups. There were few items with DIF, and none of high magnitude, thus supporting the use of PROMIS Depression short form measures across such groups. These results could be informative for those using the short forms in minority populations or clinicians evaluating individuals with the depression short forms.
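The magnitude measures named in the abstract above (expected item score functions and NCDIF) can be illustrated with a minimal sketch. It assumes Samejima's graded response model with items scored 0 to 4, and it takes NCDIF to be the mean squared difference between the expected item scores implied by the focal-group and reference-group item parameters, averaged over focal-group trait estimates. The function names and the parameter layout `(a, thresholds)` are hypothetical and are not taken from the study's software.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    a          : item discrimination
    thresholds : increasing list of K-1 location parameters for K categories
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = K).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def expected_item_score(theta, a, thresholds):
    """Expected score on a 0..K-1 scale at a given trait level."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))

def ncdif(focal_thetas, ref_params, focal_params):
    """Mean squared difference in expected item scores over focal-group thetas."""
    diffs = [
        expected_item_score(t, *focal_params) - expected_item_score(t, *ref_params)
        for t in focal_thetas
    ]
    return sum(d * d for d in diffs) / len(diffs)
```

With identical parameters in both groups the index is zero; it grows as the two expected item score curves separate over the range where focal-group respondents are located.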
Project description:This is the first study of the measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Anxiety short forms in a large ethnically diverse sample. The psychometric properties and differential item functioning (DIF) were examined across different racial/ethnic, educational, age, gender and language groups. METHODS:These data are from individuals selected from cancer registries in the United States. For the analyses of race/ethnicity the reference group was non-Hispanic Whites (n = 2,263), the studied groups were non-Hispanic Blacks (n = 1,117), Hispanics (n = 1,043) and Asians/Pacific Islanders (n = 907). Within the Hispanic subsample, there were 335 interviews conducted in Spanish and 703 in English. The 11 anxiety items were from the PROMIS emotional disturbance item bank. DIF hypotheses were generated by content experts who rated whether or not they expected DIF to be present, and the direction of the DIF with respect to several comparison groups. The primary method used for DIF detection was the Wald test for examination of group differences in item response theory (IRT) item parameters accompanied by magnitude measures. Expected item scores were examined as measures of magnitude. The method used for quantification of the difference in the average expected item scores was the non-compensatory DIF (NCDIF) index. DIF impact was examined using expected scale score functions. Additionally, precision and reliabilities were examined using several methods. RESULTS:Although not hypothesized to show DIF for Asians/Pacific Islanders, every item evidenced DIF by at least one method. Two items showed DIF of higher magnitude for Asians/Pacific Islanders vs. Whites: "Many situations made me worry" and "I felt anxious". However, the magnitude of DIF was small and the NCDIF statistics were not above threshold. The impact of DIF was negligible. 
For education, six items were identified with consistent DIF across methods: fearful, anxious, worried, hard to focus, uneasy and tense. However, the NCDIF was not above threshold and the impact of DIF on the scale was trivial. No items showed high magnitude DIF for gender. Two items showed slightly higher magnitude for age (although not above the cutoff): worried and fearful. The scale level impact was trivial. Only one item showed DIF with the Wald test after the Bonferroni correction for the language comparisons: "I felt fearful". Two additional items were flagged in sensitivity analyses after Bonferroni correction, anxious and many situations made me worry. The latter item also showed DIF of higher magnitude, with an NCDIF value (0.144) above threshold. Individual impact was relatively small. CONCLUSIONS:Although many items from the PROMIS short form anxiety measures were flagged with DIF, item level magnitude was low and scale level DIF impact was minimal; however, three items: anxious, worried and many situations made me worry might be singled out for further study. It is concluded that the PROMIS Anxiety short form evidenced good psychometric properties, was relatively invariant across the groups studied, and performed well among ethnically diverse subgroups of Blacks, Hispanic, White non-Hispanic and Asians/Pacific Islanders. In general more research with the Asians/Pacific Islanders group is needed. Further study of subgroups within these broad categories is recommended.
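The Bonferroni step mentioned above can be sketched as follows. This is a minimal illustration that assumes the correction divides the nominal alpha by the number of tests in one family of comparisons; how the study defined that family is not stated here. The item labels and p-values are made up for illustration and are not results from the study.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Return the items whose p-value falls below alpha / number of tests."""
    n_tests = len(p_values)
    adjusted_alpha = alpha / n_tests
    return sorted(item for item, p in p_values.items() if p < adjusted_alpha)

# Hypothetical Wald-test p-values for three items (illustration only).
example_p = {"item_1": 0.0004, "item_2": 0.020, "item_3": 0.030}
flagged = bonferroni_flags(example_p)  # adjusted alpha is 0.05 / 3
```

In this toy case only `item_1` stays below the adjusted threshold, although all three p-values are below the unadjusted 0.05.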
Project description:AIMS:The goals of these analyses were to examine the psychometric properties and measurement equivalence of a self-reported cognition measure, the Patient Reported Outcome Measurement Information System® (PROMIS®) Applied Cognition - General Concerns short form. These items are also found in the PROMIS Cognitive Function (version 2) item bank. This scale consists of eight items related to subjective cognitive concerns. Differential item functioning (DIF) analyses of gender, education, race, age, and (Spanish) language were performed using an ethnically diverse sample (n = 5,477) of individuals with cancer. This is the first analysis examining DIF in this item set across ethnic and racial groups. METHODS:DIF hypotheses were derived by asking content experts to indicate whether they posited DIF for each item and to specify the direction. The principal DIF analytic model was item response theory (IRT) using the graded response model for polytomous data, with accompanying Wald tests and measures of magnitude. Sensitivity analyses were conducted using ordinal logistic regression (OLR) with a latent conditioning variable. IRT-based reliability, precision and information indices were estimated. RESULTS:DIF was identified consistently only for the item, brain not working as well as usual. After correction for multiple comparisons, this item showed significant DIF for both the primary and sensitivity analyses. Black respondents and Hispanics in comparison to White non-Hispanic respondents evidenced a lower conditional probability of endorsing the item, brain not working as well as usual. The same pattern was observed for the education grouping variable: as compared to those with a graduate degree, conditioning on overall level of subjective cognitive concerns, those with less than high school education also had a lower probability of endorsing this item. 
DIF was also observed for age for two items after correction for multiple comparisons for both the IRT and OLR-based models: "I have had to work really hard to pay attention or I would make a mistake" and "I have had trouble shifting back and forth between different activities that require thinking". For both items, conditional on cognitive complaints, older respondents had a higher likelihood than younger respondents of endorsing the item in the cognitive complaints direction. The magnitude and impact of DIF was minimal. The scale showed high precision along much of the subjective cognitive concerns continuum; the overall IRT-based reliability estimate for the total sample was 0.88 and the estimates for subgroups ranged from 0.87 to 0.92. CONCLUSION:Little DIF of high magnitude or impact was observed in the PROMIS Applied Cognition - General Concerns short form item set. One item, "It has seemed like my brain was not working as well as usual" might be singled out for further study. However, in general the short form item set was highly reliable, informative, and invariant across differing race/ethnic, educational, age, gender, and language groups.
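The link between item information, precision and an IRT-based reliability estimate can be sketched as below. The sketch assumes the graded response model and uses one common approximation, reliability = I / (I + 1), which presumes a standard normal latent distribution; the abstract does not say which estimator the authors used, so the formula choice and all names are assumptions.

```python
import math

def cumulative_probs(theta, a, thresholds):
    """P(X >= k) for k = 0..K under the graded response model."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]

def item_information(theta, a, thresholds):
    """Fisher information of one polytomous item at a given trait level."""
    cum = cumulative_probs(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        # Derivative of the category probability with respect to theta.
        dp_k = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p_k > 0.0:
            info += dp_k * dp_k / p_k
    return info

def conditional_reliability(theta, items):
    """Approximate reliability at theta, assuming unit latent variance."""
    test_info = sum(item_information(theta, a, b) for a, b in items)
    return test_info / (test_info + 1.0)
```

Here `items` is a list of `(a, thresholds)` pairs. Adding items or raising discriminations increases test information, which lowers the standard error and raises the approximate reliability at that trait level.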
Project description:To investigate differential item functioning (DIF) of PROMIS Depression items between US and German samples, we compared data from the US PROMIS calibration sample (n = 780), a German general population survey (n = 2,500) and a German clinical sample (n = 621). DIF was assessed in an ordinal logistic regression framework, with 0.02 as the criterion for R² change and 0.096 for Raju's non-compensatory DIF. Item parameters were initially fixed to the PROMIS Depression metric; we used plausible values to account for uncertainty in depression estimates. Only four items showed DIF. Accounting for DIF led to negligible effects for the full item bank as well as a post hoc simulated computer-adaptive test (< 0.1 point on the PROMIS metric [mean = 50, standard deviation = 10]), while the effect on the short forms was small (< 1 point). The mean depression severity (43.6) in the German general population sample was considerably lower than the US reference value of 50. Overall, we found little evidence for language DIF between US and German samples, which could be addressed either by replacing the DIF items with items not showing DIF or by scoring the short form in German samples with the corrected item parameters reported.
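The PROMIS metric quoted above (mean = 50, standard deviation = 10) is a linear rescaling of the standardized latent trait, so effects expressed in T-score points can be moved to and from the theta scale. The helper names below are hypothetical, and the sketch assumes theta is standardized to the reference population.

```python
def theta_to_t(theta, mean=50.0, sd=10.0):
    """Convert a standardized trait estimate to the T-score metric."""
    return mean + sd * theta

def t_to_theta(t, mean=50.0, sd=10.0):
    """Convert a T-score back to the standardized trait metric."""
    return (t - mean) / sd

def dif_impact_in_t_points(theta_unadjusted, theta_adjusted):
    """Absolute change in T-score after adjusting an estimate for DIF."""
    return abs(theta_to_t(theta_adjusted) - theta_to_t(theta_unadjusted))
```

On this scale, the reported German general population mean of 43.6 sits 0.64 standard deviations below the US reference value of 50, and an effect under 1 point corresponds to less than 0.1 standard deviations.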
Project description:The present study examined the Patient Reported Outcomes Measurement Information System (PROMIS) Mobility, Fatigue, and Pain Interference Short Forms (SFs) in children and adolescents with cerebral palsy (CP) for the presence of differential item functioning (DIF) relative to the original calibration sample. Using the Graded Response Model we compared item parameter estimates generated from a sample of 303 children and adolescents with CP (175 males, 128 females; mean age 15y 5mo) to parameter estimates from the PROMIS calibration sample, which served as the reference group. DIF was assessed in a two-step process using the item response theory-likelihood ratio-differential item functioning detection procedure. Significant DIF was identified for four of eight items in the PROMIS Mobility SF, for two of eight items in the Pain Interference Scale, and for one item out of 10 on the Fatigue Scale. Impact of DIF on total score estimation was notable for Mobility and Pain Interference, but not for Fatigue. Results suggest differences in the responses of adolescents with CP to some items on the PROMIS Mobility and Pain Interference SFs. Cognitive interviews about the PROMIS items with adolescents with varying degrees of mobility limitations would provide better understanding of how they are interpreting and selecting responses to the PROMIS items and thus help guide selection of the most appropriate way to address this issue.
Project description:The purpose of this article is to introduce the methods used and challenges confronted by the authors of this two-part series of articles describing the results of analyses of measurement equivalence of the short form scales from the Patient Reported Outcomes Measurement Information System® (PROMIS®). Qualitative and quantitative approaches used to examine differential item functioning (DIF) are reviewed briefly. Qualitative methods focused on generation of DIF hypotheses. The basic quantitative approaches used all rely on a latent variable model, and examine parameters either derived directly from item response theory (IRT) or from structural equation models (SEM). A key methods focus of these articles is to describe state-of-the art approaches to examination of measurement equivalence in eight domains: physical health, pain, fatigue, sleep, depression, anxiety, cognition, and social function. These articles represent the first time that DIF has been examined systematically in the PROMIS short form measures, particularly among ethnically diverse groups. This is also the first set of analyses to examine the performance of PROMIS short forms in patients with cancer. Latent variable model state-of-the-art methods for examining measurement equivalence are introduced briefly in this paper to orient readers to the approaches adopted in this set of papers. Several methodological challenges underlying (DIF-free) anchor item selection and model assumption violations are presented as a backdrop for the articles in this two-part series on measurement equivalence of PROMIS measures.
Project description:Objective:To provide psychometric evaluation of the PROMIS® Pediatric Psychological and Physical Stress Experiences measures. Methods:Across two studies, Psychological and Physical Stress Experiences items were administered to 2,875 children aged 8-17 years and 2,212 parents of children aged 5-17 years. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning (DIF), and assessment of construct validity. Items were calibrated using item response theory to estimate item parameters representative of the United States. Recommended eight- and four-item short forms were constructed for child- and parent-report versions of the Psychological and Physical Stress Experiences item banks. Results:Final item banks were unidimensional and items were locally independent and free from impactful DIF. Psychological Stress banks include 19 child-report and 12 parent-proxy items. Physical Stress banks include 26 child-report and 26 parent-proxy items. All instruments have strong internal consistency and retest-reliability, and provide precise estimates of varying stress levels. The instruments' construct validity was evidenced by known-group comparisons and convergence with legacy measures. Conclusions:The Patient Reported Outcome Measurement Information System (PROMIS) Pediatric Psychological and Physical Stress item banks and short forms provide efficient, precise, and valid assessments of children's stress experiences.
Project description:The Patient-Reported Outcomes Measurement Information System (PROMIS) Smoking Initiative has developed item banks for assessing six smoking behaviors and biopsychosocial correlates of smoking among adult cigarette smokers. The goal of this study is to evaluate the performance of the Spanish version of the PROMIS smoking item banks as compared to the original banks developed in English. The six PROMIS banks for daily smokers were translated into Spanish and administered to a sample of Spanish-speaking adult daily smokers in the United States (N = 302). We first evaluated the unidimensionality of each bank using confirmatory factor analysis. We then conducted a two-group item response theory calibration, including an item response theory-based Differential Item Functioning (DIF) analysis by language of administration (Spanish vs. English). Finally, we generated full bank and short form scores for the translated banks and evaluated their psychometric performance. Unidimensionality of the Spanish smoking item banks was supported by confirmatory factor analysis results. Out of a total of 109 items that were evaluated for language DIF, seven items in three of the six banks were identified as having levels of DIF that exceeded an established criterion. The psychometric performance of the Spanish daily smoker banks is largely comparable to that of the English versions. The Spanish PROMIS smoking item banks are highly similar, but not entirely equivalent, to the original English versions. The parameters from these two-group calibrations can be used to generate comparable bank scores across the two language versions. In this study, we developed a Spanish version of the PROMIS smoking toolkit, which was originally designed and developed for English speakers. With the growing Spanish-speaking population, it is important to make the toolkit more accessible by translating the items and calibrating the Spanish version to be comparable with English-language scores. 
This study provided the translated item banks and short forms, comparable unbiased scores for Spanish speakers and evaluations of the psychometric properties of the new Spanish toolkit.
Project description:PURPOSE:The Patient-Reported Outcomes Measurement Information System 29-item profile (PROMIS-29 v2.0), which measures health-related quality of life (HRQoL), has had limited evaluation among older adults (age 65+) with multiple chronic conditions. Our purpose was to establish convergent validity for PROMIS-29 in this population. METHODS:We collected the PROMIS-29 v2.0 and the Veterans RAND 36 (VR-36) for 1359 primary care patients aged 65+ with at least 2 of 13 chronic conditions, oversampling those aged 80+. We conducted multiple analyses to examine score differences across subgroups, differential item functioning (DIF), and comparisons of PROMIS-29 v2.0 and VR-36 scores. RESULTS:The mean age was 80.7, and all patients had at least 2 of 13 chronic conditions. Older age, female sex, Hispanic ethnicity, and more chronic conditions were associated with worse physical health scores (PHS) and mental health scores (MHS) on the PROMIS-29 v2.0, findings which are in the expected direction. None of the 700 pairs of items met criteria for DIF. PHS and MHS were highly intercorrelated (r = 0.74, p < 0.001 for this and all other findings). PHS was more highly correlated with the VR-36 Physical Component Score (PCS) than the Mental Component Score (MCS) (r = 0.85 and 0.32, respectively), while MHS was highly correlated with both (r = 0.70 and 0.64, respectively). CONCLUSIONS:PROMIS-29 v2.0 demonstrates expected bivariate relationships with key person-level characteristics and does not show DIF. PROMIS-29 v2.0 scores are highly correlated with VR-36 scores. These results provide support for the validity of PROMIS-29 v2.0 as a measure of HRQoL among older adults with multiple chronic conditions.
Project description:Fibromyalgia (FM) is characterized by myriad symptoms and problems. Fatigue is one of the most common, distressing, and disabling symptoms in FM. The purpose of this study was to use fatigue item banks that were developed as part of the Patient-Reported Outcomes Measurement Information System (PROMIS) to devise a self-report measure of fatigue for use in individuals with FM. A sample of 532 adults with FM (age range = 18-77, 96.1% female) completed the PROMIS fatigue item bank. Factor analyses and item response theory analyses were used to identify dimensionality and optimally performing items. These data were used in combination with clinical input to select items for a fatigue self-report measure for use in FM. Factor analyses revealed four distinct factors in the PROMIS fatigue item bank; items for each univariate subscale were identified by selecting four items with high item information values. A 16-item measure, the PROMIS FatigueFM Profile, consisting of four 4-item short forms reflecting fatigue experience ("intensity") and fatigue impact in three subdomains (social, cognitive, and motivation), was created. The new PROMIS FatigueFM Profile short forms showed excellent internal reliability, low ceiling and floor effects, and equivalent or higher test information compared to the standard 4- and 7-item PROMIS fatigue short forms. The newly developed PROMIS FatigueFM Profile, a 16-item measure consisting of four 4-item short forms of self-reported fatigue severity, shows early evidence of good psychometric characteristics, provides the ability to use short forms that assess distinct aspects of fatigue experience and fatigue impact, and demonstrates equivalent or higher levels of test information compared to standard PROMIS fatigue short forms with similar number of items. The PROMIS FatigueFM Profile indicated fatigue experience and impact levels approximately 1.5 standard deviations above the normative sample mean across all short forms. 
Future work to evaluate the validity and reliability of this new measure in individuals with FM is needed.
Project description:BACKGROUND: Self-reported health status measures, like the Short Form 36-item Health Survey (SF-36), can provide rich information about the overall health of a population and its components, such as physical, mental, and social health. However, differential item functioning (DIF), which arises when population sub-groups with the same underlying (i.e., latent) level of health have different measured item response probabilities, may compromise the comparability of these measures. The purpose of this study was to test for DIF on the SF-36 physical functioning (PF) and mental health (MH) sub-scale items in a Canadian population-based sample. METHODS: Study data were from the prospective Canadian Multicentre Osteoporosis Study (CaMos), which collected baseline data in 1996-1997. DIF was tested using a multiple indicators multiple causes (MIMIC) method. Confirmatory factor analysis defined the latent variable measurement model for the item responses and latent variable regression with demographic and health status covariates (i.e., sex, age group, body weight, self-perceived general health) produced estimates of the magnitude of DIF effects. RESULTS: The CaMos cohort consisted of 9423 respondents; 69.4% were female and 51.7% were less than 65 years. Eight of 10 items on the PF sub-scale and four of five items on the MH sub-scale exhibited DIF. Large DIF effects were observed on PF sub-scale items about vigorous and moderate activities, lifting and carrying groceries, walking one block, and bathing or dressing. On the MH sub-scale items, all DIF effects were small or moderate in size. CONCLUSIONS: SF-36 PF and MH sub-scale scores were not comparable across population sub-groups defined by demographic and health status variables due to the effects of DIF, although the magnitude of this bias was not large for most items. 
We recommend testing and adjusting for DIF to ensure comparability of the SF-36 in population-based investigations.