Project description: Background: The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of each scale of the EORTC Quality of Life Questionnaire (EORTC QLQ-C30). This study aims to develop an item bank for the EORTC QLQ-C30 cognitive functioning scale that can be used for CAT. Methods: The complete developmental approach comprised four phases: (I) conceptualization and literature search, (II) operationalization, (III) pretesting, and (IV) field-testing. This paper describes phases I-III. (I) A literature search was performed to identify self-report instruments and items measuring cognitive complaints concerning concentration and memory. (II) A multistep item-selection procedure was applied to select and generate items that were relevant and compatible with the 'QLQ-C30 item style.' (III) Cancer patients from different countries evaluated the item list for wording (i.e., whether items were difficult, confusing, annoying, upsetting, or intrusive) and whether relevant issues were missing. Results: The literature search generated a list of 439 items. In the multistep item-selection procedure, these items were evaluated for relevance, redundancy, clarity, and response format, resulting in a list of 45 items. A total of 32 patients evaluated this item list in the pretesting phase, resulting in a preliminary list of 44 items. Conclusion: Phases I-III resulted in a list of 44 items measuring self-reported cognitive complaints that was endorsed by international experts and cancer patients in several countries. This list will be evaluated for its psychometric characteristics in phase IV.
Project description:Cognitive diagnosis models (CDMs) are latent class models that hold great promise for providing diagnostic information about student knowledge profiles. The increasing use of computers in classrooms enhances the advantages of CDMs for more efficient diagnostic testing by using adaptive algorithms, referred to as cognitive diagnosis computerized adaptive testing (CD-CAT). When multiple-choice items are involved, CD-CAT can be further improved by using polytomous scoring (i.e., considering the specific options students choose), instead of dichotomous scoring (i.e., marking answers as either right or wrong). In this study, the authors propose and evaluate the performance of the Jensen-Shannon divergence (JSD) index as an item selection method for the multiple-choice deterministic inputs, noisy "and" gate (MC-DINA) model. Attribute classification accuracy and item usage are evaluated under different conditions of item quality and test termination rule. The proposed approach is compared with the random selection method and an approximate approach based on dichotomized responses. The results show that under the MC-DINA model, JSD improves the attribute classification accuracy significantly by considering the information from distractors, even with a very short test length. This result has important implications in practical classroom settings as it can allow for dramatically reduced testing times, thus resulting in more targeted learning opportunities.
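As a rough illustration of how a JSD-based selection rule of this kind can be computed, the sketch below (Python; function names and array layouts are our assumptions, not the authors' code) scores a polytomous item by the Jensen-Shannon divergence among its class-conditional option distributions, weighted by the current posterior over attribute classes, and selects the highest-scoring unused item:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def jsd_index(option_probs, posterior):
    """Posterior-weighted Jensen-Shannon divergence of one item.

    option_probs : (K, C) array, P(option c | class k) for the item
    posterior    : (K,) array, current posterior over the K classes
    """
    mixture = posterior @ option_probs  # marginal P(option c) under the posterior
    class_entropies = np.array([entropy(row) for row in option_probs])
    return entropy(mixture) - np.sum(posterior * class_entropies)

def select_item(bank, posterior, administered):
    """Pick the unused item (bank = list of (K, C) arrays) with the largest JSD."""
    scores = [jsd_index(item, posterior) if i not in administered else -np.inf
              for i, item in enumerate(bank)]
    return int(np.argmax(scores))
```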
Project description: Cognitive diagnostic computerized adaptive testing (CD-CAT) aims to obtain more useful diagnostic information by taking advantage of computerized adaptive testing (CAT). Cognitive diagnosis models (CDMs) have been developed to classify examinees into the correct proficiency classes so that remediation can be targeted more efficiently, whereas CAT tailors optimal items to the examinee's mastery profile. The item selection method is a key factor in the CD-CAT procedure. In recent years, a large number of parametric and nonparametric item selection methods have been proposed. In this article, the authors propose a series of stratified item selection methods for CD-CAT that combine stratification with the posterior-weighted Kullback-Leibler (PWKL), nonparametric item selection (NPS), and weighted nonparametric item selection (WNPS) methods, named S-PWKL, S-NPS, and S-WNPS, respectively. Two different types of stratification indices were used: original versus novel. The performance of the proposed item selection methods was evaluated via simulation studies and compared with that of the PWKL, NPS, and WNPS methods without stratification. Manipulated conditions included calibration sample size, item quality, number of attributes, number of strata, and data generation model. Results indicated that the S-WNPS and S-NPS methods performed similarly, and both outperformed the S-PWKL method. Item selection methods with the novel stratification indices performed slightly better than those with the original indices, and methods without stratification performed worst.
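For reference, the PWKL index that these stratified variants build on admits a compact implementation. Below is a minimal Python sketch, assuming dichotomous items whose class-conditional success probabilities come from an already-fitted CDM and lie strictly between 0 and 1 (names and array shapes are illustrative, not taken from the article):

```python
import numpy as np

def pwkl_index(p_item, posterior, est_class):
    """Posterior-weighted Kullback-Leibler (PWKL) index for one dichotomous item.

    p_item    : (K,) array, P(correct | class k); assumed strictly in (0, 1)
    posterior : (K,) array, current posterior over the K latent classes
    est_class : index of the current point estimate (e.g., the MAP class)
    """
    p0 = p_item[est_class]
    # KL divergence between Bernoulli(p0) and Bernoulli(p_k), per class
    kl = p0 * np.log(p0 / p_item) + (1 - p0) * np.log((1 - p0) / (1 - p_item))
    # Weight each class's divergence by its posterior probability
    return float(np.sum(posterior * kl))
```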
Project description: Computerized adaptive testing (CAT) greatly improves measurement efficiency in high-stakes testing operations through the selection and administration of test items whose difficulty level is most relevant to each individual test taker. This paper explains the three components of a conventional CAT item selection algorithm: test content balancing, the item selection criterion, and item exposure control. Several noteworthy methodologies underlie each component. The test script method and the constrained CAT method are used for test content balancing. Item selection criteria include the maximized Fisher information criterion, the b-matching method, the a-stratification method, the weighted likelihood information criterion, the efficiency balanced information criterion, and the Kullback-Leibler information criterion. The randomesque method, the Sympson-Hetter method, the unconditional and conditional multinomial methods, and the fade-away method are used for item exposure control. Several holistic approaches to CAT use automated test assembly methods, such as the shadow test approach and the weighted deviation model. Item usage and exposure counts vary depending on the item selection criterion and exposure control method. Finally, other important factors to consider when determining an appropriate CAT design are computing resource requirements, item pool size, and test length. The logic of CAT is now being adopted in the field of adaptive learning, which integrates the learning aspect and the (formative) assessment aspect of education into a continuous, individualized learning experience. Therefore, the algorithms and technologies described in this review may help medical health educators and high-stakes test developers adopt CAT more actively and efficiently.
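To make the most common of these criteria concrete, here is a minimal Python sketch of the maximized Fisher information criterion under a 3PL model; the item bank layout and function names are illustrative assumptions, not from the paper. Among unused items, the one with the largest information at the current ability estimate is selected:

```python
import numpy as np

def fisher_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))
    q = 1 - p
    return a**2 * (q / p) * ((p - c) / (1 - c))**2

def max_info_item(theta_hat, bank, administered):
    """Maximized Fisher information criterion: pick the unused item
    (bank = list of (a, b, c) tuples) most informative at theta_hat."""
    info = np.array([fisher_information(theta_hat, *item) for item in bank])
    info[list(administered)] = -np.inf  # exclude already-administered items
    return int(np.argmax(info))
```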
Project description: Currently, there are two predominant approaches in adaptive testing. One, referred to as cognitive diagnosis computerized adaptive testing (CD-CAT), is based on cognitive diagnosis models, and the other, traditional CAT, is based on item response theory. The present study evaluates the performance of two item selection rules (ISRs) originally developed in the CD-CAT framework, the double Kullback-Leibler information (DKL) and the generalized deterministic inputs, noisy "and" gate model discrimination index (GDI), in the context of traditional CAT. The accuracy and test security associated with these two ISRs are compared to those of point Fisher information and weighted KL using a simulation study. The impact of the trait level estimation method is also investigated. The results show that the new ISRs, particularly DKL, could be used to improve the accuracy of CAT. The better accuracy of DKL is achieved at the expense of a higher item overlap rate. Differences among the item selection rules become smaller as the test gets longer. The two CD-CAT ISRs select different types of items: DKL favors items with the highest possible a-parameter, whereas GDI favors items with the lowest possible c-parameter. Regarding the trait level estimator, the expected a posteriori (EAP) method is generally better in the early stages of the CAT and converges to the maximum likelihood method once a medium-to-large number of items has been administered. The use of DKL can be recommended in low-stakes settings where test security is less of a concern.
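Since the comparison between trait estimators is central here, the following is a minimal sketch, assuming a 3PL model and a standard-normal prior, of how an EAP estimate is typically computed on a quadrature grid (all names are illustrative):

```python
import numpy as np

def eap_estimate(responses, items, grid=np.linspace(-4, 4, 81)):
    """Expected a posteriori (EAP) ability estimate under a 3PL model.

    responses : sequence of 0/1 scores for the administered items
    items     : matching sequence of (a, b, c) parameter tuples
    """
    prior = np.exp(-0.5 * grid**2)       # standard-normal prior (unnormalized)
    like = np.ones_like(grid)
    for u, (a, b, c) in zip(responses, items):
        p = c + (1 - c) / (1 + np.exp(-a * (grid - b)))
        like *= p if u == 1 else (1 - p)
    post = prior * like
    post /= post.sum()                   # normalize the posterior over the grid
    return float(np.sum(grid * post))    # posterior mean = EAP estimate
```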
Project description: Objective: To detect an individual's severity of alcohol use disorder (AUD) in an effective and accurate manner, this study aimed to build an item bank for AUD screening and derive a computerized adaptive testing (CAT) version for AUD (CAT-AUD). Methods: The initial CAT-AUD item bank was selected from the Chinese versions of questionnaires related to AUD according to the DSM-5 criteria. A total of 915 valid Chinese respondents, covering both healthy individuals and individuals at high risk of AUD, then completed the initial CAT-AUD item bank. By testing the unidimensionality, test fit, item fit, discrimination parameters, and differential item functioning of the initial item bank, a final CAT-AUD item bank conforming to the requirements of item response theory (IRT) was obtained. Subsequently, a CAT-AUD simulation study based on real data from the final item bank was conducted to examine the characteristics, reliability, validity, and predictive utility (sensitivity and specificity) of the CAT-AUD. Results: The CAT-AUD item bank met the IRT psychometric requirements and fit the graded response model well. The Pearson correlation between theta estimated via the CAT-AUD and theta estimated via the full-length item bank reached 0.95, and the criterion-related validity was 0.63. The CAT-AUD provides highly reliable test results for respondents whose theta is above -0.8, using an average of 16 items. In addition, the predictive utility of the CAT-AUD was better than that of the AUDIT and AUDIT-C. Conclusion: In brief, the CAT-AUD developed in this study can effectively screen the AUD high-risk group and accurately measure the AUD severity of individuals.
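Because the item bank is calibrated with the graded response model, a short sketch of its category probabilities may be useful. The Python function below (names and layout are our assumptions) returns P(X = k) from an item's discrimination and ordered thresholds:

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Category response probabilities under Samejima's graded response model.

    a          : item discrimination
    thresholds : ordered category thresholds b_1 < ... < b_{m-1}
    Returns P(X = k) for k = 0..m-1 at ability theta.
    """
    # Cumulative probabilities P(X >= k) for k = 1..m-1
    cum = 1 / (1 + np.exp(-a * (theta - np.asarray(thresholds))))
    # Pad with P(X >= 0) = 1 and P(X >= m) = 0, then difference
    star = np.concatenate(([1.0], cum, [0.0]))
    return star[:-1] - star[1:]
```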
Project description: Background: The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim of enhancing measurement precision. Here we present the results of the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). Methods: In previous phases (I-III), 44 candidate items measuring CF in cancer patients were developed. In phase IV, these items were psychometrically evaluated in a large international sample of cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. Results: A total of 1030 cancer patients completed the 44 candidate CF items. Of these, 34 items could be included in a unidimensional IRT model with acceptable fit. Although several items showed DIF, the impact on CF estimation was negligible. Measurement precision of the item bank was much higher than that of the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. Conclusion: A CF item bank for CAT measurement consisting of 34 items was established, applicable to a variety of cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of the health-related quality of life (HRQOL) of cancer patients, without loss of comparability of results.
Project description: Evaluating items for potential differential item functioning (DIF) is an essential step toward ensuring measurement fairness. In this article, we focus on a specific scenario, namely, continuous-response, severely sparse, computerized adaptive testing (CAT). Continuous-response items are increasingly used in performance-based tasks because they tend to generate more information than traditional dichotomous items. Severe sparsity arises when many items are automatically generated via machine learning algorithms. We propose two uniform DIF detection methods for this scenario. The first is a modified version of CAT-SIBTEST, a non-parametric method that does not depend on any specific item response theory model assumptions. The second is a regularization method, a parametric, model-based approach. Simulation studies show that both methods are effective in correctly identifying items with uniform DIF. A real data analysis is provided at the end to illustrate the utility and potential caveats of the two methods.
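The modified CAT-SIBTEST itself is not detailed in the abstract; as a rough illustration of the classical SIBTEST idea it builds on, the sketch below computes a simplified uniform-DIF statistic for one continuous-response item by matching examinees on their rest score. This is a bare-bones version (fixed quantile strata, no regression correction, no CAT adjustments), and all names are illustrative assumptions:

```python
import numpy as np

def sibtest_beta(item_scores, rest_scores, group, n_strata=10):
    """Simplified SIBTEST-style uniform-DIF statistic.

    item_scores : (N,) array of continuous scores on the studied item
    rest_scores : (N,) array of total scores on the remaining items
    group       : (N,) array of 0 (reference) / 1 (focal) labels
    """
    # Stratify examinees into matched groups by rest-score quantile
    edges = np.quantile(rest_scores, np.linspace(0, 1, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, rest_scores, side="right") - 1,
                     0, n_strata - 1)
    beta, weight = 0.0, 0
    for s in range(n_strata):
        ref = item_scores[(strata == s) & (group == 0)]
        foc = item_scores[(strata == s) & (group == 1)]
        if len(ref) and len(foc):
            n = len(ref) + len(foc)
            beta += n * (ref.mean() - foc.mean())  # within-stratum difference
            weight += n
    return beta / weight  # positive values favor the reference group
```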
Project description: To date, multidimensional computerized adaptive testing (MCAT) has been developed in conjunction with compensatory multidimensional item response theory (MIRT) models rather than with non-compensatory ones. In recognition of the usefulness of MCAT and the complications associated with non-compensatory data, this study aimed to develop MCAT algorithms using non-compensatory MIRT models and to evaluate their performance. For the purpose of the study, three item selection methods were adapted and compared, namely, the Fisher information method, the mutual information method, and the Kullback-Leibler information method. The results of a series of simulations showed that the Fisher information and mutual information methods performed similarly, and both outperformed the Kullback-Leibler information method. In addition, it was found that the more stringent the termination criterion and the higher the correlation between the latent traits, the higher the resulting measurement precision and test reliability. Test reliability was very similar across the dimensions, regardless of the correlation between the latent traits and the termination criterion. On average, the difficulties of the administered items were at a lower level than the examinees' abilities, which sheds light on item bank construction for non-compensatory items.
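As an illustration of the kind of computation involved, the sketch below (Python; the discrete-grid posterior representation and all names are our assumptions, not the authors' implementation) evaluates the mutual information between a candidate item's dichotomous response and theta under a non-compensatory two-parameter MIRT model, where per-dimension success probabilities multiply:

```python
import numpy as np

def noncomp_prob(theta, a, b):
    """Non-compensatory MIRT: success requires sufficient ability on every
    dimension, so the per-dimension 2PL probabilities multiply.

    theta : (N, D) grid of ability points; a, b : (D,) item parameters
    """
    return np.prod(1 / (1 + np.exp(-a * (theta - b))), axis=-1)

def mutual_information(post, grid, a, b):
    """Mutual information between an item's response X and theta, with the
    posterior represented by weights `post` (N,) on grid points `grid` (N, D)."""
    p1 = noncomp_prob(grid, a, b)   # P(X=1 | theta) at each grid point
    m1 = np.sum(post * p1)          # marginal P(X=1) under the posterior
    mi = 0.0
    for px, mx in ((p1, m1), (1 - p1, 1 - m1)):
        with np.errstate(divide="ignore", invalid="ignore"):
            term = post * px * np.log(px / mx)
        mi += np.nansum(term)       # 0 * log 0 terms contribute zero
    return mi
```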
Project description: Most computerized adaptive testing (CAT) programs do not allow item review because of a decrease in estimation precision and aberrant manipulation strategies. In this article, a block item pocket (BIP) method is proposed that combines the item pocket method with the successive block method to realize reviewable CAT. A worst-case but still reasonable answering strategy and the Wainer-like manipulation strategy were simulated to evaluate the estimation precision of reviewable unidimensional computerized adaptive testing (UCAT) and multidimensional computerized adaptive testing (MCAT) under a series of BIP settings. For both UCAT and MCAT, the estimation precision of the BIP method improved as the number of blocks increased or the item pocket (IP) size decreased under the reasonable strategy. The BIP method was more effective in handling the Wainer-like strategy. With the help of the block design, the BIP method can still maintain acceptable estimation precision under moderately large total IP size conditions. These results suggest that the BIP method is a reliable solution for both reviewable UCAT and MCAT.
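The abstract does not spell out the bookkeeping of the BIP method, but a rough sketch of how a block-plus-pocket review policy might be structured is given below. The specific rules (review allowed only for items placed in the current block's pocket, earlier blocks sealed on exit) are our assumptions for illustration, not necessarily the authors' exact design:

```python
from dataclasses import dataclass, field

@dataclass
class BlockItemPocket:
    """Minimal bookkeeping sketch for a block-item-pocket review policy:
    the test is split into successive blocks, and an examinee may revisit
    only items placed in the pocket of the block currently being answered."""
    pocket_size: int
    current_block: int = 0
    pockets: dict = field(default_factory=dict)

    def add_to_pocket(self, item_pos):
        """Mark an item for later review, if the current pocket has room."""
        pocket = self.pockets.setdefault(self.current_block, set())
        if len(pocket) < self.pocket_size:
            pocket.add(item_pos)
            return True
        return False  # pocket full: the item cannot be marked for review

    def can_review(self, item_pos):
        """Review is allowed only within the current block's pocket."""
        return item_pos in self.pockets.get(self.current_block, set())

    def close_block(self):
        """Moving to the next block seals all earlier answers permanently."""
        self.current_block += 1
```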