Meta-Analysis of Interrater Reliability of Supervisory Performance Ratings: Effects of Appraisal Purpose, Scale Type, and Range Restriction.
ABSTRACT: Objectives: This reliability generalization study aimed to estimate the mean and variance of the interrater reliability coefficients (r_yy) of supervisory ratings of overall, task, contextual, and positive job performance. The moderating effects of appraisal purpose and scale type were examined. It was hypothesized that ratings collected for research purposes and ratings made on multi-item scales have higher r_yy. It was also examined whether r_yy was similar across the four performance dimensions. Method: A database of 224 independent samples was created and hierarchical sub-grouping meta-analyses were conducted. Results: Appraisal purpose moderated r_yy for all four performance dimensions. Scale type moderated r_yy for overall and task performance ratings collected for research purposes. The findings also suggest that supervisors have less difficulty evaluating overall job performance than task, contextual, and positive performance. The best estimates of the observed r_yy for overall job performance are 0.61 for ratings collected for research purposes and 0.45 for ratings collected for administrative purposes. Conclusions: (1) Appraisal purpose moderates r_yy, and researchers and practitioners should be aware of its effects before collecting ratings or using empirically derived interrater reliability distributions; (2) scale type seems to moderate r_yy only for ratings collected for research purposes; (3) overall job performance is rated more reliably than task, contextual, and positive performance. Implications for research and practice are discussed.
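The core sub-grouping step described above can be illustrated with a bare-bones sketch: compute a sample-size-weighted mean of the observed interrater reliabilities separately within each level of the hypothesized moderator (appraisal purpose). This is a generic illustration, not the authors' actual meta-analytic procedure, and the study values below are made up (chosen merely to land near the reported 0.61 and 0.45 estimates).

```python
# Minimal sketch of a sub-grouping reliability-generalization step.
# Each study is an (n, r_yy) pair; studies are grouped by moderator level.
# All numbers are illustrative, not taken from the article's database.

def weighted_mean_r(studies):
    """Sample-size-weighted mean of observed interrater reliabilities."""
    total_n = sum(n for n, _ in studies)
    return sum(n * r for n, r in studies) / total_n

# (n, r_yy) pairs grouped by appraisal purpose (hypothetical data)
research = [(120, 0.64), (80, 0.58), (200, 0.61)]
administrative = [(150, 0.47), (90, 0.42)]

mean_research = weighted_mean_r(research)        # weighted mean for subgroup 1
mean_admin = weighted_mean_r(administrative)     # weighted mean for subgroup 2
```

A gap between the subgroup means (here, research vs. administrative ratings) is the basic evidence pattern behind a moderator claim; full reliability-generalization analyses additionally compare residual variances within subgroups.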
Project description:Previous research has shown that job insecurity is linked to a range of performance outcomes, but the number of studies exploring this relationship is still limited and the results are somewhat mixed. The first aim of this study was to meta-analytically investigate how job insecurity is related to task performance, contextual performance, counterproductive work behavior, creativity, and safety compliance. The second aim was to test two method-related factors (cross-sectional vs. longitudinal associations and self- vs. supervisor-ratings of performance) and two macro-level indicators of social protection (social welfare regime and union density) as moderators of these associations. The results show that job insecurity was generally associated with impaired employee performance. These findings were generally similar both cross-sectionally and longitudinally and irrespective of rater. Overall, the associations between job insecurity and negative performance outcomes were weaker in welfare regimes characterized by strong social protection, whereas the results concerning union density were mixed. A majority of the findings confirmed the negative associations between job insecurity and types of employee performance, but future research is needed to elaborate on the effects of temporal aspects, differences between rating sources, and further indicators of social protection in different cultural settings in the context of job insecurity.
Project description:This article offers a correlation matrix of meta-analytic estimates between various employee job attitudes (i.e., employee engagement, job satisfaction, job involvement, and organizational commitment) and indicators of employee effectiveness (i.e., focal performance, contextual performance, turnover intention, and absenteeism). The meta-analytic correlations in the matrix are based on over 1100 individual studies representing over 340,000 employees. Data were collected worldwide via employee self-report surveys. Structural path analyses based on the matrix, and the interpretation of the data, can be found in "Investigating the incremental validity of employee engagement in the prediction of employee effectiveness: a meta-analytic path analysis" (Mackay et al., 2016).
Project description:BACKGROUND: Primary care staffing decisions are often made unsystematically, potentially leading to increased costs, dissatisfaction, turnover, and reduced quality of care. This article aims to (1) catalogue the domain of primary care tasks, (2) explore the complexity associated with these tasks, and (3) examine how tasks performed by different job titles differ in function and complexity, using Functional Job Analysis to develop a new tool for making evidence-based staffing decisions. METHODS: Seventy-seven primary care personnel from six US Department of Veterans Affairs (VA) Medical Centers, representing six job titles, participated in two-day focus groups to generate 243 unique task statements describing the content of VA primary care. Certified job analysts rated tasks on ten dimensions representing task complexity, skills, autonomy, and error consequence. Two hundred and twenty-four primary care personnel from the same clinics then completed a survey indicating whether they performed each task. Tasks were catalogued using an adaptation of an existing classification scheme; complexity differences were tested via analysis of variance. RESULTS: Objective one: Task statements were categorized into four functions: service delivery (65%), administrative duties (15%), logistic support (9%), and workforce management (11%). Objective two: Consistent with expectations, 80% of tasks received ratings at or below the mid-scale value on all ten scales. Objective three: Service delivery and workforce management tasks received higher ratings on eight of ten scales (multiple functional complexity dimensions, autonomy, human error consequence) than administrative and logistic support tasks. Similarly, tasks performed by more highly trained job titles received higher ratings on six of ten scales than tasks performed by less extensively trained job titles. Contrary to expectations, the distribution of tasks across functions did not significantly vary by job title.
CONCLUSION: Primary care personnel are not being utilized to the extent of their training; most personnel perform many tasks that could reasonably be performed by personnel with less training. Primary care clinics should use evidence-based information to optimize job-person fit, adjusting clinic staff mix and allocation of work across staff to enhance efficiency and effectiveness.
Project description:Team coordination within clinical care settings is a critical component of effective patient care. Less is known about the extent, effectiveness, and impact of coordination activities among professionals within VA Patient-Aligned Care Teams (PACTs). This study will address these gaps by describing the specific, fundamental tasks and practices involved in PACT coordination, their impact on performance measures, and the role of coordination task complexity. First, we will use a web-based survey of coordination practices among 1600 PACTs in the national VHA; survey findings will characterize PACT coordination practices and assess their association with clinical performance measures. Second, functional job analysis, using 6-8 subject matter experts who are 3rd- and 4th-year residents in VA Primary Care rotations, will be used to identify the tasks involved in completing clinical performance measures to standard. From this, expert ratings of coordination complexity will be used to determine the level of coordinative complexity required for each of the clinical performance measures drawn from the VA External Peer Review Program (EPRP). Third, data collected from the first two methods will be used to evaluate the effect of coordination complexity on the relationships between measures of PACT coordination and ratings on the clinical performance measures. Results from this study will support successful implementation of coordinated team-based work in clinical settings by providing knowledge of which aspects of care require the most complex levels of coordination and how specific coordination practices affect clinical performance.
Project description:BACKGROUND: Mini-CEX scores are used to assess resident competence. Rater training might improve mini-CEX score interrater reliability, but evidence is lacking. OBJECTIVE: To evaluate a rater training workshop using interrater reliability and accuracy. DESIGN: Randomized trial (immediate versus delayed workshop) and single-group pre/post study (randomized groups combined). SETTING: Academic medical center. PARTICIPANTS: Fifty-two internal medicine clinic preceptors (31 randomized and 21 additional workshop attendees). INTERVENTION: The workshop included rater error training, performance dimension training, behavioral observation training, and frame-of-reference training using lecture, video, and facilitated discussion. The delayed group received no intervention until after the posttest. MEASUREMENTS: Mini-CEX ratings at baseline (just before the workshop for the workshop group) and four weeks later using videotaped resident-patient encounters; mini-CEX ratings of live resident-patient encounters one year preceding and one year following the workshop; rater confidence using the mini-CEX. RESULTS: Among the 31 randomized participants, interrater reliabilities in the delayed group (baseline intraclass correlation coefficient [ICC] 0.43, follow-up 0.53) and the workshop group (baseline 0.40, follow-up 0.43) were not significantly different (p = 0.19). Mean ratings were similar at baseline (delayed 4.9 [95% confidence interval 4.6-5.2], workshop 4.8 [4.5-5.1]) and follow-up (delayed 5.4 [5.0-5.7], workshop 5.3 [5.0-5.6]; p = 0.88 for interaction). For the entire cohort, rater confidence (1 = not confident, 6 = very confident) improved from mean (SD) 3.8 (1.4) to 4.4 (1.0), p = 0.018. Interrater reliability for ratings of live encounters (entire cohort) was higher after the workshop (ICC 0.34) than before (ICC 0.18), but the standard error of measurement was similar for both periods. CONCLUSION: Rater training did not improve interrater reliability or accuracy of mini-CEX scores. TRIAL REGISTRATION: clinicaltrials.gov identifier NCT00667940
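The trial's reliability outcome is an intraclass correlation coefficient. As a rough illustration of how an ICC is computed from a ratings matrix, here is a one-way random-effects ICC(1,1) sketch; the trial's exact ICC variant is not specified here, and the rating matrix in the example is fabricated.

```python
# One-way random-effects intraclass correlation, ICC(1,1).
# ratings: one row per rated target (e.g. videotaped encounter),
# one column per rater. Illustrative only.

def icc_oneway(ratings):
    """ICC(1,1) from a targets-by-raters matrix via one-way ANOVA."""
    n = len(ratings)       # number of targets
    k = len(ratings[0])    # raters per target (assumed constant)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-target and within-target mean squares
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means) for x in row)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical mini-CEX scores: 4 encounters rated by 2 preceptors
encounters = [[4, 5], [5, 6], [2, 3], [3, 3]]
icc = icc_oneway(encounters)
```

Higher values indicate that more of the score variance reflects true differences between encounters rather than rater disagreement; values near the trial's 0.18-0.53 range are common for workplace-based assessments.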
Project description:Background:Although the concept of workload is important to nursing practice, only a few nursing researchers have focused on the issue of workload within the nursing context. Knowledge of how the dynamics of workload affect the job stress of nurses working in a specific unit or department in a hospital setting, and of the influence of coworker support on this relationship, remains limited. This study therefore examined the effect of workload on the job stress of Ghanaian outpatient department nurses and the moderating effect of coworker support on this relationship. Methods:A cross-sectional survey design was used, and a questionnaire was administered to a sample of 216 outpatient department nurses from four major hospitals in Ghana. The questionnaire measured workload, job stress, and coworker support using the National Aeronautics and Space Administration (NASA) Task Load Index, a job stress scale, and a coworker support scale, respectively. Data were analysed using descriptive statistics, correlation, and hierarchical regression. Results:High levels of workload were associated with high levels of job stress among the nurses. Moreover, higher levels of workload were related to higher levels of job stress for nurses who received high levels of coworker support, but not for those who received low levels of coworker support (a reverse buffering effect). Conclusion:The findings reiterate the adverse effect of workload on employees' health, and the reverse buffering effect implies that support for a colleague at work should be conveyed in a positive manner, devoid of negative appraisal.
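The moderation test behind this design can be sketched in a few lines: regress job stress on workload, coworker support, and their product term. This is an assumed illustration, not the study's analysis code; the data are simulated from known coefficients so the regression recovers them exactly.

```python
# Moderated regression sketch: stress ~ workload + support + workload*support.
# Fabricated data; coefficients (3.0, 0.8, -0.3, 0.2) are chosen, not estimated
# from the study.
import numpy as np

workload = np.array([2.0, 3.5, 4.0, 1.5, 5.0, 3.0, 4.5, 2.5])
support  = np.array([4.0, 2.0, 3.0, 5.0, 1.5, 3.5, 2.5, 4.5])

# Mean-center predictors before forming the product term, as is
# conventional in moderated regression.
w = workload - workload.mean()
s = support - support.mean()
stress = 3.0 + 0.8 * w - 0.3 * s + 0.2 * w * s  # simulated outcome, no noise

# In the hierarchical version, the w*s column is entered in a second
# step and the increase in R^2 is tested for significance.
X = np.column_stack([np.ones_like(w), w, s, w * s])
coef, *_ = np.linalg.lstsq(X, stress, rcond=None)
intercept, b_workload, b_support, b_interaction = coef
```

A nonzero interaction coefficient means the workload-stress slope changes with support; the study's reverse buffering pattern corresponds to a steeper slope at high support.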
Project description:Although a large number of studies have pointed to the potential of emotional intelligence (EI) in the context of personnel selection, research in real-life selection contexts is still scarce. The aim of the present study was to examine whether EI would predict Assessment Center (AC) ratings of job-relevant competencies in a sample of applicants for the position of a flight attendant. Applicants' ability to regulate emotions predicted performance in group exercises. However, there were inconsistent effects of applicants' ability to understand emotions: whereas the ability to understand emotions had a positive effect on performance in the interview and the role play, its effect on performance in the group exercises was negative. We suppose that the effect depends on task type and conclude that tests of emotional abilities should be used judiciously in personnel selection procedures.
Project description:The objective of the study was to reveal, through pragmatic MCDA (EVIDEM), the contribution of a broad range of criteria to the value of the orphan drug lenvatinib for radioiodine-refractory differentiated thyroid cancer (RR-DTC) in country-specific contexts. The study was designed to enable comprehensive appraisal (12 quantitative, 7 qualitative criteria) in the current disease context (watchful waiting, sorafenib) of France, Italy, and Spain. Data on the value of lenvatinib were collected from diverse stakeholders during country-specific panels and included: criteria weights (individual and social values); performance scores (judgments on evidence collected through an MCDA systematic review); qualitative impacts of contextual criteria; and verbal and written insights structured by criteria. The value contribution of each criterion was calculated and uncertainty explored. Comparative effectiveness, quality of evidence (Spain and Italy), and disease severity (France) received the greatest weights. Four criteria contributed most to the value of lenvatinib, reflecting its superior comparative effectiveness (16-22% of value), the severity of RR-DTC (16-22%), significant unmet needs (14-21%), and robust evidence (14-20%). Contributions varied by comparator, country, and individuals, highlighting the importance of context and consultation. Results were reproducible at the group level. Impacts of contextual criteria varied across countries, reflecting different health systems and cultural backgrounds. The MCDA process promoted sharing stakeholders' knowledge of lenvatinib and insights on context. The value of lenvatinib was consistently positive across diverse therapeutic contexts. MCDA identified the aspects contributing most to value, revealed rich contextual insights, and helped participants express and explicitly tackle ethical trade-offs inherent to balanced appraisal and decision-making.
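The per-criterion value contributions reported above follow the usual weighted-sum logic of quantitative MCDA: each criterion contributes its normalized weight multiplied by its normalized performance score. The sketch below illustrates that arithmetic only; the criterion names echo the abstract, but the function, weights, and scores are hypothetical, not the EVIDEM panel data.

```python
# Weighted-sum MCDA sketch: contribution of each criterion to overall value.
# Weights and scores are illustrative; EVIDEM's actual scales differ.

def value_contributions(weights, scores, max_score=10):
    """Each criterion's share of value: normalized weight x normalized score."""
    total_w = sum(weights.values())
    return {c: (weights[c] / total_w) * (scores[c] / max_score)
            for c in weights}

weights = {"comparative_effectiveness": 9, "disease_severity": 8,
           "unmet_needs": 7, "quality_of_evidence": 6}
scores  = {"comparative_effectiveness": 8, "disease_severity": 9,
           "unmet_needs": 7, "quality_of_evidence": 7}

contrib = value_contributions(weights, scores)
overall_value = sum(contrib.values())  # on a 0-1 scale
```

Dividing each contribution by the overall value yields percentage shares like the 16-22% figures in the abstract; sensitivity analysis then varies weights and scores to explore uncertainty.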
Project description:Anatomists and radiologists use the Zaidi-Dayal and Richards-Jabbour scales to study the shape of the foramen magnum. Our aim is to measure the interrater and intrarater agreement and reliability of ratings made using the two scales. We invited 16 radiology residents to attend two sessions, four weeks apart. During each session, we asked the residents to classify the shape of the foramen magnum in 35 images using both scales. We used Fleiss' κ to measure interrater reliability and Cohen's κ to measure intrarater reliability. The interrater reliability of ratings made using the Zaidi-Dayal scale was 0.34 (0.26-0.46) for session one and 0.30 (0.24-0.39) for session two, and the intrarater reliability was 0.39 (0.34-0.44). The interrater reliability of ratings made using the Richards-Jabbour scale was 0.14 (0.10-0.19) for session one and 0.12 (0.09-0.17) for session two, and the intrarater reliability was 0.11 (0.07-0.15). In conclusion, the interrater and intrarater agreement and reliability of ratings made using the Zaidi-Dayal and Richards-Jabbour scales are inadequate. We recommend an objective method by Zdilla et al. to researchers interested in studying the shape of the foramen magnum.
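Fleiss' κ, the interrater statistic used in this study, compares observed category agreement across many raters against the agreement expected by chance. A minimal self-contained sketch of the computation follows; the count matrix in the usage example is hypothetical, not the study's data (Cohen's κ, used for the intrarater comparison, is the analogous two-rating statistic).

```python
# Fleiss' kappa for multiple raters assigning items to nominal categories.
# counts: one row per item; each row gives how many raters chose each category.

def fleiss_kappa(counts):
    """Fleiss' kappa; assumes the same number of raters for every item."""
    N = len(counts)
    n = sum(counts[0])  # raters per item
    k = len(counts[0])  # number of categories
    # Proportion of all assignments falling in each category
    p_cat = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Observed agreement per item, then averaged
    per_item = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(per_item) / N
    # Chance-expected agreement
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 5 images, 3 raters, 3 shape categories
counts = [[3, 0, 0], [0, 3, 0], [1, 2, 0], [0, 1, 2], [2, 1, 0]]
kappa = fleiss_kappa(counts)
```

Values around 0.1-0.4, as reported for the two scales, sit in the conventional "slight" to "fair" agreement bands, which is why the authors judge the scales inadequate.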
Project description:Clinical-performance measurement has helped improve the quality of health care; yet success in attaining high levels of quality across multiple domains simultaneously still varies considerably. Although many sources of variability in care quality have been studied, the difficulty of the clinical work itself has received little attention. We present a task-based methodology for evaluating the difficulty of clinical-performance measures (CPMs) by assessing the complexity of their component requisite tasks. Using Functional Job Analysis (FJA), subject-matter experts (SMEs) generated task lists for 17 CPMs; task lists were rated on ten dimensions of complexity and then aggregated into difficulty composites. Participants: eleven outpatient work SMEs; 133 VA Medical Centers nationwide. Clinical performance: 17 outpatient CPMs (2000-2008) at 133 VA Medical Centers nationwide. Measure difficulty: for each CPM, the number of component requisite tasks and the average rating across ten FJA complexity scales for the set of tasks comprising the measure. Measures varied considerably in the number of component tasks (M = 10.56, SD = 6.25, min = 5, max = 25). Measures of chronic care following acute myocardial infarction exhibited significantly higher measure difficulty ratings than diabetes or screening measures, but not immunization measures ([Formula: see text] = 0.45, -0.04, -0.05, and -0.06, respectively; F(3, 186) = 3.57, p = 0.015). Measure difficulty ratings were not significantly correlated with the number of component tasks (r = -0.30, p = 0.23). Evaluating the difficulty of achieving recommended CPM performance levels requires more than simply counting the tasks involved; using FJA to assess the complexity of CPMs' component tasks presents an alternate means of assessing the difficulty of primary-care CPMs and accounting for performance variation among measures and performers.
This in turn could be used in designing performance reward programs, or to match workflow to clinician time and effort.
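The two quantitative steps described above, averaging each measure's task-complexity ratings into a difficulty composite and correlating composites with the number of component tasks, can be sketched as follows. This is an assumed illustration, not the paper's code, and all numbers are made up.

```python
# Difficulty-composite sketch: mean complexity rating per CPM, then a
# Pearson correlation with task counts. Hypothetical data throughout.

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# One row per CPM: its tasks' mean ratings on the FJA complexity scales
ratings_per_cpm = [[3.2, 2.8, 3.5], [1.9, 2.1, 2.0], [2.5, 2.4, 2.6],
                   [3.0, 3.3, 2.9], [2.2, 2.0, 2.1]]
difficulty = [sum(r) / len(r) for r in ratings_per_cpm]  # composites
n_tasks = [25, 5, 8, 12, 7]                              # component tasks
r = pearson_r(difficulty, n_tasks)
```

A weak or non-significant r, as the study reports, is what motivates the conclusion that measure difficulty is not just a function of how many tasks a measure contains.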