Elaboration and validation of Crohn's disease anoperineal lesions consensual definitions.
ABSTRACT: To establish consensual definitions of anoperineal lesions of Crohn's disease (APLOC) and assess interobserver agreement on their diagnosis between experts. A database of digitally recorded pictures of APLOC was examined by a coordinating group who selected two series of 20 pictures illustrating the various aspects of APLOC. A reading group comprised eight experts from the Société Nationale Française de Colo Proctologie group of study and research in proctology and one academic dermatologist. All members of the coordinating and reading groups participated in dedicated meetings. The coordinating group initially conducted a literature review to analyse verbatim descriptions used to evaluate APLOC. The study included two phases: establishment of consensual definitions using a formal consensus method and later assessment of interobserver agreement on the diagnosis of APLOC using photos of APLOC, a standardised questionnaire and Fleiss's kappa test or descriptive statistics. Terms used in the literature to evaluate visible APLOC did not include precise definitions or reference to definitions. Most of the expert reports on the first set of photos agreed with the main diagnosis but their verbatim reporting contained substantial variation. The definitions of ulceration (entity, depth, extension), anal skin tags (entity, inflammatory activity, ulcerated aspect), fistula (complexity, quality of drainage, inflammatory activity of external openings), perianal skin lesions (abscess, papules, edema, erythema) and anoperineal scars were validated. For fistulae, the experts decided to follow the American Gastroenterological Association's guideline definitions.
The diagnosis of ulceration (κ = 0.70), fistulae (κ = 0.75), inflammatory activity of external fistula openings (86.6% agreement), abscesses (84.6% agreement) and erythema (100% agreement) achieved a substantial degree of interobserver reproducibility. This study constructed consensual definitions of APLOC and their characteristics and showed that experts have a fair level of interobserver agreement when using most of the definitions.
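The abstract above reports Fleiss's kappa for a panel of readers but does not show the calculation. As an illustration only, here is a minimal Python sketch of the standard Fleiss formula; the function name `fleiss_kappa` and the example table are hypothetical and are not taken from the study.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who put subject i in category j.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    total = n_subjects * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Proportion of agreeing rater pairs for each subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_obs = sum(p_subj) / n_subjects   # mean observed agreement
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical data: 4 photos, 9 readers, categories (lesion present, absent).
example = [[9, 0], [8, 1], [1, 8], [0, 9]]
print(round(fleiss_kappa(example), 3))
```

Each row of the table must sum to the number of readers; features reported only as percent agreement in the abstract would not go through this function.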
Project description: Breast cancers can be classified by hierarchical clustering using an "intrinsic" gene list into one of at least five molecular subtypes: basal-like, HER2, luminal A, luminal B, and normal breast-like. Five different intrinsic gene lists composed of varying numbers of genes have been used for molecular subtype identification and classification of breast cancers. The aim of this study was to determine the objectivity and interobserver reproducibility of the assignment of molecular subtype classes by hierarchical cluster analysis. Three publicly available breast cancer datasets (n = 779) were subjected to two-way average-linkage hierarchical cluster analysis using five distinct intrinsic gene lists. We used free-marginal Kappa statistics to analyze interobserver agreement among five breast cancer researchers for the whole classification and for each molecular subtype separately according to each intrinsic gene list for each breast cancer dataset. None of the classification systems tested produced almost perfect agreement (Kappa ≥ 0.81) among observers. However, substantial interobserver agreement (70.8% to 76.1% of the samples and free-marginal Kappa scores from 0.635 to 0.701) was consistently observed in all datasets for four molecular subtypes (luminal, basal-like, HER2, and normal breast-like). When luminal cancers were subdivided (luminal A, B, and C), none of the classification systems produced substantial agreement (Kappa ≥ 0.61) in all the datasets analyzed. Analysis of each subtype separately revealed that only two (basal-like and HER2) could be reproducibly identified by independent observers (Kappa ≥ 0.81). Assignment of molecular subtype classes of breast cancer based on the analysis of dendrograms obtained with hierarchical cluster analysis is subjective and shows modest interobserver reproducibility.
For the development of a molecular taxonomy, objective definitions for each molecular subtype and standardized methods for their identification are required.
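For readers unfamiliar with the free-marginal kappa used above, the following Python sketch shows the usual definition, in which chance agreement is fixed at 1/k for k categories. The function name is hypothetical. Two assumptions are mine and not stated in the abstract: that the reported percentages can be read as overall observed agreement, and that k = 5 categories were available to the observers.

```python
def free_marginal_kappa(observed_agreement: float, n_categories: int) -> float:
    """Free-marginal kappa: chance agreement is 1/k rather than marginal-based."""
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if not 0.0 <= observed_agreement <= 1.0:
        raise ValueError("observed agreement must be a proportion in [0, 1]")
    chance = 1.0 / n_categories
    return (observed_agreement - chance) / (1.0 - chance)


# Assumption (not stated in the abstract): k = 5 rating categories.
for p_obs in (0.708, 0.761):
    print(p_obs, round(free_marginal_kappa(p_obs, 5), 3))
```

Under those assumptions the two agreement percentages map to values close to the reported 0.635 and 0.701, which suggests, but does not prove, that this is how the figures relate.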
Project description: Ultrasonography is the best available tool for the initial work-up of thyroid nodules. Substantial interobserver variability has been documented in the recognition and reporting of some of the lesion characteristics. A number of classification systems have been developed to estimate the likelihood of malignancy: several of them have been endorsed by scientific societies, but their reproducibility is yet to be assessed. We evaluated the interobserver variability of the AACE/ACE/AME, ACR, ATA, EU-TIRADS and K-TIRADS classification systems and the interobserver concordance in the indication for FNA biopsy. Two raters independently evaluated 1055 ultrasound images of thyroid nodules identified in 265 patients at multiple time points, in two separate sets (501 and 554 images). After the first set of nodules, a joint reading was performed to reach a consensus on the feature definitions. The interobserver agreement (Krippendorff alpha) in the first set of nodules was 0.47, 0.49, 0.49, 0.61 and 0.53 for the AACE/ACE/AME, ACR, ATA, EU-TIRADS and K-TIRADS systems, respectively. The agreement for the indication for biopsy was substantial to near-perfect, being 0.73, 0.61, 0.75, 0.68 and 0.82, respectively (Cohen's kappa). For all systems, agreement on the nodules of the second set increased. Despite the wide variability in the description of single ultrasonographic features, the classification systems may improve interobserver agreement, which improves further after specific training. When selecting nodules to be submitted to FNA biopsy, which is the main purpose of these classifications, the interobserver agreement is substantial to almost perfect.
Project description: BACKGROUND: Several new definitions for categorizing the severely injured, such as the Berlin Definition, have been developed. Here, severely injured patients are selected by additive physiological parameters and by the general Abbreviated Injury Scale (AIS)-based assessment. However, all definitions should conform to an AIS severity coding applied by an expert. We examined the dependence of individual coding on defining injury severity in general and in identifying polytrauma according to several definitions. A precise definition of polytrauma is important for quality management. METHODS: We investigated the interobserver reliability (IR) between several polytrauma definitions for identifying polytrauma using several cut-off levels (ISS ≥16, ≥18, ≥20, ≥25 points, and the Berlin Definition). One hundred and eighty-seven patients were included for analyzing IR of the polytrauma definitions. IR for polytrauma definitions was assessed by Cohen's kappa. RESULTS: IR for identifying polytrauma according to the relevant definitions showed moderate agreement (<0.60) in the ISS cutoff categories (ISS ≥16, ≥18, and ≥20 points), while ISS ≥25 points just reached substantial agreement (0.62) and the Berlin Definition reached a value of 0.77, which is close to the threshold for almost perfect agreement (>0.80). CONCLUSION: Compared with the ISS-based definitions of polytrauma, the Berlin Definition proved less dependent on the individual rater. This underlines the need to redefine the selection of severely injured patients. Using the Berlin Definition for identifying polytrauma could improve the comparability of patient data across studies, in trauma center benchmarking, and in quality assurance.
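To make the dependence on the individual coder concrete, the sketch below dichotomises two coders' ISS values at a cut-off and computes Cohen's kappa on the resulting polytrauma labels. It is illustrative only: the function name, the ISS values and the choice of the 16-point cut-off for the demonstration are hypothetical and not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same cases."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)

    # Observed agreement: share of cases with identical labels.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's own label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical ISS values assigned to the same 8 patients by two coders.
iss_coder_1 = [9, 17, 25, 34, 14, 16, 22, 10]
iss_coder_2 = [9, 14, 27, 29, 16, 18, 22, 13]
cutoff = 16  # polytrauma if ISS >= cutoff
poly_1 = [score >= cutoff for score in iss_coder_1]
poly_2 = [score >= cutoff for score in iss_coder_2]
print(cohen_kappa(poly_1, poly_2))
```

Patients whose scores sit near the cut-off flip category with small coding differences, which is one plausible reason a threshold-based definition can be sensitive to the rater.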
Project description: BACKGROUND AND STUDY AIMS: Endoscopic mucosal resection (EMR) plays an important role in the staging of Barrett's esophagus (BE) and the evaluation of high grade dysplasia (HGD). The study aim is to assess the interobserver agreement among gastroenterologists expert in BE endotherapy, gastroenterologists without specified expertise in BE endotherapy, and gastroenterology trainees in recommending EMR vs ablation for BE HGD lesions, and to assess the effect of a one-time educational intervention on the interobserver agreement among non-experts and trainees. PATIENTS AND METHODS: An electronic survey containing 30 still endoscopic images of BE HGD was sent to three groups of respondents: experts, non-experts, and trainees. Respondents were asked to select "Endoscopic Mucosal Resection" or "Ablation" as the most appropriate next step in management. Non-experts and trainees were then invited to repeat the survey following an educational intervention. The main outcome measure was interobserver agreement measured by Fleiss' Kappa statistic and percent agreement. RESULTS: In selecting between EMR and ablation on the pre-intervention survey, agreement was highest among experts (kappa = 0.437), followed by trainees (kappa = 0.281) and non-experts (kappa = 0.107). Experts demonstrated significantly higher agreement compared to either trainees (P < 0.001) or non-experts (P < 0.001). On the post-intervention survey, interobserver agreement remained low among both trainees (kappa = 0.20) and non-experts (kappa = 0.14). Comparing the results of the surveys, there was no evidence that agreement differed for either trainees or non-experts. CONCLUSIONS: Future efforts are needed to enable endoscopist recognition of BE HGD lesions. Consensus guidelines alone are insufficient in directing preferred endoscopic management of BE HGD.
Project description: OBJECTIVES: To prospectively investigate concordance between whole-body MRI (WB-MRI) and a composite reference standard for initial staging and interim response evaluation in paediatric and adolescent Hodgkin's lymphoma. METHODS: Fifty patients (32 male, age range 6-19 years) underwent WB-MRI and standard investigations, including 18F-FDG-PET-CT at diagnosis and following 2-3 chemotherapy cycles. Two radiologists in consensus interpreted WB-MRI using prespecified definitions of disease positivity. A third radiologist reviewed a subset of staging WB-MRIs (n = 38) separately to test for interobserver agreement. A multidisciplinary team derived a primary reference standard using all available imaging/clinical investigations. Subsequently, a second multidisciplinary panel rereviewed all imaging with long-term follow-up data to derive an enhanced reference standard. Interobserver agreement for WB-MRI reads was tested using kappa statistics. Concordance for correct classification of all disease sites, true positive rate (TPR), false positive rate (FPR) and kappa for staging/response agreement were calculated for WB-MRI. RESULTS: There was discordance for full stage in 74% (95% CI 61.9-83.9%) and 44% (32.0-56.6%) of patients against the primary and enhanced reference standards, respectively. Against the enhanced reference standard, the WB-MRI TPR, FPR and kappa were 91%, 1% and 0.93 (0.90-0.96) for nodal disease and 79%, < 1% and 0.86 (0.77-0.95) for extra-nodal disease. WB-MRI response classification was correct in 25/38 evaluable patients (66%), underestimating response in 26% (kappa 0.30, 95% CI 0.04-0.57). There was good agreement for nodal (kappa 0.78, 95% CI 0.73-0.84) and extra-nodal staging (kappa 0.60, 95% CI 0.41-0.78) between WB-MRI reads. CONCLUSIONS: WB-MRI has reasonable accuracy for nodal and extra-nodal staging but is discordant with standard imaging in a substantial minority of patients, and tends to underestimate disease response.
KEY POINTS: • This prospective single-centre study showed discordance for full patient staging of 44% between WB-MRI and a multi-modality reference standard in paediatric and adolescent Hodgkin's lymphoma. • WB-MRI underestimates interim disease response in paediatric and adolescent Hodgkin's lymphoma. • WB-MRI shows promise in paediatric and adolescent Hodgkin's lymphoma but currently cannot replace conventional staging pathways including 18F-FDG-PET-CT.
Project description: Magnetic resonance imaging (MRI) grading systems using sagittal images are useful for evaluation of lumbar foraminal stenosis. We evaluated whether such a grading system is useful as a diagnostic tool for surgery. Between July 2014 and June 2015, 99 consecutive patients underwent unilateral lumbar foraminotomy for lumbar foraminal stenosis. Surgically confirmed foraminal stenosis and the contralateral, asymptomatic neuroforamen were assessed based on a 4-point MRI grading system. Two experienced researchers independently evaluated the MR sagittal images. Interobserver agreement and intraobserver agreement were analyzed using κ statistics. The mean age of patients (54 women, 45 men) was 62.5 years. A total of 101 levels (202 neuroforamens) were evaluated. MRI grades for operated neuroforamens were as follows: Grade 0 in 0.99%, Grade 1 in 5.28%, Grade 2 in 14.85%, and Grade 3 in 78.88%. Interobserver agreement was moderate for operated neuroforamens (κ=0.511) and good for asymptomatic neuroforamens (κ=0.696). Intraobserver agreement by reader 1 for operated neuroforamens was good (κ=0.776) and that for asymptomatic neuroforamens was very good (κ=0.831). In terms of lumbar level, interobserver agreement for L5-S1 (κ=0.313, fair) was lower than for the other levels (κ=0.804, very good). The MRI grading system for lumbar foraminal stenosis is thought to be useful as a diagnostic tool for surgery in the lumbar spine; however, it is less reliable for symptomatic L5-S1 foraminal stenosis than for other levels. Thus, various clinical factors as well as the MRI grading system are required for surgical decision-making.
Project description: To validate a scale for grading vitreous haze in uveitis using digitized photographs and standardized scoring. Evaluation of clinical research methodology. Calibrated Bangerter diffusion filters inducing incremental decrements of spatial contrast were placed in front of the camera lens while photographing a normal eye to simulate vitreous haze. The photographs were digitized and an ordinal scale was created from 0 (none) to 8 (highest level of opacification at which fundus details could be seen). The scale steps correspond approximately to decimal Snellen visual acuities of 1.0, 0.8, 0.4, 0.2, 0.1, 0.04, 0.02, 0.01, and 0.002, with approximately 0.3 log step between each step. For validation, digitized fundus photographs of uveitis patients were displayed on a computer monitor for comparison with the standard photos. Three observers graded the test set twice under standard conditions. Interobserver and intraobserver variability and κ values for agreement greater than chance were calculated. Variance component analysis determined that 87.7% of the variance in grades was attributable to the test item rather than to grader or session. The intraclass correlation between graders and grading sessions varied from 0.84 to 0.91. Simple agreement within 1 grade between graders and sessions occurred in 90 ± 5.5% of gradings. κ values averaged 0.91, which is considered near perfect. A 9-step photographic scale was designed to standardize the grading of vitreous haze in uveitis patients using fundus photographs. The scale is potentially adaptable to clinical trials in uveitis.
Project description: Objective To design a simple magnetic resonance (MR)-based assessment system for quantification of osteochondral defect severity prior to cartilage repair surgery at the knee. Design The new scoring tool was supposed to include 3 different parameters: (1) cartilage defect size, (2) depth/morphology of the cartilage defect, and (3) subchondral bone quality, resulting in a specific 3-digit code. A clearly defined numeric score was developed, resulting in a final score of 0 to 100. Defect severity grades I through IV were defined. For intra- and interobserver agreement, defects were assessed by 2 independent readers on preoperative knee MR images of n = 44 subjects who subsequently received cartilage repair surgery. For statistical analyses, mean values ± standard deviation (SD), intraclass correlation coefficients (ICC), and linear weighted kappa values were calculated. Results The mean total Area Measurement And DEpth & Underlying Structures (AMADEUS) score was 48 ± 24 (range, 0-85). The mean defect size was 2.8 ± 2.6 cm². There were 36 of 44 full-thickness defects. The subchondral bone showed defects in 21 of 44 cases. Kappa values for intraobserver reliability ranged between 0.82 and 0.94. Kappa values for interobserver reliability ranged between 0.38 and 0.85. Kappa values for AMADEUS grade were 0.75 and 0.67 for intra- and interobserver agreement, respectively. ICC scores for the AMADEUS total score were 0.97 and 0.96 for intra- and interobserver agreement, respectively. Conclusions The AMADEUS score and classification system allows reliable severity encoding, scoring and grading of osteochondral defects on knee MR images, which is easily clinically applicable in daily practice.
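The abstract above uses linear weighted kappa for ordinal grades, where a disagreement of two grades counts twice as much as a disagreement of one. A minimal Python sketch of that statistic follows; the function name, the 0-based grade coding and the example grades are hypothetical and not taken from the study.

```python
from typing import Sequence


def linear_weighted_kappa(
    rater_a: Sequence[int], rater_b: Sequence[int], n_grades: int
) -> float:
    """Linear weighted kappa for ordinal grades coded 0 .. n_grades - 1."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must grade the same non-empty set of cases")
    if n_grades < 2:
        raise ValueError("need at least two grades")
    if any(not 0 <= g < n_grades for g in list(rater_a) + list(rater_b)):
        raise ValueError("grades must lie in 0 .. n_grades - 1")
    n = len(rater_a)
    span = n_grades - 1

    # Marginal grade frequencies of each rater.
    marg_a = [0.0] * n_grades
    marg_b = [0.0] * n_grades
    for a, b in zip(rater_a, rater_b):
        marg_a[a] += 1.0 / n
        marg_b[b] += 1.0 / n

    # Mean disagreement, weighted by the distance between the two grades.
    observed = sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / (n * span)
    expected = sum(
        marg_a[i] * marg_b[j] * abs(i - j)
        for i in range(n_grades)
        for j in range(n_grades)
    ) / span
    if expected == 0.0:
        raise ValueError("kappa is undefined when both raters use one identical grade")
    return 1.0 - observed / expected


# Hypothetical severity grades I-IV coded 0-3 for six defects, two readers.
reader_1 = [0, 1, 2, 3, 2, 1]
reader_2 = [0, 2, 2, 3, 1, 1]
print(linear_weighted_kappa(reader_1, reader_2, n_grades=4))
```

With the weights removed (every disagreement counted as 1), this reduces to the unweighted Cohen's kappa.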
Project description: BACKGROUND: Histologic features of idiopathic non-cirrhotic portal hypertension (INCPH) may overlap with those seen in patients without INCPH. Recently, these features have been recognized as part of the larger spectrum of porto-sinusoidal vascular disease (PSVD). We assessed interobserver agreement on histologic features that are commonly associated with INCPH and studied whether provision of relevant clinical history improves interobserver agreement. METHODS: The examined histologic features include lobular (such as anisocytosis, nodular regeneration, sinusoidal dilatation, increased parenchymal draining veins, and incomplete fibrous septa) and portal tract changes (such as paraportal shunting vessel(s), portal tract remnant, increased number of portal vessels, and obliterative portal venopathy). Thirty-four archived liver samples from patients with (group A) and without (group B) INCPH were retrieved. A total of 90 representative images of lobules (L) and portal tracts (P) were distributed among 9 liver pathologists blinded to true clinical history. Each pathologist answered multiple choice questions based on the absence (Q1) or presence (Q2) of clinical history of portal hypertension. Fleiss' kappa coefficient analysis (unweighted) was performed to assess interobserver agreement on normal versus abnormal diagnosis, in L and P, based on Q1 and Q2. RESULTS: The kappa values regarding normal versus abnormal diagnosis were 0.24, 0.24, 0.18 and 0.18 for L-Q1, L-Q2, P-Q1, and P-Q2, respectively. With true clinical history provided, the kappa values were 0.32 (L) and 0.17 (P) for group A and 0.12 (L) and 0.14 (P) for group B. Four pathologists changed their assessments based on the provided history. Interobserver agreement on the interpretation of L and P as normal versus abnormal was slight to fair regardless of provision of clinical history.
CONCLUSIONS: Our findings indicate that the histologic features of INCPH/PSVD are not limited to patients with portal hypertension and are subject to significant interobserver variation.
Project description: OBJECTIVE: To propose a new and practical MRI grading method for cervical neural foraminal stenosis and to evaluate its reproducibility. METHODS: We evaluated 50 patients (37 males and 13 females, mean age 49 years) who visited our institution and underwent oblique sagittal MRI of the cervical spine. A total of 300 foramina and corresponding nerve roots in 50 patients were qualitatively analysed from C4-5 to C6-7. We assessed the grade of cervical foraminal stenosis at the maximal narrowing point according to the new grading system based on T2 weighted oblique sagittal images. The incidence of each of the neural foraminal stenosis grades according to the cervical level was analysed by χ² tests. Intra- and interobserver agreements between two radiologists were analysed using kappa statistics. Kappa value interpretations were poor (κ<0.1), slight (0.1≤κ≤0.2), fair (0.2<κ≤0.4), moderate (0.4<κ≤0.6), substantial (0.6<κ≤0.8) and almost perfect (0.8<κ≤1.0). RESULTS: Significant stenoses (Grades 2 and 3) were rarely found at the C4-5 level. The incidence of Grade 3 at the C5-6 level was higher than that at other levels, a difference that was statistically significant. The overall intra-observer agreement according to the cervical level was almost perfect. The agreement at each level was almost perfect, except for only substantial agreement at the right C6-7 by Reader 2. No statistically significant differences were seen according to the cervical level. Overall kappa values of interobserver agreement according to the cervical level were almost perfect. In addition, the agreement of each level was almost perfect. Overall intra- and interobserver agreement for the presence of foraminal stenosis (Grade 0 vs Grades 1, 2 and 3) and for significant stenosis (Grades 0 and 1 vs Grades 2 and 3) showed similar results and were almost perfect. However, only substantial agreement was seen in the right C6-7.
CONCLUSION: A new grading system for cervical foraminal stenosis based on oblique sagittal MRI provides reliable assessment and good reproducibility. This new grading system is a useful and easy method for the objective evaluation of cervical neural foraminal stenosis by radiologists and clinicians. ADVANCES IN KNOWLEDGE: The use of the new grading system for cervical foraminal stenosis based on oblique sagittal MRI can be a useful method for evaluating cervical neural foraminal stenosis.
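The kappa interpretation bands listed in the METHODS of the abstract above can be written as a small lookup. The sketch below encodes exactly those cut-offs; the function name is hypothetical, and other abstracts in this collection use different labels (for example "good" and "very good"), so the mapping applies only to this study's scheme.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the verbal bands quoted in the abstract above."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.1:
        return "poor"
    if kappa <= 0.2:
        return "slight"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "substantial"
    return "almost perfect"


for value in (0.05, 0.2, 0.45, 0.8, 0.93):
    print(value, interpret_kappa(value))
```

Boundary values fall into the lower band (0.8 is "substantial", not "almost perfect"), following the inequalities as stated in the abstract.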