Remote monitoring of clubfoot treatment with digital photographs in low resource settings: Is it accurate?
ABSTRACT: BACKGROUND: Clinical examination and functional assessment are often the first steps to assess outcome of clubfoot treatment. Clinical photographs may be an adjunct used to assess treatment outcomes in lower resourced settings where physical review by a specialist is limited. We aimed to evaluate the diagnostic performance of photographic images of patients with clubfoot in assessing outcome following treatment. METHODS: In this single-centre diagnostic accuracy study, conducted in 2017, we included all children with clubfoot from a cohort treated between 2011 and 2013. Two physiotherapists trained in clubfoot management calculated the Assessing Clubfoot Treatment (ACT) score for each child to decide if treatment was successful or if further treatment was required. Photographic images were then taken of 79 feet. Two blinded orthopaedic surgeons assessed three sets of images of each foot (n = 237 in total) at two time points (two months apart). Treatment for each foot was rated as 'success', 'borderline' or 'failure'. Intra- and inter-observer variation for the photographic image was assessed. Sensitivity, specificity, positive and negative predictive values were calculated for the photographic image compared to the ACT score. RESULTS: There was perfect correlation between clinical assessment and photographic evaluation of both raters at both time-points in 38 (48%) feet. The raters demonstrated acceptable reliability with re-scoring photographs (rater 1, k = 0.55; rater 2, k = 0.88). Thirty percent (n = 71) of photographs were assessed as having poor image quality or sub-optimal patient position. Sensitivity of outcome with photograph compared to ACT score ranged from 83.3% to 88.3% and specificity from 57.9% to 73.3%. CONCLUSION: Digital photography may help to confirm, but not exclude, success of clubfoot treatment.
Future work to establish photographic parameters as an adjunct to assessing treatment outcomes, and guidance on a standardised protocol for photographs, may be beneficial in the follow-up of children with treated clubfoot in isolated communities or lower resourced settings.
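The four accuracy measures named in the methods above can be illustrated with a minimal sketch. It assumes a 2x2 table in which the photographic rating is the index test and the ACT score is the reference standard; the function name and the counts are hypothetical and are not the study's data.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 counts.

    tp: index test positive, reference positive
    fp: index test positive, reference negative
    fn: index test negative, reference positive
    tn: index test negative, reference negative
    """
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("every row and column of the 2x2 table must be non-empty")
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly identified
        "specificity": tn / (tn + fp),  # negatives correctly identified
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts chosen for illustration only (not taken from the study).
metrics = diagnostic_accuracy(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Which outcome counts as "positive" (for example, treatment success) is a choice the analyst has to state, since swapping it exchanges sensitivity with specificity and PPV with NPV.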
Project description: As lip augmentation becomes more popular, validated measures of lip fullness for quantification of outcomes are needed. Develop a scale for rating lip fullness and establish its reliability and sensitivity for assessing clinically meaningful differences. The initial Allergan Lip Fullness Scale (iLFS; a four-point photographic scale with verbal descriptions) was validated by eight physicians rating 55 live subjects during two rounds, conducted on one day. In addition, subjects performed self-evaluations. The revised Allergan Lip Fullness Scale (LFS), a five-point scale with a broader range of lip presentations, was validated by 21 clinicians in two online image rating sessions, ?14 days apart, in which they used the LFS to rate overall, upper, and lower lip fullness of 144 3-dimensional (3D) images. Physician inter- and intra-rater agreement, subject intra-rater agreement (iLFS), and subject-physician agreement (iLFS) were evaluated. Additionally, during online rating session 1, raters ranked 38 pairs of 3D images, taken before and after lip augmentation, as "clinically different" or "not clinically different." The median LFS score difference for clinically different pairs was calculated to determine the clinically meaningful difference. Clinician inter- and intra-rater agreement for the iLFS and LFS was substantial to almost perfect. Subject self-assessments (iLFS) had substantial intra-rater reliability and a high level of agreement with physician assessments. Median LFS score differences for overall, upper, and lower lip fullness were 1 (mean: 0.63-0.69) for "clinically different" and 0 (mean: 0.28-0.36) for "not clinically different" image pairs; thus, clinical significance of a 1-point difference in LFS score was established. The LFS is a reliable instrument for physician classification of lip fullness. A 1-point score difference can detect clinically meaningful differences in lip fullness.
Project description: Trachoma surveillance is most commonly performed by direct observation, usually by non-ophthalmologists using the World Health Organization (WHO) simplified grading system. However, conjunctival photographs may offer several benefits over direct clinical observation, including the potential for greater inter-rater agreement. This study assesses whether inter-rater agreement of trachoma grading differs when trained graders review conjunctival photographs compared to when they perform conjunctival examinations in the field. Three trained trachoma graders each performed an independent examination of the everted right tarsal conjunctiva of 269 children aged 0-9 years, and then reviewed photographs of these same conjunctivae in a random order. For each eye, the grader documented the presence or absence of follicular trachoma (TF) and intense trachomatous inflammation (TI) according to the WHO simplified grading system. Inter-rater agreement for the grade of TF was significantly higher in the field (kappa coefficient, κ, 0.73, 95% confidence interval, CI 0.67-0.80) than by photographic review (κ = 0.55, 95% CI 0.49-0.63; difference in κ between field grading and photo grading 0.18, 95% CI 0.09-0.26). When field and photographic grades were each assessed as the consensus grade from the three graders, agreement between in-field and photographic graders was high for TF (κ = 0.75, 95% CI 0.68-0.84). In an area with hyperendemic trachoma, inter-rater agreement was lower for photographic assessment of trachoma than for in-field assessment. However, the trachoma grade reached by a consensus of photographic graders agreed well with the grade given by a consensus of in-field graders.
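For readers unfamiliar with the kappa coefficient reported above, the following is a minimal sketch of Cohen's kappa for two raters grading the same eyes as TF present (1) or absent (0). The function name and the ratings are hypothetical. The abstract does not state how agreement among its three graders was pooled, so this two-rater form is an assumption made only for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items on a nominal scale."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items on which the two raters match.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal proportions, per category.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical TF grades (1 = present, 0 = absent) for ten eyes; not study data.
grader_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
grader_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
kappa = cohens_kappa(grader_1, grader_2)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 eyes while chance alone would give 0.5, so kappa discounts the raw 80% agreement to 0.6.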
Project description: It is generally recognized that Caucasians and Asians have different skin aging features. The aim of this study was to develop a facial wrinkle grading scale for Chinese women. Standard photographs were taken of 242 Chinese women. Six sets of 0 to 9 wrinkle scales with reference photographs and descriptions were selected, including grading scales for resting and hyperkinetic crow's feet, frontalis lines, glabellar frown lines, and nasolabial folds. To identify the scale by objective quantitative measurement, skin surface measurements from the Visioscan® VC98 were used. To test the reliability and validity of our wrinkle scale, a multi-rater consensus method was used. A double-blind, randomized, vehicle-controlled 12-week study was conducted to use this clinical photo-score to evaluate the efficacy and safety of Centella triterpenes cream® in treating crow's feet. A newly developed 10-point photographic and descriptive scale emerged from this study. The final atlas of these photographs contained a total of 6 sets with 10 pictures each. From 0 to 9, surface evaluation of smoothness (SEsm) parametric measurements decreased progressively, indicating that the scale increased inversely. Weighted kappa coefficients for intra-assessor agreement ranged from 0.75 to 0.87. The overall Kendall's coefficient was 0.86 on the first rating and 0.87 on the second rating. Thirty-six volunteers were recruited and 35 subjects completed a 12-week trial. Clinical photo-score by investigator showed a significant difference (P<0.05) between the treatment side and control side after 4 weeks. Use of these scales in clinical settings to evaluate facial wrinkles in Asian individuals is recommended.
Project description:Static photographs are currently the most often employed stimuli in research on social perception. The method of photograph acquisition might affect the depicted subject's facial appearance and thus also the impression of such stimuli. An important factor influencing the resulting photograph is focal length, as different focal lengths produce various levels of image distortion. Here we tested whether different focal lengths (50, 85, 105 mm) affect depicted shape and perception of female and male faces. We collected three portrait photographs of 45 (22 females, 23 males) participants under standardized conditions and camera setting varying only in the focal length. Subsequently, the three photographs from each individual were shown on screen in a randomized order using a 3-alternative forced-choice paradigm. The images were judged for attractiveness, dominance, and femininity/masculinity by 369 raters (193 females, 176 males). Facial width-to-height ratio (fWHR) was measured from each photograph and overall facial shape was analysed employing geometric morphometric methods (GMM). Our results showed that photographs taken with 50 mm focal length were rated as significantly less feminine/masculine, attractive, and dominant compared to the images taken with longer focal lengths. Further, shorter focal lengths produced faces with smaller fWHR. Subsequent GMM revealed focal length significantly affected overall facial shape of the photographed subjects. Thus methodology of photograph acquisition, focal length in this case, can significantly affect results of studies using photographic stimuli perhaps due to different levels of perspective distortion that influence shapes and proportions of morphological traits.
Project description: In many clinical trials on cutaneous healing, wound closure is the primary endpoint and single most important outcome parameter, making precise assessment of this time point one of utmost importance. The assessment of wound closure can be performed either by subjective clinical inspection or with a variety of methodologies anticipated to provide more objective data. The aim of this study was to examine intra- and interrater variability of blinded photographic analysis of wound closure of human partial thickness wounds, as well as the reliability of remote photographic analysis of wounds with that of direct clinical assessment. Two plastic surgeons, a dermatologist, and a maxillofacial surgeon constituted our rater panel. High-resolution images of patient wounds derived from two randomized controlled clinical trials (EU Clinical Trials Register numbers EudraCT 2009-017418-56 (registered 12 January 2010) and EudraCT 2010-019945-24 (registered 13 July 2010)) were individually assessed by the blinded, experienced study raters. The reliability of photographic image analysis was tested using intraclass and interclass correlation. The validity of photographic image analysis was correlated with clinical assessments of documented time to heal from the study centers' files. The results demonstrated that the mean intraclass correlation coefficient of all four examiners was excellent (r = 0.79; 95% confidence interval (CI), 0.61, 1.00). The interrater correlation coefficient was good (r = 0.67; 95% CI, 0.57, 1.00) and therefore acceptable. The agreement between remote visual assessment and clinical assessment at the time of healing was good (r = 0.64; 95% CI, 0.52, 0.76) with an overall difference of about 1 day. Remote photographic analysis of cutaneous wounds is a feasible instrument in clinical open-label studies to evaluate time to wound closure.
We found that it was a reliable method of measuring wound closure that correlated satisfactorily with clinical judgment, bolstering the potential relevance in the current era of evolving application and dependency in the field of telemedicine. EU Clinical Trials Register EudraCT numbers 2009-017418-56 (date of registration: 12 January 2010) and 2010-019945-24 (date of registration: 13 July 2010).
Project description: BACKGROUND: The Clubfoot Assessment Protocol (CAP) was developed for follow-up of children treated for clubfoot. The objective of this study was to analyze reliability and validity of the six items used in the domain CAPMotion Quality using inexperienced assessors. FINDINGS: Four raters (two paediatric orthopaedic surgeons, two senior physiotherapists) used the CAP scores to analyze, on two different occasions, 11 videotapes containing standardized recordings of motion activity according to the domain CAPMotion Quality. These results were compared to a criterion (two raters, well experienced CAP assessors) for validity and for checking for learning effect. Weighted kappa statistics, exact percentage observer agreement (Po), percentage observer agreement including one level difference (Po-1) and amount of scoring scales defined how reliability was to be interpreted. Inter- and intra-rater differences were calculated using median and interquartile ranges (IQR) on item level and mean and limits of agreement on domain level. Inter-rater reliability varied between fair and moderate (kappa) and had a mean agreement of 48/88% (Po/Po-1). Intra-rater reliability varied between moderate and good with a mean agreement of 63/96%. The intra- and inter-rater differences in the present study were generally small both on item (0.00) and domain level (-1.10). There was exact agreement of 51% and Po-1 of 91% of the six items with the criterion. No learning effect was found. CONCLUSION: The CAPMotion Quality can be used by inexperienced assessors with sufficient reliability in daily clinical practice and showed acceptable accuracy compared to the criterion.
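The two agreement percentages used above, exact agreement (Po) and agreement allowing a one-level difference (Po-1), can be sketched as follows. The function name, the `tolerance` parameter and the ordinal scores are hypothetical illustrations and do not reproduce the study's data or its scoring range.

```python
def percent_agreement(scores_a, scores_b, tolerance=0):
    """Percentage of items on which two sets of ordinal scores differ by at most `tolerance` levels.

    tolerance=0 gives exact agreement (Po); tolerance=1 gives agreement
    including a one-level difference (Po-1).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both score lists must be non-empty and of equal length")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(scores_a, scores_b))
    return 100.0 * hits / len(scores_a)


# Hypothetical ordinal item scores from two raters; not study data.
rater_1 = [0, 1, 2, 3, 4, 2, 1, 3]
rater_2 = [0, 2, 2, 3, 2, 2, 0, 3]
po = percent_agreement(rater_1, rater_2)                 # exact agreement
po_1 = percent_agreement(rater_1, rater_2, tolerance=1)  # within one level
print(f"Po = {po:.1f}%, Po-1 = {po_1:.1f}%")
```

Neither percentage corrects for chance agreement, which is one reason such figures are typically reported alongside a kappa statistic, as in the abstract.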
Project description:Criminal investigations often use photographic evidence to identify suspects. Here we combined robust face perception and high-resolution photography to mine face photographs for hidden information. By zooming in on high-resolution face photographs, we were able to recover images of unseen bystanders from reflections in the subjects' eyes. To establish whether these bystanders could be identified from the reflection images, we presented them as stimuli in a face matching task (Experiment 1). Accuracy in the face matching task was well above chance (50%), despite the unpromising source of the stimuli. Participants who were unfamiliar with the bystanders' faces (n = 16) performed at 71% accuracy [t(15) = 7.64, p<.0001, d = 1.91], and participants who were familiar with the faces (n = 16) performed at 84% accuracy [t(15) = 11.15, p<.0001, d = 2.79]. In a test of spontaneous recognition (Experiment 2), observers could reliably name a familiar face from an eye reflection image. For crimes in which the victims are photographed (e.g., hostage taking, child sex abuse), reflections in the eyes of the photographic subject could help to identify perpetrators.
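The reported effect sizes are consistent with the standard one-sample relation d = t / sqrt(n). The sketch below assumes that each accuracy was tested against chance (50%) with a one-sample t-test on n = 16 participants, which matches the 15 degrees of freedom reported. That test choice and the function name are assumptions for illustration, not details confirmed by the abstract.

```python
import math


def cohens_d_from_t(t_value, n):
    """Cohen's d for a one-sample t-test: d = t / sqrt(n), with n - 1 degrees of freedom."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return t_value / math.sqrt(n)


# t statistics and group sizes as reported in the abstract.
d_unfamiliar = cohens_d_from_t(7.64, 16)   # reported d = 1.91
d_familiar = cohens_d_from_t(11.15, 16)    # reported d = 2.79
print(f"unfamiliar: d = {d_unfamiliar:.4f}, familiar: d = {d_familiar:.4f}")
```

With n = 16 the divisor is exactly 4, so the two t values give 1.91 and 2.7875, in line with the reported 1.91 and 2.79.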
Project description: Rodent grimace scales facilitate assessment of ongoing pain. Reported rater training using these scales varies considerably and may contribute to the observed variability in interrater reliability. This study evaluated the effect of training on interrater reliability with the Rat Grimace Scale (RGS). Two training sets (42 and 150 images) were prepared from acute pain models. Four trainee raters progressed through 2 rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then rescored (set 2b). Four years later, trainees rescored the 150 images (set 2c). A second group of raters (no-training group) scored the same image sets without review with the experienced rater. Inter- and intrarater reliability were evaluated by using the intraclass correlation coefficient (ICC), and ICC values were compared by using the Feldt test. In the trainee group, interrater reliability increased from moderate to very good between sets 1 and 2b and increased between sets 2a and 2b. Action units with the highest and lowest ICC at set 2b were orbital tightening and whiskers, respectively. In comparison to an experienced rater, the ICC for all trainees improved, ranging from 0.88 to 0.91 at set 2b. Four years later, very good interrater reliability was retained, and intrarater reliability was good or very good. The interrater reliability of the no-training group was moderate and did not improve from set 1 to set 2b. Training improved interrater reliability, with an associated reduction in 95% CI. In addition, training improved interrater reliability with an experienced rater, and performance was retained.
Project description: OBJECTIVE: Surgical site infection (SSI) is the most common nosocomial infection in vascular surgery patients, who experience a high rate of readmission. Facilitating transition from hospital to outpatient care with digital image-based wound monitoring has the potential to detect and to enable treatment of SSI at an early stage. In this study, we evaluated whether smartphone digital images can supplant in-person evaluation of postoperative vascular surgery wounds. METHODS: We developed a wound assessment checklist using previously validated criteria. We recruited adults who underwent a vascular surgical procedure between 2014 and 2015, involving an incision of at least 3 cm, from a high-volume academic vascular surgery service. Vascular surgery care providers evaluated wounds in person using the assessment checklist; a different group of providers evaluated wounds by a smartphone digital image. Inter-rater agreement coefficients for wound characteristics and treatment plan were calculated within and between the in-person group and the digital image group; the sensitivity and specificity of digital images relative to in-person evaluation were determined. RESULTS: We assessed a total of 80 wounds. Regardless of modality, inter-rater agreement was poor when wounds were evaluated for the presence of ecchymosis and redness; moderate for cellulitis; and high for the presence of a drain, necrosis, or dehiscence. As expected, the presence of drainage was more readily observed in person. Inter-rater agreement was high for both in-person and image-based assessment with respect to course of treatment, with near-perfect agreement for treatments ranging from antibiotics to surgical débridement to hospital readmission. No difference in agreement emerged when raters evaluated poor-quality compared with high-quality images. For most parameters, specificity was higher than sensitivity for image-based compared with "gold standard" in-person assessment.
CONCLUSIONS: Using smartphone digital images is a valid method for evaluating postoperative vascular surgery wounds and is comparable to in-person evaluation with regard to most wound characteristics. The inter-rater reliability for determining treatment recommendations was universally high. Remote wound monitoring and assessment may play an integral role in future transitional care models to decrease readmission for SSI in vascular or other surgical patients. These findings will inform smartphone implementation in the clinical care setting as wound images transition from informal clinical communication to becoming part of the care standard.
Project description: BACKGROUND: Annotation and Image Markup on ClearCanvas Enriched Stroke-phenotyping Software (ACCESS) is a novel stand-alone computer software application that allows the creation of simple standardized annotations for reporting brain images of all stroke types. We developed the ACCESS application and determined its inter-rater and intra-rater reliability in the Stroke Investigative Research and Educational Network (SIREN) study to assess its suitability for multicenter studies. METHODS: One hundred randomly selected stroke imaging reports from 5 SIREN sites were re-evaluated by 4 trained independent raters to determine the inter-rater reliability of the ACCESS (version 12.0) software for stroke phenotyping. To determine intra-rater reliability, 6 raters reviewed the same cases they had previously reported after a one-month interval. Ischemic stroke was classified using the Oxfordshire Community Stroke Project (OCSP), Trial of Org 10172 in Acute Stroke Treatment (TOAST), and Atherosclerosis, Small-vessel disease, Cardiac source, Other cause (ASCO) protocols, while hemorrhagic stroke was classified using the Structural lesion, Medication, Amyloid angiopathy, Systemic disease, Hypertensive angiopathy and Undetermined (SMASH-U) protocol in ACCESS. Agreement among raters was measured with Cohen's kappa statistics. RESULTS: For primary stroke type, inter-rater agreement was .98 (95% confidence interval [CI], .94-1.00), while intra-rater agreement was 1.00 (95% CI, 1.00). For OCSP subtypes, inter-rater agreement was .97 (95% CI, .92-1.00) for the partial anterior circulation infarcts, .92 (95% CI, .76-1.00) for the total anterior circulation infarcts, and excellent for both lacunar infarcts and posterior circulation infarcts. Intra-rater agreement was .97 (.90-1.00), while inter-rater agreement was .93 (95% CI, .84-1.00) for TOAST subtypes.
Inter-rater agreement ranged between .78 (cardioembolic) and .91 (large artery atherosclerotic) for ASCO subtypes and was .80 (95% CI, .56-1.00) for SMASH-U subtypes. CONCLUSION: The ACCESS application facilitates a concordant and reproducible classification of stroke subtypes by multiple investigators, making it suitable for clinical use and multicenter research.