Project description:
Background: Financial codes are often used to extract diagnoses from electronic health records. This approach is prone to false positives. Alternatively, queries are constructed, but these are highly center and language specific. A tantalizing alternative is the automatic identification of patients by employing machine learning on format-free text entries.
Objective: The aim of this study was to develop an easily implementable workflow that builds a machine learning algorithm capable of accurately identifying patients with rheumatoid arthritis from format-free text fields in electronic health records.
Methods: Two electronic health record data sets were employed: Leiden (n=3000) and Erlangen (n=4771). Using a portion of the Leiden data (n=2000), we compared 6 different machine learning methods and a naïve word-matching algorithm using 10-fold cross-validation. Performances were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC), and F1 score was used as the primary criterion for selecting the best method to build a classifying algorithm. We selected the optimal threshold of positive predictive value for case identification based on the output of the best method in the training data. This validation workflow was subsequently applied to a portion of the Erlangen data (n=4293). For testing, the best performing methods were applied to remaining data (Leiden n=1000; Erlangen n=478) for an unbiased evaluation.
Results: For the Leiden data set, the word-matching algorithm demonstrated mixed performance (AUROC 0.90; AUPRC 0.33; F1 score 0.55), and 4 methods significantly outperformed word-matching, with support vector machines performing best (AUROC 0.98; AUPRC 0.88; F1 score 0.83). Applying this support vector machine classifier to the test data resulted in a similarly high performance (F1 score 0.81; positive predictive value [PPV] 0.94), and with this method, we could identify 2873 patients with rheumatoid arthritis in less than 7 seconds out of the complete collection of 23,300 patients in the Leiden electronic health record system. For the Erlangen data set, gradient boosting performed best (AUROC 0.94; AUPRC 0.85; F1 score 0.82) in the training set, and applied to the test data, resulted once again in good results (F1 score 0.67; PPV 0.97).
Conclusions: We demonstrate that machine learning methods can extract the records of patients with rheumatoid arthritis from electronic health record data with high precision, allowing research on very large populations for limited costs. Our approach is language and center independent and could be applied to any type of diagnosis. We have developed our pipeline into a universally applicable and easy-to-implement workflow to equip centers with their own high-performing algorithm. This allows the creation of observational studies of unprecedented size covering different countries for low cost from already available data in electronic health record systems.
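The threshold-selection step described in the Methods can be illustrated with a minimal sketch. It assumes a classifier has already produced a score per record, and it picks the lowest cut-off whose PPV on the training data reaches a chosen target. The abstract does not state the exact selection rule, so the rule, the function names, and the toy labels and scores below are all hypothetical.

```python
def precision_recall_f1(labels, scores, threshold):
    """Return (PPV, sensitivity, F1) when records scoring >= threshold are called cases."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return ppv, sens, f1


def lowest_threshold_for_ppv(labels, scores, target_ppv):
    """Assumed rule: smallest observed score cut-off whose PPV meets the target."""
    for threshold in sorted(set(scores)):
        ppv, _, _ = precision_recall_f1(labels, scores, threshold)
        if ppv >= target_ppv:
            return threshold
    return None  # no cut-off reaches the target PPV


# Toy training data (illustrative only): 1 = rheumatoid arthritis, 0 = other.
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]

chosen = lowest_threshold_for_ppv(toy_labels, toy_scores, target_ppv=0.9)
print(chosen, precision_recall_f1(toy_labels, toy_scores, chosen))
```

Taking the lowest qualifying cut-off keeps as many true cases as possible once the PPV target is met, which is one reasonable way to trade sensitivity against precision for case identification.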

Project description:
Objective: To develop and validate a composite rheumatoid arthritis (RA) disease activity index using optical spectral transmission (OST) scores obtained with the HandScan, replacing tender and swollen joint counts.
Methods: RA patients from a single center routinely undergoing HandScan measurements with at least 1 concurrent OST score and Disease Activity Score in 28 joints (DAS28) were included. Data were extracted from medical records. Linear regression analyses with the DAS28 as the outcome were performed to create a disease activity index (DAS-OST). OST score, erythrocyte sedimentation rate (ESR), patient global assessment (PtGA) visual analog scale (VAS), sex, age, disease duration, and rheumatoid factor status were evaluated as independent variables. Final models were derived based on the statistical significance of coefficients and model fit. Of the data, two-thirds were used for development and one-third for validation; external validation was performed in a cohort from another center. Agreement between DAS-OST and DAS28 was assessed using the Bland-Altman plot method and intraclass correlation coefficient (ICC). Diagnostic value of the DAS-OST was determined for established definitions of remission, low disease activity (LDA), and high disease activity (HDA).
Results: Data of 3,358 observations from 1,505 unique RA patients were extracted. DAS-OST was defined as: -0.44 + OST × 0.03 + male × -0.11 + LN(ESR) × 0.77 + PtGA VAS × 0.03. The ICCs between DAS-OST and DAS28 were 0.88 (95% confidence interval [95% CI] 0.87-0.90) and 0.82 (95% CI 0.75-0.86) and measurement errors were 0.58 and 0.87 in internal and external validation, respectively. Sensitivity for remission, LDA, and HDA was 79%, 91%, and 43%, respectively, and specificity was 92%, 80%, and 96% in external validation.
Conclusion: Using the HandScan, RA disease activity can be accurately estimated if combined with ESR, PtGA VAS, and sex into a disease activity index (DAS-OST).
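The DAS-OST formula reported in the Results can be written as a small function. This is a sketch only: the coefficients are taken from the abstract, while the function name, the argument names, and the units (ESR in mm/h, PtGA VAS on a 0-100 scale, male coded as 1 and female as 0) are assumptions the abstract does not confirm. The input values in the usage lines are made up for illustration.

```python
import math


def das_ost(ost, esr, ptga_vas, male):
    """DAS-OST = -0.44 + 0.03*OST - 0.11*male + 0.77*ln(ESR) + 0.03*PtGA VAS.

    Assumed units (not stated in the abstract): ESR in mm/h, PtGA VAS 0-100.
    """
    if esr <= 0:
        raise ValueError("ESR must be positive to take its natural logarithm")
    sex_term = -0.11 if male else 0.0
    return -0.44 + 0.03 * ost + sex_term + 0.77 * math.log(esr) + 0.03 * ptga_vas


# Hypothetical patient values, chosen only to show the arithmetic.
score_female = das_ost(ost=15, esr=20, ptga_vas=40, male=False)
score_male = das_ost(ost=15, esr=20, ptga_vas=40, male=True)
print(round(score_female, 2), round(score_male, 2))
```

With identical inputs, the male score is 0.11 lower than the female score, which is the only effect of the sex term in this index.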

Project description:
Background: Fatigue is a common and burdensome symptom in rheumatoid arthritis (RA), yet it is poorly understood. Currently, clinicians rely solely on fatigue questionnaires, which are inherently subjective measures. For the effective development of future therapies and stratification, it is of vital importance to identify biomarkers of fatigue. In this study, we identify brain differences between RA patients whose fatigue levels did and did not improve, based on Chalder Fatigue Scale variation (ΔCFS ≥ 2), and we compare the performance of different classifiers in distinguishing between these groups at baseline.
Methods: Fifty-four fatigued RA patients underwent a magnetic resonance (MR) scan at baseline and 6 months later. At 6 months we identified those whose fatigue levels had improved and those whose levels had not. More than 900 brain features across three data sets were assessed as potential predictors of fatigue improvement. These data sets included clinical, structural MRI (sMRI) and diffusion tensor imaging (DTI) data. A genetic algorithm was used for feature selection. Three classifiers were employed to discriminate between improvers and non-improvers: a Least Square Linear Discriminant (LSLD), a linear Support Vector Machine (SVM) and an SVM with a Radial Basis Function kernel.
Results: The highest accuracy (67.9%) was achieved with the sMRI set, followed by the DTI set (63.8%), whereas classification performance using clinical features was at the chance level. The mean curvature of the left superior temporal sulcus was most strongly selected during the feature selection step, followed by the surface area of the right frontal pole and the surface area of the left banks of the superior temporal sulcus.
Conclusions: The results provide evidence that brain metrics are superior to clinical metrics in predicting fatigue changes. Further exploration of these methods may help clinicians triage patients towards the most appropriate fatigue-alleviating therapies.

Project description:
Background: Comorbid conditions are very common in rheumatoid arthritis (RA) and several prior studies have clustered them using machine learning (ML). We applied various ML algorithms to compare the clusters of comorbidities derived and to assess the value of the clusters for predicting future clinical outcomes.
Methods: A large US-based RA registry, CorEvitas, was used to identify patients for the analysis. We assessed the presence of 24 comorbidities, and ML was used to derive clusters of patients with given comorbidities. K-modes, K-means, regression-based, and hierarchical clustering were used. To assess the value of these clusters, we compared clusters across different ML algorithms in clinical outcome models predicting clinical disease activity index (CDAI) and health assessment questionnaire (HAQ-DI). We used data from the first 3 years of the 6-year study period to derive clusters and assess time-averaged values for CDAI and HAQ-DI during the latter 3 years. Model fit was assessed via adjusted R² and root mean square error for a series of models that included clusters from ML clustering and each of the 24 comorbidities separately.
Results: A total of 11,883 patients with RA who had longitudinal data over 6 years were included. At baseline, patients were on average 59 (SD 12) years of age, 77% were women, CDAI was 11.3 (SD 11.9, moderate disease activity), HAQ-DI was 0.32 (SD 0.42), and disease duration was 10.8 (SD 9.9) years. During the 6 years of follow-up, the percentage of patients with various comorbidities increased. Using five clusters produced by each of the ML algorithms, multivariable regression models with time-averaged CDAI as an outcome found that the ML-derived comorbidity clusters produced models of similar strength to models with each of the 24 separate comorbidities entered individually. The same patterns were observed for HAQ-DI.
Conclusions: Clustering comorbidities using ML algorithms is not computationally complex but often results in clusters that are difficult to interpret from a clinical standpoint. While ML clustering is useful for modeling multi-omics, using clusters to predict clinical outcomes produces models with a similar fit as those with individual comorbidities.
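The two model-fit measures named in the Methods, adjusted R² and root mean square error, can be sketched as follows. The formulas are the standard ones. The function names, the toy outcome values, and the predictor counts are hypothetical, and the abstract does not say how many predictors each regression model contained.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted outcomes."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def adjusted_r2(observed, predicted, n_predictors):
    """Adjusted R²: 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Toy time-averaged CDAI values and model predictions (illustrative only).
toy_observed = [10, 12, 8, 15, 9, 11]
toy_predicted = [11, 11, 9, 14, 9, 12]

fit_rmse = rmse(toy_observed, toy_predicted)
fit_few = adjusted_r2(toy_observed, toy_predicted, n_predictors=1)
fit_many = adjusted_r2(toy_observed, toy_predicted, n_predictors=4)
print(fit_rmse, fit_few, fit_many)
```

For the same predictions, adjusted R² falls as the number of predictors rises. This penalty is what makes the measure suitable for comparing a model built on a few cluster indicators against one with many individual comorbidity terms.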

Dataset Information

External Validation of the Machine Learning-Based Thermographic Indices for Rheumatoid Arthritis: A Prospective Longitudinal Study
