Project description: Objectives: Although deep learning has demonstrated substantial potential in automatic quantification of joint damage in RA, evidence for detecting longitudinal changes at an individual patient level is lacking. Here, we introduce and externally validate our automated RA scoring algorithm (AuRA), and demonstrate its utility for monitoring radiographic progression in a real-world setting. Methods: The algorithm, originally developed during the Rheumatoid Arthritis 2-Dialogue for Reverse Engineering Assessment and Methods (RA2-DREAM) challenge, was trained to predict expert-curated Sharp-van der Heijde total scores in hand and foot radiographs from two previous clinical studies (n = 367). We externally validated AuRA against data (n = 205) from Turku University Hospital and compared the performance against two top-performing RA2-DREAM solutions. Finally, for 54 patients, we extracted additional radiograph sets from another control visit to the clinic (average time interval of 4.6 years). Results: In the external validation cohort, with a root mean square error (RMSE) of 23.6, AuRA outperformed both top-performing RA2-DREAM algorithms (RMSEs 35.0 and 35.6). The improved performance was explained mostly by lower errors at higher expert-assessed scores. The longitudinal changes predicted by our algorithm were significantly correlated with changes in expert-assessed scores (Pearson's R = 0.74, P < 0.001). Conclusion: AuRA had the best external validation performance and demonstrated potential for detecting longitudinal changes in joint damage. Available from https://hub.docker.com/r/elolab/aura, our algorithm can easily be applied for automatic detection of radiographic progression in the future, reducing the need for laborious manual scoring.
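The two summary statistics reported above (RMSE against expert scores, and Pearson's R between predicted and expert-assessed changes) can be sketched in a few lines of plain Python. This is only an illustration: the function names and all numbers in the example are made up and are not taken from the study data or the AuRA code.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length lists of scores."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-patient changes in total score between two visits
# (follow-up minus baseline); the values are invented for illustration.
expert_change = [5, 0, 12, 3]
predicted_change = [4, 1, 9, 5]

print("RMSE of predicted change:", rmse(predicted_change, expert_change))
print("Pearson's R of changes:", pearson_r(predicted_change, expert_change))
```

Because RMSE squares each error, a few large misses on high-scoring patients dominate it, which fits the observation that the performance gap was explained mostly by errors at higher expert-assessed scores.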
Project description: Background: Financial codes are often used to extract diagnoses from electronic health records. This approach is prone to false positives. Alternatively, queries are constructed, but these are highly center and language specific. A tantalizing alternative is the automatic identification of patients by employing machine learning on format-free text entries. Objective: The aim of this study was to develop an easily implementable workflow that builds a machine learning algorithm capable of accurately identifying patients with rheumatoid arthritis from format-free text fields in electronic health records. Methods: Two electronic health record data sets were employed: Leiden (n=3000) and Erlangen (n=4771). Using a portion of the Leiden data (n=2000), we compared 6 different machine learning methods and a naïve word-matching algorithm using 10-fold cross-validation. Performances were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC), and F1 score was used as the primary criterion for selecting the best method to build a classifying algorithm. We selected the optimal threshold of positive predictive value for case identification based on the output of the best method in the training data. This validation workflow was subsequently applied to a portion of the Erlangen data (n=4293). For testing, the best performing methods were applied to the remaining data (Leiden n=1000; Erlangen n=478) for an unbiased evaluation. Results: For the Leiden data set, the word-matching algorithm demonstrated mixed performance (AUROC 0.90; AUPRC 0.33; F1 score 0.55), and 4 methods significantly outperformed word-matching, with support vector machines performing best (AUROC 0.98; AUPRC 0.88; F1 score 0.83).
Applying this support vector machine classifier to the test data resulted in a similarly high performance (F1 score 0.81; positive predictive value [PPV] 0.94), and with this method, we could identify 2873 patients with rheumatoid arthritis in less than 7 seconds out of the complete collection of 23,300 patients in the Leiden electronic health record system. For the Erlangen data set, gradient boosting performed best (AUROC 0.94; AUPRC 0.85; F1 score 0.82) in the training set and, when applied to the test data, again yielded good results (F1 score 0.67; PPV 0.97). Conclusions: We demonstrate that machine learning methods can extract the records of patients with rheumatoid arthritis from electronic health record data with high precision, allowing research on very large populations for limited costs. Our approach is language and center independent and could be applied to any type of diagnosis. We have developed our pipeline into a universally applicable and easy-to-implement workflow to equip centers with their own high-performing algorithm. This allows the creation of observational studies of unprecedented size covering different countries for low cost from already available data in electronic health record systems.
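One way to picture the threshold-selection step is the sketch below: given classifier scores and true labels from training data, it scans candidate cut-offs and returns the lowest one whose PPV reaches a chosen target. The abstract does not describe the exact selection rule, so the "lowest threshold reaching a target PPV" criterion, the function names, and the toy scores are all assumptions made for illustration.

```python
def ppv_and_f1(scores, labels, threshold):
    """PPV (precision) and F1 score when cases are called at score >= threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_case = score >= threshold
        if predicted_case and label == 1:
            tp += 1
        elif predicted_case and label == 0:
            fp += 1
        elif not predicted_case and label == 1:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * ppv * recall / (ppv + recall) if (ppv + recall) else 0.0
    return ppv, f1


def lowest_threshold_reaching_ppv(scores, labels, target_ppv):
    """Assumed rule: the smallest observed score cut-off with PPV >= target_ppv."""
    for threshold in sorted(set(scores)):
        ppv, _ = ppv_and_f1(scores, labels, threshold)
        if ppv >= target_ppv:
            return threshold
    return None  # no cut-off reaches the target


# Invented classifier outputs and chart-review labels (1 = RA, 0 = not RA).
train_scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.20]
train_labels = [1, 1, 0, 1, 0, 0]

cutoff = lowest_threshold_reaching_ppv(train_scores, train_labels, target_ppv=0.9)
print("chosen cut-off:", cutoff)
print("PPV and F1 at that cut-off:", ppv_and_f1(train_scores, train_labels, cutoff))
```

Raising the cut-off generally trades recall for PPV, which is consistent with the Erlangen test result, where a high PPV (0.97) came with a lower F1 score (0.67).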
Project description: We developed and independently validated a rheumatoid arthritis (RA) mortality prediction model using the machine learning method Random Survival Forests (RSF). Two independent cohorts from Madrid (Spain) were used: the Hospital Clínico San Carlos RA Cohort (HCSC-RAC; training; 1,461 patients), and the Hospital Universitario de La Princesa Early Arthritis Register Longitudinal study (PEARL; validation; 280 patients). Demographic and clinical-related variables collected during the first two years after disease diagnosis were used. In HCSC-RAC and PEARL, 148 and 21 patients died during a median follow-up time of 4.3 and 5.0 years, respectively. Age at diagnosis, median erythrocyte sedimentation rate, and number of hospital admissions showed the highest predictive capacity. Prediction errors in the training and validation cohorts were 0.187 and 0.233, respectively. A survival tree identified five mortality risk groups using the predicted ensemble mortality. After 1 and 7 years of follow-up, time-dependent specificity and sensitivity in the validation cohort were 0.79-0.80 and 0.43-0.48, respectively, using the cut-off value dividing the two lower risk categories. Calibration curves showed overestimation of the mortality risk in the validation cohort. In conclusion, we were able to develop a clinical prediction model for RA mortality using RSF, providing evidence for further work on external validation.
Project description: Objective: Recognizing that the interrelationships between chronic conditions that complicate rheumatoid arthritis (RA) are poorly understood, we aimed to identify patterns of multimorbidity and to define their prevalence in RA through machine learning. Methods: We constructed RA and age- and sex-matched (1:1) non-RA cohorts within a large commercial insurance database (MarketScan) and the Veterans Health Administration (VHA). Chronic conditions (n = 44) were identified from diagnosis codes from outpatient and inpatient encounters. Exploratory factor analysis was performed separately in both databases, stratified by RA diagnosis and sex, to identify multimorbidity patterns. The association of RA with different multimorbidity patterns was determined using conditional logistic regression. Results: We studied 226,850 patients in MarketScan (76% female) and 120,780 patients in the VHA (89% male). The primary multimorbidity patterns identified were characterized by the presence of cardiopulmonary, cardiometabolic, and mental health and chronic pain disorders. Multimorbidity patterns were similar between RA and non-RA patients, female and male patients, and patients in MarketScan and the VHA. RA patients had higher odds of each multimorbidity pattern (odds ratios [ORs] 1.17-2.96), with mental health and chronic pain disorders being the multimorbidity pattern most strongly associated with RA (ORs 2.07-2.96). Conclusion: Cardiopulmonary, cardiometabolic, and mental health and chronic pain disorders represent predominant multimorbidity patterns, each of which is overrepresented in RA. The identification of multimorbidity patterns occurring more frequently in RA is an important first step in progressing toward a holistic approach to RA management and warrants assessment of their clinical and predictive utility.
Project description: Rheumatoid arthritis (RA) is an incurable disease that afflicts 0.5-1.0% of the global population, though it is less threatening at its early stage. Therefore, improved diagnostic efficiency and prognostic outcome are critical for confronting RA. Although machine learning is considered a promising technique in clinical research, its potential for verifying the biological significance of genes has not been fully exploited. The performance of a machine learning model depends greatly on the features used for model training; therefore, the effectiveness of prediction might reflect the quality of input features. In the present study, we used weighted gene co-expression network analysis (WGCNA) in conjunction with differentially expressed gene (DEG) analysis to select the key genes that were highly associated with RA phenotypes based on multiple microarray datasets of RA blood samples, after which they were used as features in machine learning model validation. A total of six machine learning models were used to validate the biological significance of the key genes based on gene expression, among which five models achieved good performance [area under the curve (AUC) >0.85], suggesting that the key genes we identified are biologically significant and highly representative of genes involved in RA. Combined with other biological interpretations including Gene Ontology (GO) analysis, protein-protein interaction (PPI) network analysis, as well as inference of immune cell composition, our current study might shed light on the in-depth study of RA diagnosis and prognosis.
Project description: Objective: To develop and validate a composite rheumatoid arthritis (RA) disease activity index using optical spectral transmission (OST) scores obtained with the HandScan, replacing tender and swollen joint counts. Methods: RA patients from a single center routinely undergoing HandScan measurements with at least 1 concurrent OST score and Disease Activity Score in 28 joints (DAS28) were included. Data were extracted from medical records. Linear regression analyses with the DAS28 as the outcome were performed to create a disease activity index (DAS-OST). OST score, erythrocyte sedimentation rate (ESR), and patient global assessment (PtGA) visual analog scale (VAS), sex, age, disease duration, and rheumatoid factor status were evaluated as independent variables. Final models were derived based on the statistical significance of coefficients and model fit. Of the data, two-thirds were used for development and one-third for validation; external validation was performed in a cohort from another center. Agreement between DAS-OST and DAS28 was assessed using the Bland-Altman plot method and intraclass correlation coefficient (ICC). Diagnostic value of the DAS-OST was determined for established definitions of remission, low disease activity (LDA), and high disease activity (HDA). Results: Data of 3,358 observations from 1,505 unique RA patients were extracted. DAS-OST was defined as: -0.44 + OST × 0.03 + male × -0.11 + LN(ESR) × 0.77 + PtGA VAS × 0.03. The ICCs between DAS-OST and DAS28 were 0.88 (95% confidence interval [95% CI] 0.87-0.90) and 0.82 (95% CI 0.75-0.86) and measurement errors were 0.58 and 0.87 in internal and external validation, respectively. Sensitivity for remission, LDA, and HDA was 79%, 91%, and 43%, respectively, and specificity was 92%, 80%, and 96% in external validation. Conclusion: Using the HandScan, RA disease activity can be accurately estimated if combined with ESR, PtGA VAS, and sex into a disease activity index (DAS-OST).
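The DAS-OST formula quoted above translates directly into a small function. The coefficients come from the abstract; the function and parameter names are hypothetical, LN is read as the natural logarithm, "male" is treated as a 1/0 indicator, and the measurement scales (for example, ESR in mm/h and PtGA VAS on a 0-100 scale) are assumptions that the abstract does not state.

```python
import math


def das_ost(ost_score, esr, ptga_vas, is_male):
    """DAS-OST = -0.44 + 0.03*OST - 0.11*male + 0.77*ln(ESR) + 0.03*PtGA VAS.

    Coefficients are those quoted in the abstract. The input scales are
    assumptions: check them against the full paper before any real use.
    """
    if esr <= 0:
        raise ValueError("ESR must be positive to take its natural logarithm")
    male = 1 if is_male else 0
    return (
        -0.44
        + 0.03 * ost_score
        - 0.11 * male
        + 0.77 * math.log(esr)
        + 0.03 * ptga_vas
    )


# Invented example inputs, purely to show the call.
print(das_ost(ost_score=12.0, esr=20.0, ptga_vas=35.0, is_male=False))
```

Because ESR enters through its logarithm, doubling ESR adds a fixed 0.77 × ln 2 (about 0.53) to the index regardless of the starting value.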
Project description: Background: Fatigue is a common and burdensome symptom in Rheumatoid Arthritis (RA), yet is poorly understood. Currently, clinicians rely solely on fatigue questionnaires, which are inherently subjective measures. For the effective development of future therapies and stratification, it is of vital importance to identify biomarkers of fatigue. In this study, we identify brain differences between RA patients who improved and did not improve their levels of fatigue based on Chalder Fatigue Scale variation (ΔCFS ≥ 2), and we compared the performance of different classifiers to distinguish between these samples at baseline. Methods: Fifty-four fatigued RA patients underwent a magnetic resonance (MR) scan at baseline and 6 months later. At 6 months we identified those whose fatigue levels improved and those for whom it did not. More than 900 brain features across three data sets were assessed as potential predictors of fatigue improvement. These data sets included clinical, structural MRI (sMRI) and diffusion tensor imaging (DTI) data. A genetic algorithm was used for feature selection. Three classifiers were employed in the discrimination of improvers and non-improvers of fatigue: a Least Square Linear Discriminant (LSLD), a linear Support Vector Machine (SVM) and an SVM with Radial Basis Function kernel. Results: The highest accuracy (67.9%) was achieved with the sMRI set, followed by the DTI set (63.8%), whereas classification performance using clinical features was at the chance level. The mean curvature of the left superior temporal sulcus was most strongly selected during the feature selection step, followed by the surface area of the right frontal pole and the surface area of the left banks of the superior temporal sulcus. Conclusions: The results provide evidence that brain metrics are superior to clinical metrics in predicting fatigue changes.
Further exploration of these methods may help clinicians triage patients towards the most appropriate fatigue-alleviating therapies.
Project description: Background: Comorbid conditions are very common in rheumatoid arthritis (RA) and several prior studies have clustered them using machine learning (ML). We applied various ML algorithms to compare the clusters of comorbidities derived and to assess the value of the clusters for predicting future clinical outcomes. Methods: A large US-based RA registry, CorEvitas, was used to identify patients for the analysis. We assessed the presence of 24 comorbidities, and ML was used to derive clusters of patients with given comorbidities. K-mode, K-mean, regression-based, and hierarchical clustering were used. To assess the value of these clusters, we compared clusters across different ML algorithms in clinical outcome models predicting clinical disease activity index (CDAI) and health assessment questionnaire (HAQ-DI). We used data from the first 3 years of the 6-year study period to derive clusters and assess time-averaged values for CDAI and HAQ-DI during the latter 3 years. Model fit was assessed via adjusted R² and root mean square error for a series of models that included clusters from ML clustering and each of the 24 comorbidities separately. Results: 11,883 patients with RA were included who had longitudinal data over 6 years. At baseline, patients were on average 59 (SD 12) years of age, 77% were women, CDAI was 11.3 (SD 11.9, moderate disease activity), HAQ-DI was 0.32 (SD 0.42), and disease duration was 10.8 (SD 9.9) years. During the 6 years of follow-up, the percentage of patients with various comorbidities increased. Using five clusters produced by each of the ML algorithms, multivariable regression models with time-averaged CDAI as an outcome found that the ML-derived comorbidity clusters produced models of similar strength to those with each of the 24 separate comorbidities entered individually.
The same patterns were observed for HAQ-DI. Conclusions: Clustering comorbidities using ML algorithms is not computationally complex but often results in clusters that are difficult to interpret from a clinical standpoint. While ML clustering is useful for modeling multi-omics, using clusters to predict clinical outcomes produces models with a similar fit to those with individual comorbidities.
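Adjusted R² is a natural fit measure for this comparison because it penalizes the number of predictors, so a model with a handful of cluster indicators can be compared fairly with one carrying 24 separate comorbidity terms. The sketch below uses the standard formula; the function names and the predictor counts in the example (4 indicator variables for five clusters versus 24 comorbidities) are illustrative assumptions and do not reproduce the study's models.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_resid / ss_total


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Same raw R^2, different model sizes (all numbers invented for illustration).
raw_r2 = 0.50
print("cluster model:", adjusted_r_squared(raw_r2, n_obs=101, n_predictors=4))
print("24-comorbidity model:", adjusted_r_squared(raw_r2, n_obs=101, n_predictors=24))
```

With a sample as large as the registry's (11,883 patients), the penalty for 24 versus 4 predictors is very small, so adjusted R² and raw R² would be nearly identical there; the small n in the example exaggerates the effect to make it visible.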
Project description: Osteoporosis is a serious health concern in patients with rheumatoid arthritis (RA). Machine learning (ML) models have been increasingly incorporated into various clinical practices, including disease classification, risk prediction, and treatment response. However, only a few studies have focused on predicting osteoporosis using ML in patients with RA. We aimed to develop an ML model to predict osteoporosis using a representative Korean RA cohort database. The KORean Observational study Network for Arthritis (KORONA) database, established by the Clinical Research Center for RA in Korea, was used in this study. Among the 5077 patients registered in KORONA, 2374 patients were included in this study. Four representative ML algorithms were used for the prediction: logistic regression (LR), random forest, XGBoost (XGB), and LightGBM. The accuracy, F1 score, and area under the curve (AUC) of each model were measured. The LR model achieved the highest AUC value at 0.750, while the XGB model achieved the highest accuracy at 0.682. Body mass index, age, menopause, waist and hip circumferences, RA surgery, and monthly income were risk factors of osteoporosis. In conclusion, ML algorithms are a useful option for screening for osteoporosis in patients with RA.
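That one model can lead on AUC while another leads on accuracy follows from what each metric measures: AUC depends only on how well the scores rank cases above non-cases, whereas accuracy depends on a specific decision cut-off. The sketch below computes both in plain Python; the function names, the 0.5 cut-off, and the toy data are assumptions for illustration and are unrelated to the KORONA data.

```python
def auc(scores, labels):
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of correct calls when predicting a case at score >= threshold."""
    correct = sum(
        1 for s, y in zip(scores, labels) if (1 if s >= threshold else 0) == y
    )
    return correct / len(labels)


# Invented predicted probabilities and true labels (1 = osteoporosis).
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]

print("AUC:", auc(toy_scores, toy_labels))
print("accuracy at 0.5:", accuracy(toy_scores, toy_labels))
```

Shifting the cut-off changes accuracy but leaves AUC untouched, so the two metrics can rank a set of models differently.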