Project description: Introduction: Although clinical, functional, and biomarker data predict asthma exacerbations, newer approaches providing high accuracy of prognosis are needed for real-world decision-making in asthma. Machine learning (ML) leverages mathematical and statistical methods to detect patterns for future disease events across large datasets from electronic health records (EHR). This study trained and fine-tuned ML algorithms for the real-world prediction of asthma exacerbations in patients with physician-diagnosed asthma. Methods: Adults with ≥ 2 ICD-9/10 asthma codes within 1 year and at least 30 days apart were identified from the Optum Panther EHR database between 2016 and 2023. An emergency department (ED), urgent care, or inpatient visit for asthma, while on systemic administration of corticosteroids, was considered an exacerbation. To predict factors associated with exacerbations in a 6-month study period, clinical information from patients was retrieved in the preceding 6-month baseline period. Clinical information included demographics, lab results, diagnoses, medications, immunizations, and allergies. Three models built using Extreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), and Transformer algorithms were trained and tested on independent datasets. Predictions were explained using the SHAP (SHapley Additive exPlanations) library. Results: Of 1,331,934 patients with asthma, 16,279 (1.2%) experienced ≥ 1 exacerbation. XGBoost was the best predictive algorithm (area under the curve [AUC] = 0.964). Factors associated with exacerbations included a prior history of exacerbation, prednisone usage, high-dose albuterol usage, and elevated troponin I. 
Reduced probability of exacerbations was associated with receiving inhaled albuterol, vitamins, aspirin, statins, furosemide, and influenza vaccination. Conclusion: This ML-based study on asthma in the real world confirmed previously known features associated with increased exacerbation risk for asthma, while uncovering not entirely understood features associated with reduced risk of asthma exacerbations. These findings are hypothesis-generating and should contribute to ongoing discussion of the strengths and limitations of ML and other supervised learning models in patient risk stratification.
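The cohort inclusion rule in the Methods (≥ 2 asthma codes within 1 year and at least 30 days apart) can be sketched as a small date check. This is only one reading of the criterion: the function name `meets_inclusion`, its parameters, and the interpretation "some pair of coded dates lies between 30 and 365 days apart" are assumptions for illustration, not the study's actual extraction logic.

```python
from datetime import date
from itertools import combinations

def meets_inclusion(code_dates, min_gap_days=30, max_window_days=365):
    """Return True if any two asthma-coded dates are 30 to 365 days apart.

    Hypothetical helper: the abstract does not specify how the rule was coded.
    """
    unique = sorted(set(code_dates))  # sorted, so each pair is (earlier, later)
    return any(
        min_gap_days <= (later - earlier).days <= max_window_days
        for earlier, later in combinations(unique, 2)
    )

# Two codes 60 days apart qualify; two codes 14 days apart do not.
print(meets_inclusion([date(2020, 1, 1), date(2020, 3, 1)]))
print(meets_inclusion([date(2020, 1, 1), date(2020, 1, 15)]))
```

Checking all pairs rather than only consecutive codes matters when a patient has several closely spaced visits followed by a later one.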
Project description: There is increasing recognition that asthma and eczema are heterogeneous diseases. We investigated the predictive ability of a spectrum of machine learning methods to disambiguate clinical sub-groups of asthma, wheeze and eczema, using a large heterogeneous set of attributes in an unselected population. The aim was to identify to what extent such heterogeneous information can be combined to reveal specific clinical manifestations. The study population comprised a cross-sectional sample of adults, and included representatives of the general population enriched by subjects with asthma. Linear and non-linear machine learning methods, from logistic regression to random forests, were fit on a large attribute set including demographic, clinical and laboratory features, genetic profiles and environmental exposures. Outcomes of interest were asthma, wheeze and eczema encoded by different operational definitions. Model validation was performed via bootstrapping. The study population included 554 adults, 42% male, 38% previous or current smokers. The proportions of asthma, wheeze, and eczema diagnoses were 16.7%, 12.3%, and 21.7%, respectively. Models were fit on 223 non-genetic variables plus 215 single nucleotide polymorphisms. In general, non-linear models achieved higher sensitivity and specificity than other methods, especially for asthma and wheeze, less for eczema, with areas under the receiver operating characteristic curve of 84%, 76% and 64%, respectively. Our findings confirm that allergen sensitisation and lung function characterise asthma better in combination than separately. The predictive ability of genetic markers alone is limited. For eczema, new predictors such as bio-impedance were discovered. More usefully-complex modelling is the key to a better understanding of disease mechanisms and personalised healthcare: further advances are likely with the incorporation of more factors/attributes and longitudinal measures.
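The abstract states only that validation was done "via bootstrapping" and does not say which variant. As a minimal sketch of one common form, the out-of-bag bootstrap, the hypothetical generator below draws a training sample with replacement and holds out the subjects that were never drawn; the name `bootstrap_splits` and its parameters are assumptions for illustration.

```python
import random

def bootstrap_splits(n_samples, n_rounds, seed=0):
    """Yield (train_indices, out_of_bag_indices) for each bootstrap round.

    Hypothetical helper; the study's actual resampling scheme is not stated.
    """
    rng = random.Random(seed)
    for _ in range(n_rounds):
        # Draw n_samples indices with replacement to form the training set.
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        drawn = set(train)
        # Subjects never drawn in this round serve as the evaluation set.
        out_of_bag = [i for i in range(n_samples) if i not in drawn]
        yield train, out_of_bag

for train, oob in bootstrap_splits(n_samples=554, n_rounds=3):
    print(len(train), len(oob))
```

A model would be fit on `train` and scored on `out_of_bag` in each round, and the scores summarised across rounds.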
Project description: Background: Distinguishing between non-severe and severe dengue is crucial for timely intervention and reducing morbidity and mortality. World Health Organization (WHO)-recommended warning signs offer a practical approach for clinicians but have limited sensitivity and specificity. This study aims to evaluate machine learning (ML) model performance compared to WHO-recommended warning signs in predicting severe dengue among laboratory-confirmed cases in Puerto Rico. Methods: We analyzed data from Puerto Rico's Sentinel Enhanced Dengue Surveillance System (May 2012-August 2024), using 40 clinical, demographic, and laboratory variables. Nine ML models, including Decision Trees, K-Nearest Neighbors, Naïve Bayes, Support Vector Machines, Artificial Neural Networks, AdaBoost, CatBoost, LightGBM, and XGBoost, were trained using fivefold cross-validation and evaluated with area under the receiver operating characteristic curve (AUC-ROC), sensitivity, and specificity. A subanalysis excluded hemoconcentration and leukopenia to assess performance in resource-limited settings. An AUC-ROC value of 0.5 indicates no discriminative power, while values closer to 1.0 reflect better performance. Results: Among the 1708 laboratory-confirmed dengue cases, 24.3% were classified as severe. Gradient boosting algorithms achieved the highest predictive performance, with an AUC-ROC of 97.1% (95% CI: 96.0-98.3%) for CatBoost using the full 40-variable feature set. Feature importance analysis identified hemoconcentration (≥ 20% increase during illness or ≥ 20% above baseline for age and sex), leukopenia (white blood cell count < 4000/mm³), and timing of presentation at 4-6 days post-symptom onset as key predictors. When excluding hemoconcentration and leukopenia, the CatBoost AUC-ROC was 96.7% (95% CI: 95.5-98.0%), demonstrating minimal reduction in performance. 
Individual warning signs like abdominal pain and restlessness had sensitivities of 79.0% and 64.6%, but lower specificities of 48.4% and 59.1%, respectively. Combining ≥ 3 warning signs improved specificity (80.9%) while maintaining moderate sensitivity (78.6%), resulting in an AUC-ROC of 74.0%. Conclusions: ML models, especially gradient boosting algorithms, outperformed traditional warning signs in predicting severe dengue. Integrating these models into clinical decision-support tools could help clinicians better identify high-risk patients, guiding timely interventions like hospitalization, closer monitoring, or the administration of intravenous fluids. The subanalysis excluding hemoconcentration confirmed the models' applicability in resource-limited settings, where access to laboratory data may be limited.
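The evaluation metrics named above can be illustrated with a small pure-Python sketch. It computes sensitivity and specificity for a "≥ 3 warning signs" rule and the AUC-ROC as the probability that a randomly chosen severe case scores higher than a randomly chosen non-severe case (ties count one half), which is why 0.5 corresponds to no discriminative power. The function names and the toy data are invented for illustration and are not taken from the surveillance dataset.

```python
def flag_by_warning_signs(sign_counts, min_signs=3):
    """Binary rule: flag a patient when at least `min_signs` signs are present."""
    return [1 if count >= min_signs else 0 for count in sign_counts]

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (invented): 1 = severe dengue, counts = number of warning signs.
severe = [1, 1, 1, 0, 0, 0, 0]
counts = [4, 3, 1, 0, 2, 3, 1]
sens, spec = sensitivity_specificity(severe, flag_by_warning_signs(counts))
print(round(sens, 3), round(spec, 3), auc_roc(severe, counts))
```

The pairwise AUC is quadratic in the number of cases, which is fine for a sketch; a rank-based computation would be the usual choice on larger data.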
Project description: Asthma in children is a heterogeneous disease manifested by various phenotypes and endotypes. The level of disease control, as well as the effectiveness of anti-inflammatory treatment, is variable and inadequate in a significant portion of patients. By applying machine learning algorithms, we aimed to predict treatment success in a pediatric asthma cohort and to identify the key variables for understanding the underlying mechanisms. We predicted the treatment outcomes in children with mild to severe asthma (N = 365), according to changes in asthma control, lung function (FEV1 and MEF50) and FENO values after 6 months of controller medication use, using Random Forest and AdaBoost classifiers. The highest predictive power was achieved for control-related and, to a lesser extent, FENO-related treatment outcomes, especially in younger children. The most predictive variables for asthma control were related to asthma severity and total IgE, which were also predictive for FENO-based outcomes. MEF50-related treatment outcomes were better predicted than the FEV1-based response, and one of the best predictive variables for this response was hsCRP, emphasizing the involvement of the distal airways in childhood asthma. Our results suggest that asthma control- and FENO-based outcomes can be more accurately predicted using machine learning than the outcomes according to FEV1 and MEF50. This supports the symptom control-based asthma management approach and its complementary FENO-guided tool in children. T2-high asthma seemed to respond best to the anti-inflammatory treatment. The ability to predict treatment success should help enable treatment optimization and the implementation of precision medicine in pediatric asthma treatment.
Project description: Rationale: More targeted management of severe acute pediatric asthma could improve clinical outcomes. Objectives: To identify distinct clinical phenotypes of severe acute pediatric asthma using variables obtained in the first 12 h of hospitalization. Methods: We conducted a retrospective cohort study in a quaternary care children's hospital from 2014 to 2022. Encounters for children ages 2-18 years admitted to the hospital for asthma were included. We used consensus k-means clustering with patient demographics, vital signs, diagnostics, and laboratory data obtained in the first 12 h of hospitalization. Measurements and main results: The study population included 683 encounters divided into derivation (80%) and validation (20%) sets, and two distinct clusters were identified. Compared to Cluster 1 in the derivation set, Cluster 2 encounters (177 [32%]) were older (11 years [8; 14] vs. 5 years [3; 8]; p < .01) and more commonly males (63% vs. 53%; p = .03) of Black race (51% vs. 40%; p = .03) with non-Hispanic ethnicity (96% vs. 84%; p < .01). Cluster 2 encounters had smaller improvements in vital signs at 12 h including percent change in heart rate (-1.7 [-11.7; 12.7] vs. -7.8 [-18.5; 1.7]; p < .01), and respiratory rate (0.0 [-20.0; 22.2] vs. -11.4 [-27.3; 9.0]; p < .01). Encounters in Cluster 2 had lower percentages of neutrophils (70.0 [55.0; 83.0] vs. 85.0 [77.0; 90.0]; p < .01) and higher percentages of lymphocytes (17.0 [8.0; 32.0] vs. 9.0 [5.3; 14.0]; p < .01). Cluster 2 encounters had higher rates of invasive mechanical ventilation (23% vs. 5%; p < .01), longer hospital length of stay (4.5 [2.6; 8.8] vs. 2.9 [2.0; 4.3]; p < .01), and a higher mortality rate (7.3% vs. 0.0%; p < .01). The predicted cluster assignments in the validation set shared the same ratio (~2:1), and many of the same characteristics. Conclusions: We identified two clinical phenotypes of severe acute pediatric asthma which exhibited distinct clinical features and outcomes.
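Consensus k-means, as named in the Methods, is commonly implemented by running k-means on many random subsamples and recording how often each pair of observations lands in the same cluster. The sketch below follows that general recipe in pure Python on invented two-dimensional toy points. The function names, the 80% subsampling fraction, and the number of runs are assumptions for illustration; the study's actual features, preprocessing, and settings are not given in the abstract.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on a list of tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def consensus_matrix(points, k, runs=50, frac=0.8, seed=0):
    """Fraction of subsampled runs in which each pair was clustered together."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(runs):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

# Toy data (invented): two well-separated groups of four points each.
toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
matrix = consensus_matrix(toy, k=2)
print(matrix[0][1], matrix[0][4])
```

Entries near 1 or 0 indicate stable co-assignment; final clusters are usually obtained by clustering this matrix, and the number of clusters is chosen by how cleanly it separates.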
Project description: Asthma is a common disease with profoundly variable natural history and patient morbidity. Heterogeneity has long been appreciated, and much work has focused on identifying subgroups of patients with similar pathobiological underpinnings. Previous studies of the Severe Asthma Research Program (SARP) cohort linked gene expression changes to specific clinical and physiologic characteristics. While invaluable for hypothesis generation, these data include extensive candidate gene lists that complicate target identification and validation. In this analysis, we performed unsupervised clustering of the SARP cohort using bronchial epithelial cell gene expression data, identifying a transcriptional signature for participants with exacerbation-prone asthma and impaired lung function. Clinically, participants in this asthma cluster exhibited a mixed inflammatory process and bore transcriptional hallmarks of NF-κB and activator protein 1 (AP-1) activation, despite high corticosteroid exposure. Using supervised machine learning, we found a set of 31 genes that classified patients with high accuracy and could reconstitute clinical and transcriptional hallmarks of our patient clustering in an external cohort. Of these genes, IL18R1 (IL-18 Receptor 1) was negatively associated with lung function and was highly expressed in the most severe patient cluster. We validated IL18R1 protein expression in lung tissue and identified downstream NF-κB and AP-1 activity, supporting IL-18 signaling in severe asthma pathogenesis and highlighting this approach for gene and pathway discovery.
Project description: The aim of this observational retrospective study is to improve early risk stratification of hospitalized COVID-19 patients by predicting in-hospital mortality, transfer to intensive care unit (ICU) and mechanical ventilation from electronic health record data of the first 24 h after admission. Our machine learning model predicts in-hospital mortality (AUC = 0.918), transfer to ICU (AUC = 0.821) and the need for mechanical ventilation (AUC = 0.654) from a few laboratory values of the first 24 h after admission. Models based on dichotomous features indicating whether a laboratory value exceeds or falls below a threshold perform nearly as well as models based on numerical features. We devise completely data-driven and interpretable machine-learning models for the prediction of in-hospital mortality, transfer to ICU and mechanical ventilation for hospitalized COVID-19 patients within 24 h after admission. Numerical values of CRP and blood sugar and dichotomous indicators for increased partial thromboplastin time (PTT) and glutamic oxaloacetic transaminase (GOT) are amongst the best predictors.
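The dichotomous features described above reduce each laboratory value to an indicator of whether it exceeds or falls below a threshold. A minimal sketch of that transformation follows. The function name `dichotomize`, the feature names, and especially the cutoff values are placeholders invented for illustration; they are not the study's thresholds and are not clinical reference ranges.

```python
def dichotomize(labs, thresholds):
    """Map numeric lab values to 0/1 flags; missing values stay None.

    `thresholds` maps a lab name to (direction, cutoff), where direction is
    "above" or "below". All names and cutoffs here are hypothetical.
    """
    features = {}
    for name, (direction, cutoff) in thresholds.items():
        value = labs.get(name)
        if value is None:
            features[name + "_flag"] = None  # keep missingness explicit
        elif direction == "above":
            features[name + "_flag"] = int(value > cutoff)
        else:
            features[name + "_flag"] = int(value < cutoff)
    return features

# Placeholder cutoffs, for illustration only.
example_thresholds = {"ptt": ("above", 40.0), "got": ("above", 50.0)}
print(dichotomize({"ptt": 46.2, "got": 31.0}, example_thresholds))
```

Binary flags of this kind are easy to read at the bedside, which fits the abstract's emphasis on interpretable models.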
Project description: Background: There are no objective, biological markers that can robustly predict methylphenidate response in attention deficit hyperactivity disorder. This study aimed to examine whether applying machine learning approaches to pretreatment demographic, clinical questionnaire, environmental, neuropsychological, neuroimaging, and genetic information can predict therapeutic response following methylphenidate administration. Methods: The present study included 83 youth with attention deficit hyperactivity disorder. At baseline, parents completed the ADHD Rating Scale-IV and Disruptive Behavior Disorder rating scale, and participants undertook the continuous performance test, Stroop color word test, and resting-state functional MRI scans. The dopamine transporter gene, dopamine D4 receptor gene, alpha-2A adrenergic receptor gene (ADRA2A) and norepinephrine transporter gene polymorphisms, and blood lead and urine cotinine levels were also measured. The participants were enrolled in an 8-week, open-label trial of methylphenidate. Four different machine learning algorithms were used for data analysis. Results: Support vector machine classification accuracy was 84.6% (area under receiver operating characteristic curve 0.84) for predicting methylphenidate response. The age, weight, ADRA2A MspI and DraI polymorphisms, lead level, Stroop color word test performance, and oppositional symptoms of Disruptive Behavior Disorder rating scale were identified as the most differentiating subset of features. Conclusions: Our results provide preliminary support to the translational development of support vector machine as an informative method that can assist in predicting treatment response in attention deficit hyperactivity disorder, though further work is required to provide enhanced levels of classification performance.
Project description: Background: Although inhaled corticosteroids (ICS) are the first-line therapy for patients with persistent asthma, many patients continue to have exacerbations. We developed machine learning models to predict the ICS response in patients with asthma. Methods: The subjects included asthma patients of European ancestry (n = 1371; 448 children; 916 adults). A genome-wide association study was performed to identify the SNPs associated with ICS response. Using the SNPs identified, two machine learning models were developed to predict ICS response: (1) least absolute shrinkage and selection operator (LASSO) regression and (2) random forest. Results: The LASSO regression model achieved an AUC of 0.71 (95% CI 0.67-0.76; sensitivity: 0.57; specificity: 0.75) in an independent test cohort, and the random forest model achieved an AUC of 0.74 (95% CI 0.70-0.78; sensitivity: 0.70; specificity: 0.68). The genes contributing to the prediction of ICS response included those associated with ICS responses in asthma (TPSAB1, FBXL16), asthma symptoms and severity (ABCA7, CNN2, PTRN3, and BSG/CD147), airway remodeling (ELANE, FSTL3), mucin production (GAL3ST), leukotriene synthesis (GPX4), allergic asthma (ZFPM1, SBNO2), and others. Conclusions: An accurate risk prediction of ICS response can be obtained using machine learning methods, with the potential to inform personalized treatment decisions. Further studies are needed to examine if the integration of richer phenotype data could improve risk prediction.
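For reference, the LASSO estimator named in the Methods can be written in its generic penalized form. The abstract does not state which loss function was used, so the loss $\ell$ below is left unspecified (squared error and logistic loss are the usual choices); the symbols $x_i$ (SNP predictors for patient $i$), $y_i$ (ICS response), $\beta$ (coefficients), and $\lambda$ (penalty weight) are notation introduced here for illustration.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{n} \sum_{i=1}^{n} \ell\!\left(y_i,\; x_i^{\top}\beta\right)
  \;+\; \lambda \,\lVert \beta \rVert_{1},
\qquad
\lVert \beta \rVert_{1} \;=\; \sum_{j=1}^{p} \lvert \beta_j \rvert .
```

Because the $\ell_1$ penalty can shrink coefficients exactly to zero, larger values of $\lambda$ leave fewer SNPs in the model, which is what makes LASSO a selection method as well as a regression method.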