Feature engineering with clinical expert knowledge: A case study assessment of machine learning model complexity and performance.
ABSTRACT: Incorporating expert knowledge at the time machine learning models are trained holds promise for producing models that are easier to interpret. The main objectives of this study were to use a feature engineering approach to incorporate clinical expert knowledge prior to applying machine learning techniques, and to assess the impact of the approach on model complexity and performance. Four machine learning models were trained to predict mortality in a severe asthma case study. Experiments to select fewer input features based on a discriminative score showed low to moderate precision for discovering clinically meaningful triplets, indicating that discriminative score alone cannot replace clinical input. Compared to baseline machine learning models, we found a decrease in model complexity when using fewer features informed by discriminative score and when filtering laboratory features with clinical input. We also found only a small difference in performance on the mortality prediction task between baseline ML models and models that used filtered features. Encoding demographic and triplet information in ML models with filtered features appeared to improve performance over the baseline. These findings indicate that the use of filtered features may reduce model complexity with little impact on performance.
Project description:Prediction models of post-liver transplant mortality are crucial so that donor organs are not allocated to recipients with unreasonably high probabilities of mortality. Machine learning algorithms, particularly deep neural networks (DNNs), can often achieve higher predictive performance than conventional models. In this study, we trained a DNN to predict 90-day post-transplant mortality using preoperative variables and compared the performance to that of the Survival Outcomes Following Liver Transplantation (SOFT) and Balance of Risk (BAR) scores, using United Network of Organ Sharing data on adult patients who received a deceased donor liver transplant between 2005 and 2015 (n = 57,544). The DNN was trained using 202 features, and the best DNN's architecture consisted of 5 hidden layers with 110 neurons each. The area under the receiver operating characteristics curve (AUC) of the best DNN model was 0.703 (95% CI: 0.682-0.726) as compared to 0.655 (95% CI: 0.633-0.678) and 0.688 (95% CI: 0.667-0.711) for the BAR score and SOFT score, respectively. In conclusion, despite the complexity of DNN, it did not achieve a significantly higher discriminative performance than the SOFT score. Future risk models will likely benefit from the inclusion of other data sources, including high-resolution clinical features for which DNNs are particularly apt to outperform conventional statistical methods.
Project description:<b>Purpose:</b> The aim of this study was to investigate the diagnostic value of machine-learning models with radiomic features and clinical features in preoperative differentiation of common lesions located in the anterior skull base. <b>Methods:</b> A total of 235 patients diagnosed with pituitary adenoma, meningioma, craniopharyngioma, or Rathke cleft cyst were enrolled in the current study. The discrimination was divided into three groups: pituitary adenoma vs. craniopharyngioma, meningioma vs. craniopharyngioma, and pituitary adenoma vs. Rathke cleft cyst. In each group, five selection methods were adopted to select suitable features for the classifier, and nine machine-learning classifiers were employed to build discriminative models. The diagnostic performance of each combination was evaluated with area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity calculated for both the training group and the testing group. <b>Results:</b> In each group, several classifiers combined with suitable selection methods showed feasible diagnostic performance, with AUC of more than 0.80. Moreover, the combination of least absolute shrinkage and selection operator as the feature-selection method and linear discriminant analysis as the classification algorithm showed the best overall discriminative ability. <b>Conclusion:</b> Radiomics-based machine learning could potentially serve as a novel method to assist in discriminating common lesions in the anterior skull base prior to operation.
Project description:Objectives: To investigate the ability of radiomics features from MRI to differentiate anaplastic oligodendroglioma (AO) from atypical low-grade oligodendroglioma using machine-learning algorithms. Methods: A total of 101 qualified patients (50 participants with AO and 51 with atypical low-grade oligodendroglioma) were enrolled in this retrospective, single-center study. Forty radiomics features of tumor images derived from six matrices were extracted from contrast-enhanced T1-weighted (T1C) images and fluid-attenuation inversion recovery (FLAIR) images. Three selection methods were used to select the optimal features for classifiers: distance correlation, least absolute shrinkage and selection operator (LASSO), and gradient boosting decision tree (GBDT). Then three machine-learning classifiers were adopted to generate discriminative models: linear discriminant analysis, support vector machine, and random forest (RF). Receiver operating characteristic analysis was conducted to evaluate the discriminative performance of each model. Results: Nine predictive models were established based on radiomics features from T1C images and FLAIR images. All of the classifiers showed feasible ability in differentiation, with AUC of more than 0.840 when combined with a suitable selection method. For models based on T1C images, the combination of LASSO and RF classifier showed the highest AUC of 0.904 in the validation group. For models based on FLAIR images, the combination of GBDT and RF classifier showed the highest AUC of 0.861 in the validation group. Conclusion: A radiomics-based machine-learning approach could potentially serve as a feasible method for distinguishing AO from atypical low-grade oligodendroglioma.
Project description:<h4>Objectives</h4>To detect unilateral vocal fold paralysis (UVFP) from voice recordings using an explainable model of machine learning.<h4>Study design</h4>Case series - retrospective with a control group.<h4>Methods</h4>Patients with confirmed UVFP through endoscopic examination (N=77) and controls with normal voices matched for age and sex (N=77) were included. Two tasks were used to elicit voice samples: reading the Rainbow Passage and sustaining phonation of the vowel "a". The 88 extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) features were extracted as inputs for four machine learning models of differing complexity. SHAP was used to identify important features.<h4>Results</h4>The median bootstrapped Area Under the Receiver Operating Characteristic Curve (ROC AUC) score ranged from 0.79 to 0.87 depending on model and task. After removing redundant features for explainability, the highest median ROC AUC score was 0.84 using only 13 features for the vowel task and 0.87 using 39 features for the reading task. The most important features included intensity measures, mean MFCC1, mean F1 amplitude and frequency, and shimmer variability depending on model and task.<h4>Conclusion</h4>Using the largest dataset studying UVFP to date, we achieve high performance from just a few seconds of voice recordings. Notably, we demonstrate that while similar categories of features related to vocal fold physiology were conserved across models, the models used different combinations of features and still achieved similar effect sizes. Machine learning thus provides a mechanism to detect UVFP and contextualize the accuracy relative to both model architecture and pathophysiology.
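The median bootstrapped ROC AUC reported in the abstract above can be illustrated with a minimal sketch. It assumes a plain percentile bootstrap that resamples cases with replacement; the abstract does not state the exact resampling scheme, and the function names (`auc`, `bootstrap_auc`) and the toy labels and scores are hypothetical.

```python
import random
import statistics


def auc(labels, scores):
    """ROC AUC as the probability that a random positive case
    outscores a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (median, lower, upper) of the AUC over bootstrap resamples.

    Assumption: simple percentile bootstrap over cases. Resamples that
    contain only one class are redrawn, since AUC is undefined for them.
    """
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return statistics.median(stats), lower, upper


# Hypothetical toy data: 1 = case, 0 = control.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
print(bootstrap_auc(toy_labels, toy_scores, n_boot=200))
```

The pairwise AUC is quadratic in the number of cases, which is fine for a cohort of this size; a rank-based formulation would be preferable for much larger datasets.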
Project description:RATIONALE:Although treatment of prematurely born infants breathing with the assistance of a mechanical ventilator has advanced considerably in past decades, predicting extubation outcome at a given point in time remains challenging. Numerous studies have been conducted to identify predictors of extubation outcome; however, the rate of infants failing extubation attempts has not declined. OBJECTIVE:To develop a decision-support tool for the prediction of extubation outcome in premature infants using a set of machine learning algorithms. METHODS:A dataset assembled from 486 premature infants on mechanical ventilation was used to develop predictive models using machine learning algorithms such as artificial neural networks (ANN), support vector machine (SVM), naïve Bayesian classifier (NBC), boosted decision trees (BDT), and multivariable logistic regression (MLR). Performance of all models was evaluated using area under the curve (AUC). RESULTS:For some of the models (ANN, MLR and NBC) results were satisfactory (AUC: 0.63-0.76); however, two algorithms (SVM and BDT) showed poor performance with AUCs of ~0.5. CONCLUSION:Clinicians' predictions still outperform machine learning due to the complexity of the data and contextual information that may not be captured in the clinical data used as input for the development of the machine learning algorithms. Inclusion of preprocessing steps in future studies may improve the performance of prediction models.
Project description:AIMS:Our aim was to develop a machine learning (ML)-based risk stratification system to predict 1-, 2-, 3-, 4-, and 5-year all-cause mortality from pre-implant parameters of patients undergoing cardiac resynchronization therapy (CRT). METHODS AND RESULTS:Multiple ML models were trained on a retrospective database of 1510 patients undergoing CRT implantation to predict 1- to 5-year all-cause mortality. Thirty-three pre-implant clinical features were selected to train the models. The best performing model [SEMMELWEIS-CRT score (perSonalizEd assessMent of estiMatEd risk of mortaLity With machinE learnIng in patientS undergoing CRT implantation)], along with pre-existing scores (Seattle Heart Failure Model, VALID-CRT, EAARN, ScREEN, and CRT-score), was tested on an independent cohort of 158 patients. There were 805 (53%) deaths in the training cohort and 80 (51%) deaths in the test cohort during the 5-year follow-up period. Among the trained classifiers, random forest demonstrated the best performance. For the prediction of 1-, 2-, 3-, 4-, and 5-year mortality, the areas under the receiver operating characteristic curves of the SEMMELWEIS-CRT score were 0.768 (95% CI: 0.674-0.861; P < 0.001), 0.793 (95% CI: 0.718-0.867; P < 0.001), 0.785 (95% CI: 0.711-0.859; P < 0.001), 0.776 (95% CI: 0.703-0.849; P < 0.001), and 0.803 (95% CI: 0.733-0.872; P < 0.001), respectively. The discriminative ability of our model was superior to other evaluated scores. CONCLUSION:The SEMMELWEIS-CRT score (available at semmelweiscrtscore.com) exhibited good discriminative capabilities for the prediction of all-cause death in CRT patients and outperformed the already existing risk scores. By capturing the non-linear association of predictors, the utilization of ML approaches may facilitate optimal candidate selection and prognostication of patients undergoing CRT implantation.
Project description:Recent advances in Quantum Machine Learning (QML) have provided benefits to several computational processes, drastically reducing the time complexity. Another approach of combining quantum information theory with machine learning-without involving quantum computers-is known as Quantum-inspired Machine Learning (QiML), which exploits the expressive power of the quantum language to increase the accuracy of the process (rather than reducing the time complexity). In this work, we propose a large-scale experiment based on the application of a binary classifier inspired by quantum information theory to the biomedical imaging context in clonogenic assay evaluation to identify the most discriminative feature, allowing us to enhance cell colony segmentation. This innovative approach offers a two-fold result: (1) among the extracted and analyzed image features, homogeneity is shown to be a relevant feature in detecting challenging cell colonies; and (2) the proposed quantum-inspired classifier is a novel and outstanding methodology, compared to conventional machine learning classifiers, for the evaluation of clonogenic assays.
Project description:BACKGROUND:Reliably abstracting outcomes from free-text electronic health records remains a challenge. While automated classification of free text has been a popular medical informatics topic, performance validation using real-world clinical data has been limited. The two main approaches are linguistic (natural language processing [NLP]) and statistical (machine learning). The authors have developed a hybrid system for abstracting computed tomography (CT) reports for specified outcomes. OBJECTIVES:The objective was to measure performance of a hybrid NLP and machine learning system for automated outcome classification of emergency department (ED) CT imaging reports. The hypothesis was that such a system is comparable to medical personnel doing the data abstraction. METHODS:A secondary analysis was performed on a prior diagnostic imaging study on 3,710 blunt facial trauma victims. Staff radiologists dictated CT reports as free text, which were then deidentified. A trained data abstractor manually coded the reference standard outcome of acute orbital fracture, with a random subset double-coded for reliability. The data set was randomly split evenly into training and testing sets. Training patient reports were used as input to the Medical Language Extraction and Encoding (MedLEE) NLP tool to create structured output containing standardized medical terms and modifiers for certainty and temporal status. Findings were filtered for low certainty and past/future modifiers and then combined with the manual reference standard to generate decision tree classifiers using data mining tools Waikato Environment for Knowledge Analysis (WEKA) 3.7.5 and Salford Predictive Miner 6.6. Performance of decision tree classifiers was evaluated on the testing set with or without NLP processing. 
RESULTS:The performance of machine learning alone was comparable to prior NLP studies (sensitivity = 0.92, specificity = 0.93, precision = 0.95, recall = 0.93, f-score = 0.94), and the combined use of NLP and machine learning showed further improvement (sensitivity = 0.93, specificity = 0.97, precision = 0.97, recall = 0.96, f-score = 0.97). This performance is similar to, or better than, that of medical personnel in previous studies. CONCLUSIONS:A hybrid NLP and machine learning automated classification system shows promise in coding free-text electronic clinical data.
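The metrics quoted in these results can be related to one another with a short sketch that derives them from confusion-matrix counts. It assumes the standard binary definitions, under which recall is the same quantity as sensitivity; the abstract reports the two separately and does not state the convention behind that. The function name `binary_metrics` and the counts below are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from
    confusion-matrix counts (assumed standard definitions)."""
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("a metric would be undefined for these counts")
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)     # positive predictive value
    # F-score here is F1, the harmonic mean of precision and recall.
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f_score": f_score,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```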
Project description:Purpose: The purpose of the current study was to evaluate the ability of magnetic resonance (MR) radiomics-based machine-learning algorithms to differentiate glioblastoma (GBM) from primary central nervous system lymphoma (PCNSL). Method: One hundred and thirty-eight patients were enrolled in this study. Radiomics features were extracted from contrast-enhanced MR images, and the machine-learning models were established using five selection methods (distance correlation, random forest, least absolute shrinkage and selection operator (LASSO), eXtreme gradient boosting (Xgboost), and Gradient Boosting Decision Tree) and three radiomics-based machine-learning classifiers [linear discriminant analysis (LDA), support vector machine (SVM), and logistic regression (LR)]. Sensitivity, specificity, accuracy, and areas under curves (AUC) of the models were calculated and used to evaluate and compare the performance of the classifiers. Result: Excellent discriminative performance was observed for all classifiers when combined with a suitable selection method. For LDA-based models, the optimal one was Distance Correlation + LDA with AUC of 0.978. For SVM-based models, Distance Correlation + SVM had the highest AUC of 0.959, while for LR-based models, the highest AUC was 0.966, obtained with LASSO + LR. Conclusion: Radiomics-based machine-learning algorithms show promising performance in differentiating GBM from PCNSL.
Project description:Structural abnormalities in schizophrenia (SZ) patients have been well documented with structural magnetic resonance imaging (MRI) data using voxel-based morphometry (VBM) and region of interest (ROI) analyses. However, these analyses can only detect group-wise differences and thus, have a poor predictive value for individuals. In the present study, we applied a machine learning method that combined support vector machine (SVM) with recursive feature elimination (RFE) to discriminate SZ patients from normal controls (NCs) using their structural MRI data. We first employed both VBM and ROI analyses to compare gray matter volume (GMV) and white matter volume (WMV) between 41 SZ patients and 42 age- and sex-matched NCs. The method of SVM combined with RFE was used to discriminate SZ patients from NCs using significant between-group differences in both GMV and WMV as input features. We found that SZ patients showed GM and WM abnormalities in several brain structures primarily involved in the emotion, memory, and visual systems. An SVM with a RFE classifier using the significant structural abnormalities identified by the VBM analysis as input features achieved the best performance (an accuracy of 88.4%, a sensitivity of 91.9%, and a specificity of 84.4%) in the discriminative analyses of SZ patients. These results suggested that distinct neuroanatomical profiles associated with SZ patients might provide a potential biomarker for disease diagnosis, and machine-learning methods can reveal neurobiological mechanisms in psychiatric diseases.