Dataset Information

Developing and Validating a Survival Prediction Model for NSCLC Patients Through Distributed Learning Across 3 Countries.

ABSTRACT:

Purpose

Tools for survival prediction for non-small cell lung cancer (NSCLC) patients treated with chemoradiation or radiation therapy are of limited quality. In this work, we developed a predictive model of survival at 2 years. The model is based on a large volume of historical patient data and serves as a proof of concept to demonstrate the distributed learning approach.

Methods and materials

Clinical data from 698 lung cancer patients, treated with curative intent with chemoradiation or radiation therapy alone, were collected and stored at 2 different cancer institutes (559 patients at Maastro clinic [Netherlands] and 139 at Michigan university [United States]). The model was further validated on 196 patients originating from The Christie (United Kingdom). A Bayesian network model was adapted for distributed learning (the animation can be viewed at https://www.youtube.com/watch?v=ZDJFOxpwqEA). Two-year posttreatment survival was chosen as the endpoint. The Maastro clinic cohort data are publicly available at https://www.cancerdata.org/publication/developing-and-validating-survival-prediction-model-nsclc-patients-through-distributed, and the developed models can be found at www.predictcancer.org.
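To make the idea of distributed learning concrete, here is a minimal sketch of one way a single conditional probability table could be estimated without moving patient records: each site shares only aggregated counts, and a coordinator pools them. This count-pooling scheme, the function names, the record layout, and the toy data are all assumptions made for illustration; the abstract does not describe the study's algorithm at this level of detail, and this is not presented as the authors' method.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record layout: (parent-state label, survived-2-years flag).
Record = Tuple[str, int]


def local_counts(records: List[Record]) -> Counter:
    """Run at each site: count (state, outcome) pairs.

    Only these aggregated counts leave the site, never the patient rows.
    """
    return Counter(records)


def pooled_cpt(site_counts: List[Counter], alpha: float = 1.0) -> Dict[str, float]:
    """Run at the coordinator: sum the site counts and estimate
    P(survived = 1 | state) with additive (Laplace) smoothing `alpha`."""
    total: Counter = Counter()
    for counts in site_counts:
        total.update(counts)  # Counter.update adds counts together
    states = {state for state, _ in total}
    cpt = {}
    for state in states:
        alive = total[(state, 1)]
        dead = total[(state, 0)]
        cpt[state] = (alive + alpha) / (alive + dead + 2 * alpha)
    return cpt


# Invented toy data for two sites (not taken from the study).
site_a = [("T1", 1), ("T1", 1), ("T1", 0), ("T4", 0), ("T4", 0)]
site_b = [("T1", 1), ("T4", 1), ("T4", 0)]

distributed = pooled_cpt([local_counts(site_a), local_counts(site_b)])
centralized = pooled_cpt([local_counts(site_a + site_b)])
```

In this simplified scheme the pooled counts equal the counts on the merged data, so `distributed` and `centralized` are identical. A real distributed procedure may only approximate the centralized fit.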

Results

Variables included in the final model were T and N category, age, performance status, and total tumor dose. The model has an area under the curve (AUC) of 0.66 on the external validation set and an AUC of 0.62 on a 5-fold cross validation. A model based on the T and N category performed with an AUC of 0.47 on the validation set, significantly worse than our model (P<.001). Learning the model in a centralized or distributed fashion yields a minor difference on the probabilities of the conditional probability tables (0.6%); the discriminative performance of the models on the validation set is similar (P=.26).
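As a reminder of what the reported AUC values measure, the following sketch computes the AUC as the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it, with ties counted as one half. The function name and the toy scores are hypothetical and are not data from the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties contribute 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy predictions and outcomes.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.2]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
```

On this scale, 0.5 corresponds to ranking patients no better than chance and 1.0 to perfect ranking, which is the context for reading the 0.66 and 0.47 figures above.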

Conclusions

Distributed learning from federated databases allows learning of predictive models on data originating from multiple institutions while avoiding many of the data-sharing barriers. We believe that distributed learning is the future of sharing data in health care.

SUBMITTER: Jochems A

PROVIDER: S-EPMC5575360 | biostudies-literature | 2017 Oct

REPOSITORIES: biostudies-literature

Publications

Developing and Validating a Survival Prediction Model for NSCLC Patients Through Distributed Learning Across 3 Countries.

Jochems A, Deist TM, El Naqa I, Kessler M, Mayo C, Reeves J, Jolly S, Matuszak M, Ten Haken R, van Soest J, Oberije C, Faivre-Finn C, Price G, de Ruysscher D, Lambin P, Dekker A

International journal of radiation oncology, biology, physics, 2017-04-24, issue 2

PMID: 28871984

Similar Datasets

Project description: Background: Rapid identification of high-risk polytrauma patients is crucial for early intervention and improved outcomes. This study aimed to develop and validate machine learning models for predicting 72 h mortality in adult polytrauma patients using readily available clinical parameters. Methods: A retrospective analysis was conducted on polytrauma patients from the Dryad database and our institution. Missing values pertinent to eligible individuals within the Dryad database were compensated for through the k-nearest neighbor algorithm, subsequently randomizing them into training and internal validation factions on a 7:3 ratio. The patients of our institution functioned as external validation cohorts. The predictive efficacy of random forest (RF), neural network, and XGBoost models was assessed through an exhaustive suite of performance indicators. The SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) methods were engaged to explain the supreme-performing model. Conclusively, restricted cubic spline analysis and multivariate logistic regression were employed as sensitivity analyses to verify the robustness of the findings. Results: Parameters including age, body mass index, Glasgow Coma Scale, Injury Severity Score, pH, base excess, and lactate emerged as pivotal predictors of 72 h mortality. The RF model exhibited unparalleled performance, boasting an area under the receiver operating characteristic curve (AUROC) of 0.87 (95% confidence interval [CI] 0.84-0.89), an area under the precision-recall curve (AUPRC) of 0.67 (95% CI 0.61-0.73), and an accuracy of 0.83 (95% CI 0.81-0.86) in the internal validation cohort, paralleled by an AUROC of 0.98 (95% CI 0.97-0.99), an AUPRC of 0.88 (95% CI 0.83-0.93), and an accuracy of 0.97 (95% CI 0.96-0.98) in the external validation cohort. It provided the highest net benefit in the decision curve analysis in relation to the other models. The outcomes of the sensitivity examinations were congruent with those inferred from SHAP and LIME. Conclusions: The RF model exhibited the best performance in predicting 72 h mortality in adult polytrauma patients and has the potential to aid clinicians in identifying high-risk patients and guiding clinical decision-making.

Project description: Objectives: The retinal age gap (RAG) is emerging as a potential biomarker for various diseases of the human body, yet its utility depends on machine learning models capable of accurately predicting biological retinal age from fundus images. However, training generalizable models is hindered by potential shortages of diverse training data. To overcome these obstacles, this work develops a novel and computationally efficient distributed learning framework for retinal age prediction. Materials and methods: The proposed framework employs a memory-efficient 8-bit quantized version of RETFound, a cutting-edge foundation model for retinal image analysis, to extract features from fundus images. These features are then used to train an efficient linear regression head model for predicting retinal age. The framework explores federated learning (FL) as well as traveling model (TM) approaches for distributed training of the linear regression head. To evaluate this framework, we simulate a client network using fundus image data from the UK Biobank. Additionally, data from patients with type 1 diabetes from the UK Biobank and the Brazilian Multilabel Ophthalmological Dataset (BRSET) were utilized to explore the clinical utility of the developed methods. Results: Our findings reveal that the developed distributed learning framework achieves retinal age prediction performance on par with centralized methods, with FL and TM providing similar performance (mean absolute error of 3.57 ± 0.18 years for centralized learning, 3.60 ± 0.16 years for TM, and 3.63 ± 0.19 years for FL). Notably, the TM was found to converge with fewer local updates than FL. Moreover, patients with type 1 diabetes exhibited significantly higher RAG values than healthy controls in all models, for both the UK Biobank and BRSET datasets (P < .001). Discussion: The high computational and memory efficiency of the developed distributed learning framework makes it well suited for resource-constrained environments. Conclusion: The capacity of this framework to integrate data from underrepresented populations for training of retinal age prediction models could significantly enhance the accessibility of the RAG as an important disease biomarker.

Project description: Background: At present, preoperative diagnosis of lateral cervical lymph node metastasis (LLNM) in patients with papillary thyroid carcinoma (PTC) mostly depends on the training and expertise of ultrasound doctors. A machine-learning model for predicting LLNM accurately before PTC surgery may help to determine the scope of surgery and reduce unnecessary surgical trauma. Methods: The data of patients with primary PTC who underwent thyroidectomy with lateral cervical lymph node surgery at Beijing Tongren Hospital between July 2009 and June 2021 were retrospectively analyzed. All patients had complete ultrasonic examination, clinical data, and definite pathology diagnosis of lymph nodes. LLNM was confirmed by postoperative pathology. The patients were randomly divided into a training set (155 cases) and a test set (98 cases) at a ratio of 6:4. Eleven parameters, including patient demographics, ultrasound results, and tumor-related conditions, were collected, and a prediction model was established using the support vector machine (SVM) algorithm. Several other machine-learning algorithms were also used to establish models for comparison. The accuracy, precision, recall, F1-score, sensitivity, specificity, Cohen's kappa value, and area under the receiver operating characteristic curve (AUC) were used to evaluate model performance. Results: A total of 87 males and 156 females were included in the study, aged 14-80 years. One hundred and four patients of them had LLNM and 139 did not have LLNM. The pandas Python library was used for the statistical analysis, and the Spearman coefficient was used to analyze the correlation between each parameter and the prediction index. The SVM model performed the best among all the models. Its accuracy, precision, recall, F1-score, sensitivity, specificity, Cohen's kappa value, and AUC were 90.8%, 91.0%, 90.8%, 90.8%, 87.5%, 94.0%, 81.6%, and 91.0%, respectively. Conclusions: This model can enable surgeons to improve the accuracy of ultrasonography in predicting LLNM without additional examination, thus avoiding missing positive lateral cervical lymph nodes and reducing the sequelae caused by unnecessary lateral neck dissection.

Project description: Rationale & objective: Risk factors for acute kidney injury (AKI) in the hospital have been well studied. Yet, risk factors for identifying high-risk patients for AKI occurring and managed in the outpatient setting are unknown and may differ. Study design: Predictive model development and external validation using observational electronic health record data. Setting & participants: Patients aged 18-90 years with recurrent primary care encounters, known baseline serum creatinine, and creatinine measured during an 18-month outcome period without established advanced kidney disease. New predictors & established predictors: Established predictors for inpatient AKI were considered. Potential new predictors were hospitalization history, smoking, serum potassium levels, and prior outpatient AKI. Outcomes: A ≥50% increase in the creatinine level above a moving baseline of the recent measurement(s) without a hospital admission within 7 days defined outpatient AKI. Analytical approach: Logistic regression with bootstrap sampling for backward stepwise covariate elimination was used. The model was then transformed into 2 binary tests: one identifying high-risk patients for research and another identifying patients for additional clinical monitoring or intervention. Results: Outpatient AKI was observed in 4,611 (3.0%) and 115,744 (2.4%) patients in the development and validation cohorts, respectively. The model, with 18 variables and 3 interaction terms, produced C statistics of 0.717 (95% CI, 0.710-0.725) and 0.722 (95% CI, 0.720-0.723) in the development and validation cohorts, respectively. The research test, identifying the 5.2% most at-risk patients in the validation cohort, had a sensitivity of 0.210 (95% CI, 0.208-0.213) and specificity of 0.952 (95% CI, 0.951-0.952). The clinical test, identifying the 20% most at-risk patients, had a sensitivity of 0.494 (95% CI, 0.491-0.497) and specificity of 0.806 (95% CI, 0.806-0.807). Limitations: Only surviving patients with measured creatinine levels during a baseline period and outcome period were included. Conclusions: The outpatient AKI risk prediction model performed well in both the development and validation cohorts in both continuous and binary forms.
