Dataset Information

External control arm analysis: an evaluation of propensity score approaches, G-computation, and doubly debiased machine learning.

ABSTRACT:

Background

An external control arm is a cohort of control patients collected from data external to a single-arm trial. To provide an unbiased estimate of efficacy, the clinical profiles of patients in the single and external arms should be aligned, typically using propensity score approaches. Alternative approaches infer efficacy by comparing the outcomes of single-arm patients with machine-learning predictions of control patient outcomes. These methods include G-computation and Doubly Debiased Machine Learning (DDML), and their evaluation for External Control Arm (ECA) analysis remains insufficient.
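As a minimal illustration of the outcome-prediction idea, the Python sketch below estimates the treatment effect in the single-arm population by G-computation. An outcome model fitted on external control patients predicts each single-arm patient's untreated outcome, and the estimate is the mean observed outcome minus the mean prediction. It assumes a single covariate, a linear regression standing in for a machine-learning model, and hypothetical simulated data; the names `simulate` and `g_computation_att` are illustrative and are not the authors' code.

```python
import random
import statistics


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x with one covariate."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def g_computation_att(x_treated, y_treated, x_control, y_control):
    """Mean observed single-arm outcome minus mean predicted control outcome."""
    # Outcome model fitted on the external control arm only.
    a, b = fit_linear(x_control, y_control)
    # Predicted untreated outcomes for the single-arm patients.
    y0_hat = [a + b * x for x in x_treated]
    return statistics.fmean(y_treated) - statistics.fmean(y0_hat)


def simulate(n, shift, effect, seed=0):
    """Hypothetical data: single-arm covariates are shifted by `shift`."""
    rng = random.Random(seed)
    x_t = [rng.gauss(shift, 1.0) for _ in range(n)]
    x_c = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y_t = [1.0 + 2.0 * x + effect + rng.gauss(0.0, 1.0) for x in x_t]
    y_c = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in x_c]
    return x_t, y_t, x_c, y_c


x_t, y_t, x_c, y_c = simulate(n=2000, shift=0.5, effect=1.5)
naive = statistics.fmean(y_t) - statistics.fmean(y_c)
att = g_computation_att(x_t, y_t, x_c, y_c)
print(f"naive difference: {naive:.2f}, G-computation estimate: {att:.2f}")
```

Because the simulated single-arm patients have shifted covariates, the naive difference in means absorbs the covariate effect (about 2.5 here), whereas the G-computation estimate should land near the simulated effect of 1.5.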

Methods

We consider both numerical simulations and a trial replication procedure to evaluate the different statistical approaches: propensity score matching, Inverse Probability of Treatment Weighting (IPTW), G-computation, and DDML. The replication study relies on five randomized clinical trials in type 2 diabetes made available through the Yale University Open Data Access (YODA) project. From this pool of five trials, observational experiments are artificially built by replacing the control arm of one trial with an arm that originates from another trial and contains similarly treated patients.
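A hedged sketch of IPTW in this setting follows, assuming the target estimand is the average treatment effect on the treated (the single-arm population). A logistic propensity score e(x) is fitted by Newton-Raphson, external controls receive odds weights e(x)/(1 - e(x)), and single-arm patients keep weight 1. The single covariate, the simulated data, and the names `fit_logistic`, `iptw_att`, and `simulate` are illustrative assumptions, not the authors' implementation; propensity score matching would instead pair patients on e(x).

```python
import math
import random
import statistics


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ds, iterations=25):
    """Newton-Raphson fit of P(D = 1 | x) = sigmoid(a + b * x)."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, d in zip(xs, ds):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += d - p          # gradient of the log-likelihood
            g1 += (d - p) * x
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1].
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def iptw_att(xs, ds, ys):
    """IPTW estimate of the average treatment effect on the treated."""
    a, b = fit_logistic(xs, ds)
    treated_mean = statistics.fmean([y for y, d in zip(ys, ds) if d == 1])
    num = den = 0.0
    for x, d, y in zip(xs, ds, ys):
        if d == 0:
            e = sigmoid(a + b * x)
            w = e / (1.0 - e)    # odds weight: reweights controls toward single-arm covariates
            num += w * y
            den += w
    return treated_mean - num / den


def simulate(n_per_arm, shift, effect, seed=0):
    """Hypothetical pooled data: D = 1 single arm (shifted x), D = 0 external controls."""
    rng = random.Random(seed)
    xs, ds, ys = [], [], []
    for d, mu in ((1, shift), (0, 0.0)):
        for _ in range(n_per_arm):
            x = rng.gauss(mu, 1.0)
            xs.append(x)
            ds.append(d)
            ys.append(1.0 + 2.0 * x + effect * d + rng.gauss(0.0, 1.0))
    return xs, ds, ys


xs, ds, ys = simulate(n_per_arm=2000, shift=0.5, effect=1.5)
att = iptw_att(xs, ds, ys)
print(f"IPTW estimate: {att:.2f}")
```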

Results

Among the different statistical approaches, numerical simulations show that DDML has the smallest bias, followed by G-computation. G-computation usually achieves the lowest mean squared error. Compared with the other methods, DDML has variable mean squared error performance that improves with increasing sample size. For hypothesis testing, all methods control type I error, and DDML is the most conservative. G-computation is the best method in terms of statistical power, and DDML has comparable power at [Formula: see text] but lower power for smaller sample sizes. The replication procedure also indicates that G-computation minimizes mean squared error, whereas DDML has intermediate performance between G-computation and propensity score approaches. The confidence intervals of G-computation are the narrowest, whereas those obtained with DDML are the widest for small sample sizes, which confirms its conservative nature.
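To make the DDML estimator compared in these results more concrete, here is a minimal cross-fitted sketch of a doubly robust estimate of the effect on the treated, with a Wald-type 95% confidence interval from the influence function. It assumes a single covariate and uses linear and logistic regressions as stand-ins for the machine-learning nuisance models; the names `ddml_att` and `simulate` are illustrative, and the sketch is not claimed to match the authors' implementation. Each patient's nuisance predictions come from models fitted on the other folds, and the score combines outcome-model residuals of single-arm patients with odds-weighted residuals of external controls.

```python
import math
import random
import statistics


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_linear(xs, ys):
    """Least squares for y = a + b * x (stand-in for an ML outcome model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_logistic(xs, ds, iterations=25):
    """Newton-Raphson for P(D = 1 | x) = sigmoid(a + b * x) (stand-in for an ML classifier)."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, d in zip(xs, ds):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0, g1 = g0 + d - p, g1 + (d - p) * x
            h00, h01, h11 = h00 + w, h01 + w * x, h11 + w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def ddml_att(xs, ds, ys, n_folds=5, seed=0):
    """Cross-fitted doubly robust ATT estimate with a Wald-type 95% CI."""
    n = len(xs)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    g0_hat, m_hat = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        held_out = set(order[k::n_folds])
        train = [i for i in range(n) if i not in held_out]
        # Outcome model E[Y | x, D = 0], fitted on out-of-fold controls only.
        controls = [i for i in train if ds[i] == 0]
        a, b = fit_linear([xs[i] for i in controls], [ys[i] for i in controls])
        # Propensity model P(D = 1 | x), fitted on all out-of-fold patients.
        c, s = fit_logistic([xs[i] for i in train], [ds[i] for i in train])
        for i in held_out:
            g0_hat[i] = a + b * xs[i]
            m_hat[i] = sigmoid(c + s * xs[i])
    p_hat = sum(ds) / n
    # Single-arm residuals minus odds-weighted external-control residuals.
    terms = [d * (y - g) - (1 - d) * m / (1.0 - m) * (y - g)
             for d, y, g, m in zip(ds, ys, g0_hat, m_hat)]
    theta = sum(terms) / sum(ds)
    # Influence-function values give the standard error.
    psi = [(t - d * theta) / p_hat for t, d in zip(terms, ds)]
    se = math.sqrt(sum(v * v for v in psi) / n / n)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)


def simulate(n_per_arm, shift, effect, seed=1):
    """Hypothetical pooled data: D = 1 single arm (shifted x), D = 0 external controls."""
    rng = random.Random(seed)
    xs, ds, ys = [], [], []
    for d, mu in ((1, shift), (0, 0.0)):
        for _ in range(n_per_arm):
            x = rng.gauss(mu, 1.0)
            xs.append(x)
            ds.append(d)
            ys.append(1.0 + 2.0 * x + effect * d + rng.gauss(0.0, 1.0))
    return xs, ds, ys


xs, ds, ys = simulate(n_per_arm=2000, shift=0.5, effect=1.5)
theta, (lo, hi) = ddml_att(xs, ds, ys)
print(f"DDML estimate: {theta:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Replacing `fit_linear` and `fit_logistic` with more flexible learners leaves the cross-fitting loop and the score unchanged.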

Conclusions

For external control arm analyses, methods based on outcome prediction models can reduce estimation error and increase statistical power compared to propensity score approaches.

SUBMITTER: Loiseau N

PROVIDER: S-EPMC9795588 | biostudies-literature | 2022 Dec

REPOSITORIES: biostudies-literature


Publications

External control arm analysis: an evaluation of propensity score approaches, G-computation, and doubly debiased machine learning.

Loiseau N, Trichelair P, He M, Andreux M, Zaslavskiy M, Wainrib G, Blum MGB

BMC Medical Research Methodology, 2022 Dec 28, issue 1


PMID: 36577946

Similar Datasets

Project description: Background: Sepsis-induced myocardial injury (SIMI) is a severe and common complication of sepsis; however, its definition remains unclear. Prognostic analyses may vary depending on the definition applied. Early prediction of SIMI is crucial for timely intervention, ultimately improving patient outcomes. This study aimed to evaluate the prognostic impact of SIMI and develop validated predictive models using advanced machine learning (ML) algorithms for identifying SIMI in critically ill sepsis patients. Methods: Data were sourced from the Medical Information Mart for Intensive Care IV (MIMIC-IV, v3.0) database. Patients meeting Sepsis-3.0 criteria were included, and SIMI was defined as troponin T (TNT) levels ≥0.1 ng/mL. Prognostic evaluation involved propensity score matching, inverse probability weighting, doubly robust analysis, logistic regression, and Cox regression. Patients were divided into training and testing datasets in a 7:3 ratio. Least absolute shrinkage and selection operator (LASSO) regression was used for variable selection to simplify the model. Twelve hyperparameter-tuned ML models were developed and evaluated using visualized heatmaps. The best-performing model was deployed as a web-based application. Results: Among 2,435 patients analyzed, 571 (23.45%) developed SIMI following intensive care unit (ICU) admission. Boruta and LASSO identified 46 and 10 key variables, respectively, for prognostic and predictive modeling. Doubly robust analysis revealed significantly worse short- and intermediate-term outcomes for SIMI patients, including increased in-ICU mortality [odds ratio (OR) 1.39, 95% confidence interval (CI) 1.02-1.85, p < 0.05], 28-day mortality (OR 1.35, 95% CI 1.02-1.79, p < 0.05), and 180-day mortality [hazard ratio (HR) 1.21, 95% CI 1.01-1.44, p < 0.05]. However, one-year mortality showed no significant difference (HR 1.03, 95% CI 0.99-1.08, p = 0.169).
The XGBoost model outperformed the others, achieving an area under the receiver operating characteristic curve (AUROC) of 0.83 (95% CI 0.79-0.87). SHapley Additive exPlanations (SHAP) analysis highlighted the top five predictive features: creatine kinase-myocardial band (CKMB), creatinine, alanine aminotransferase (ALT), lactate, and blood urea nitrogen (BUN). A web-based application was subsequently developed for real-world use. Conclusion: SIMI significantly worsens patient prognosis, while the XGBoost model demonstrated excellent predictive performance. The development of a web-based application provides clinicians with a practical tool for timely intervention, potentially improving outcomes for septic patients.

Project description: Importance: Acute kidney injury (AKI) is associated with increased morbidity and mortality in hospitalized patients. Current methods to identify patients at high risk of AKI are limited, and few prediction models have been externally validated. Objective: To internally and externally validate a machine learning risk score to detect AKI in hospitalized patients. Design, setting, and participants: This diagnostic study included 495 971 adult hospital admissions at the University of Chicago (UC) from 2008 to 2016 (n = 48 463), at Loyola University Medical Center (LUMC) from 2007 to 2017 (n = 200 613), and at NorthShore University Health System (NUS) from 2006 to 2016 (n = 246 895) with serum creatinine (SCr) measurements. Patients with an SCr concentration at admission greater than 3.0 mg/dL, with a prior diagnostic code for chronic kidney disease stage 4 or higher, or who received kidney replacement therapy within 48 hours of admission were excluded. A simplified version of a previously published gradient boosted machine AKI prediction algorithm was used; it was validated internally among patients at UC and externally among patients at NUS and LUMC. Main outcomes and measures: Prediction of Kidney Disease Improving Global Outcomes SCr-defined stage 2 AKI within a 48-hour interval was the primary outcome. Discrimination was assessed by the area under the receiver operating characteristic curve (AUC). Results: The study included 495 971 adult admissions (mean [SD] age, 63 [18] years; 87 689 [17.7%] African American; and 266 866 [53.8%] women) across 3 health systems. The development of stage 2 or higher AKI occurred in 1664 of 48 463 patients (3.4%) in the UC cohort, 5711 of 200 613 (2.8%) in the LUMC cohort, and 3499 of 246 895 (1.4%) in the NUS cohort. In the UC cohort, 332 patients (0.7%) required kidney replacement therapy compared with 672 patients (0.3%) in the LUMC cohort and 440 patients (0.2%) in the NUS cohort.
The AUCs for predicting at least stage 2 AKI in the next 48 hours were 0.86 (95% CI, 0.86-0.86) in the UC cohort, 0.85 (95% CI, 0.84-0.85) in the LUMC cohort, and 0.86 (95% CI, 0.86-0.86) in the NUS cohort. The AUCs for receipt of kidney replacement therapy within 48 hours were 0.96 (95% CI, 0.96-0.96) in the UC cohort, 0.95 (95% CI, 0.94-0.95) in the LUMC cohort, and 0.95 (95% CI, 0.94-0.95) in the NUS cohort. In time-to-event analysis, a probability cutoff of at least 0.057 predicted the onset of stage 2 AKI a median (IQR) of 27 (6.5-93) hours before the eventual doubling in SCr concentrations in the UC cohort, 34.5 (19-85) hours in the NUS cohort, and 39 (19-108) hours in the LUMC cohort. Conclusions and relevance: In this study, the machine learning algorithm demonstrated excellent discrimination in both internal and external validation, supporting its generalizability and potential as a clinical decision support tool to improve AKI detection and outcomes.

Project description: A major shortcoming of semiempirical (SE) molecular orbital methods is their severe underestimation of molecular polarizability compared with experimental and ab initio (AI) benchmark data. In a combined quantum mechanical and molecular mechanical (QM/MM) treatment of solution-phase reactions, a solute described by SE methods therefore tends to generate an inadequate electronic polarization response to solvent electric fields, which often leads to large errors in free energy profiles. To address this problem, here we present a hybrid framework that improves the response property of SE/MM methods through high-level molecular-polarizability fitting. Specifically, we place on QM atoms a set of corrective polarizabilities (referred to as chaperone polarizabilities), whose magnitudes are determined from machine learning (ML) to reproduce the condensed-phase AI molecular polarizability along the minimum free energy path. These chaperone polarizabilities are then used in a machinery similar to a polarizable force field calculation to compensate for the missing polarization energy in the conventional SE/MM simulations. Because QM atoms in this treatment host SE wave functions as well as classical polarizabilities, both polarized by MM electric fields, we name this method doubly polarized QM/MM (dp-QM/MM). We demonstrate the new method on the free energy simulations of the Menshutkin reaction in water. Using AM1/MM as a base method, we show that ML chaperones greatly reduce the error in the solute molecular polarizability from 6.78 to 0.03 Å³ with respect to the density functional theory benchmark. The chaperone correction leads to ∼10 kcal/mol of additional polarization energy in the product region, bringing the simulated free energy profiles into closer agreement with the experimental results.
Furthermore, the solute-solvent radial distribution functions show that the chaperone polarizabilities modify the free energy profiles through enhanced solvation corrections when the system evolves from the charge-neutral reactant state to the charge-separated transition and product states. These results suggest that the dp-QM/MM method, enabled by ML chaperone polarizabilities, provides a very physical remedy for the underpolarization problem in SE/MM-based free energy simulations.


OmicsDI is part of the ELIXIR infrastructure