Dataset Information

Prediction Model Performance With Different Imputation Strategies: A Simulation Study Using a North American ICU Registry.

ABSTRACT:

Objectives

To evaluate the performance of pragmatic imputation approaches when estimating model coefficients using datasets with varying degrees of data missingness.

Design

Performance in predicting observed mortality in a registry dataset was evaluated using simulations of two simple logistic regression models with age-specific criteria for abnormal vital signs (mentation, systolic blood pressure, respiratory rate, WBC count, heart rate, and temperature). Starting with a dataset with complete information, increasing degrees of biased missingness of WBC and mentation were introduced, depending on the values of temperature and systolic blood pressure, respectively. Missing data approaches evaluated included analysis of complete cases only, assuming missing data are normal, and multiple imputation by chained equations. Percent bias and root mean square error, in relation to parameter estimates obtained from the original data, were evaluated as performance indicators.
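The missingness step in the design above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 0/1 coding (1 = abnormal, 0 = normal), and the two missingness probabilities are assumptions made for illustration. It covers only the two simple strategies (complete cases, assume normal); multiple imputation by chained equations would normally come from a statistics library and is not reproduced here.

```python
import random


def add_biased_missingness(rows, target, driver, p_abnormal, p_normal, seed=0):
    """Return a copy of `rows` in which `target` is set to None with a
    probability that depends on `driver` (1 = abnormal, 0 = normal).

    The probabilities are hypothetical; the abstract does not report them.
    """
    rng = random.Random(seed)
    masked = []
    for row in rows:
        p_missing = p_abnormal if row[driver] == 1 else p_normal
        new_row = dict(row)  # leave the original record untouched
        if rng.random() < p_missing:
            new_row[target] = None
        masked.append(new_row)
    return masked


def complete_cases(rows):
    """Complete-case analysis: drop any record with a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def assume_normal(rows):
    """Assume-normal strategy: recode every missing indicator as 0 (normal)."""
    return [{k: (0 if v is None else v) for k, v in r.items()} for r in rows]


if __name__ == "__main__":
    # Toy records: WBC missingness driven by temperature, as in the design.
    records = [
        {"temperature": 1, "wbc": 1},
        {"temperature": 0, "wbc": 0},
        {"temperature": 1, "wbc": 0},
    ]
    masked = add_biased_missingness(records, "wbc", "temperature", 0.5, 0.1)
    print(len(complete_cases(masked)), "complete cases")
    print(assume_normal(masked))
```

Because the chance of a missing value depends on another observed variable, the records that survive complete-case filtering are no longer a random sample of the original dataset, which is the situation the simulation was designed to probe.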

Setting

Data were obtained from the Virtual Pediatric Systems, LLC, database (Los Angeles, CA), which provides clinical markers and outcomes in prospectively collected records from 117 PICUs in the United States and Canada.

Patients

Children admitted to a participating PICU in 2017, for whom all required data were available.

Interventions

None.

Measurements and main results

Simulations demonstrated that multiple imputation by chained equations is an effective strategy and that even a naive implementation significantly outperforms traditional approaches: the root mean square error for model coefficients was lower with multiple imputation by chained equations in 90 of 99 simulations (91%) compared with discarding cases with missing data, and in 97 of 99 simulations (98%) compared with models assuming missing values are in the normal range. Assuming missing data to be abnormal was inferior to all other approaches.
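As a hedged illustration of how such a comparison could be tallied, the sketch below computes percent bias and root mean square error against reference coefficients and counts the simulations in which one strategy has the lower error. The function names are hypothetical, and taking the root mean square error across a model's coefficients is an assumption; the abstract does not say how the errors were aggregated.

```python
import math


def percent_bias(estimate, reference):
    """Signed deviation of an estimate from the reference value, in percent."""
    return 100.0 * (estimate - reference) / reference


def rmse(estimates, references):
    """Root mean square error between paired estimates and reference values."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))


def count_wins(rmse_a, rmse_b):
    """Count simulations where strategy A has strictly lower RMSE than B."""
    wins = sum(1 for a, b in zip(rmse_a, rmse_b) if a < b)
    return wins, len(rmse_a)


def summarize(wins, total):
    """Format a tally in the style used in the results paragraph."""
    return f"{wins} of {total} ({100 * wins / total:.0f}%)"


if __name__ == "__main__":
    # Made-up per-simulation RMSE values for two strategies (illustration only).
    rmse_mice = [0.10, 0.50, 0.20]
    rmse_complete_case = [0.20, 0.40, 0.30]
    print(summarize(*count_wins(rmse_mice, rmse_complete_case)))
```

Applied to the counts reported above, `summarize(90, 99)` and `summarize(97, 99)` give the 91% and 98% figures.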

Conclusions

Analyses of large observational studies are likely to encounter missing data, which are likely not missing at random. Researchers should always consider multiple imputation by chained equations (or similar imputation approaches), even when only small proportions of data are missing.

SUBMITTER: Steif J

PROVIDER: S-EPMC8719509 | biostudies-literature | 2022 Jan

REPOSITORIES: biostudies-literature

Publications

Prediction Model Performance With Different Imputation Strategies: A Simulation Study Using a North American ICU Registry.

Jonathan Steif, Rollin Brant, Rama Syamala Sreepada, Nicholas West, Srinivas Murthy, Matthias Görges

Pediatric Critical Care Medicine: a journal of the Society of Critical Care Medicine and the World Federation of Pediatric Intensive and Critical Care Societies, 2022 Jan 1, issue 1

PMID: 34560774

Similar Datasets

Project description: Sudden unexpected death in epilepsy is the leading category of epilepsy-related death and the underlying mechanisms are incompletely understood. Risk factors can include a recent history and high frequency of generalized tonic-clonic seizures, which can depress brain activity postictally, impairing respiration, arousal and protective reflexes. Neuropathological findings in sudden unexpected death in epilepsy cases parallel those in other epilepsy patients, with no implication of novel structures or mechanisms in seizure-related deaths. Few large studies have comprehensively reviewed whole brain examination of such patients. We evaluated 92 North American Sudden Unexpected Death in Epilepsy Registry cases with whole brain neuropathological examination by board-certified neuropathologists blinded to the adjudicated cause of death, with an average of 16 brain regions examined per case. The 92 cases included 61 sudden unexpected death in epilepsy cases (40 definite, 9 definite plus, 6 probable, 6 possible) and 31 epilepsy controls who died from other causes. The mean age at death was 34.4 years and 65.2% (60/92) were male. The average age at death was younger for sudden unexpected death in epilepsy cases than for epilepsy controls (30.0 versus 39.6 years; P = 0.006), and there was no difference in sex distribution (67.3% versus 64.5% male; P = 0.8). Among sudden unexpected death in epilepsy cases, earlier age of epilepsy onset positively correlated with a younger age at death (P = 0.0005) and negatively correlated with epilepsy duration (P = 0.001). Neuropathological findings were identified in 83.7% of the cases in our cohort. The most common findings were dentate gyrus dysgenesis (sudden unexpected death in epilepsy 50.9%, epilepsy controls 54.8%) and focal cortical dysplasia (FCD) (sudden unexpected death in epilepsy 41.8%, epilepsy controls 29.0%). 
The neuropathological findings in sudden unexpected death in epilepsy paralleled those in epilepsy controls, including the frequency of total neuropathological findings as well as the specific findings in the dentate gyrus, findings pertaining to neurodevelopment (e.g. FCD, heterotopias) and findings in the brainstem (e.g. medullary arcuate or olivary dysgenesis). Thus, like prior studies, we found no neuropathological findings that were more common in sudden unexpected death in epilepsy cases. Future neuropathological studies evaluating larger sudden unexpected death in epilepsy and control cohorts would benefit from inclusion of different epilepsy syndromes with detailed phenotypic information, consensus among pathologists particularly for more subjective findings where observations can be inconsistent, and molecular approaches to identify markers of sudden unexpected death in epilepsy risk or pathogenesis.

Project description: Background: The coronavirus disease 2019 (COVID-19) pandemic has impacted many aspects of ST-segment elevation myocardial infarction (STEMI) care, including timely access to primary percutaneous coronary intervention (PPCI). Objectives: The goal of the NACMI (North American COVID-19 and STEMI) registry is to describe demographic characteristics, management strategies, and outcomes of COVID-19 patients with STEMI. Methods: A prospective, ongoing observational registry was created under the guidance of 3 cardiology societies. STEMI patients with confirmed COVID+ (group 1) or suspected (person under investigation [PUI]) (group 2) COVID-19 infection were included. A group of age- and sex-matched STEMI patients (matched to COVID+ patients in a 2:1 ratio) treated in the pre-COVID era (2015 to 2019) serves as the control group for comparison of treatment strategies and outcomes (group 3). The primary outcome was a composite of in-hospital death, stroke, recurrent myocardial infarction, or repeat unplanned revascularization. Results: As of December 6, 2020, 1,185 patients were included in the NACMI registry (230 COVID+ patients, 495 PUIs, and 460 control patients). COVID+ patients were more likely to have minority ethnicity (Hispanic 23%, Black 24%) and had a higher prevalence of diabetes mellitus (46%) (all p < 0.001 relative to PUIs). COVID+ patients were more likely to present with cardiogenic shock (18%) but were less likely to receive invasive angiography (78%) (all p < 0.001 relative to control patients). Among COVID+ patients who received angiography, 71% received PPCI and 20% received medical therapy (both p < 0.001 relative to control patients). The primary outcome occurred in 36% of COVID+ patients, 13% of PUIs, and 5% of control patients (p < 0.001 relative to control patients). Conclusions: COVID+ patients with STEMI represent a high-risk group of patients with unique demographic and clinical characteristics. 
PPCI is feasible and remains the predominant reperfusion strategy, supporting current recommendations.

Project description: Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina's HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%-93%), but IMPUTE2 had the highest IQS (81%-83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels. 
While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL panel for African American studies requires careful interpretation of the population specificity and imputation quality of low frequency SNPs.
