Dataset Information

Construction of a confounder-free clinical MRI dataset in the Mass General Brigham system for classification of Alzheimer's disease.

ABSTRACT: Deep learning has the potential to standardize and automate diagnostics for complex medical imaging data, but real-world clinical images are plagued by a high degree of heterogeneity and confounding factors that may introduce imbalances and biases to such processes. To address this, we developed and applied a data matching algorithm to 467,464 clinical brain magnetic resonance imaging (MRI) files from the Mass General Brigham (MGB) healthcare system for Alzheimer's disease (AD) classification. We identified 18 technical and demographic confounding factors that can be readily distinguished by MRI or have significant correlations with AD status and isolated a training set free from these confounds. We then applied an ensemble of 3D ResNet-50 deep learning models to classify brain MRIs between groups of AD, mild cognitive impairment (MCI), and healthy controls. From a confounder-free matched dataset of 287,367 MRI files, we achieved an area under the receiver operating characteristic curve (AUROC) of 0.82 in distinguishing healthy controls from patients with AD or MCI. We also showed that confounding factors in heterogeneous clinical data could lead to artificial gains in model performance for disease classification, which our data matching approach could correct. This approach could accelerate the use of deep learning models for clinical diagnosis and find broad applications in medical image analysis.
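
As a rough illustration of what matching on confounders can mean in practice, the sketch below keeps only those combinations of confounder values in which every diagnostic group is present, and subsamples each group to the same size. This is a minimal sketch under simplifying assumptions, not the algorithm described in the publication: it does exact matching on discrete confounders only, and the function name `match_on_confounders`, the record fields (`dx`, `sex`, `field_T`), and the sample records are all hypothetical.

```python
import random
from collections import defaultdict


def match_on_confounders(records, label_key, confounder_keys, seed=0):
    """Return a subset in which each confounder stratum has equal counts per label.

    Exact matching on discrete confounders only; an illustrative
    simplification, not the published matching algorithm.
    """
    rng = random.Random(seed)

    # Group records by their combination of confounder values, then by label.
    strata = defaultdict(lambda: defaultdict(list))
    for rec in records:
        stratum = tuple(rec[k] for k in confounder_keys)
        strata[stratum][rec[label_key]].append(rec)

    labels = {rec[label_key] for rec in records}
    matched = []
    for by_label in strata.values():
        # A stratum that lacks any label cannot be balanced, so drop it.
        if set(by_label) != labels:
            continue
        # Subsample every label down to the size of the smallest group.
        n = min(len(group) for group in by_label.values())
        for group in by_label.values():
            matched.extend(rng.sample(group, n))
    return matched


# Hypothetical records: diagnosis plus two confounders (sex, field strength).
scans = [
    {"id": 1, "dx": "AD", "sex": "F", "field_T": 3.0},
    {"id": 2, "dx": "AD", "sex": "F", "field_T": 3.0},
    {"id": 3, "dx": "control", "sex": "F", "field_T": 3.0},
    {"id": 4, "dx": "AD", "sex": "M", "field_T": 1.5},
    {"id": 5, "dx": "control", "sex": "M", "field_T": 3.0},
]
matched = match_on_confounders(scans, "dx", ["sex", "field_T"])
```

After matching, the distribution of each confounder is identical across diagnostic groups, so a classifier trained on the subset cannot use those confounders as a shortcut. The cost is discarded data: in the sample above, only the stratum that contains both diagnoses survives.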

SUBMITTER: Leming M

PROVIDER: S-EPMC9295028 | biostudies-literature | 2022 Jul

REPOSITORIES: biostudies-literature

Publications

Construction of a confounder-free clinical MRI dataset in the Mass General Brigham system for classification of Alzheimer's disease.

Matthew Leming, Sudeshna Das, Hyungsoon Im

Artificial Intelligence in Medicine, 2022 Apr 27

PMID: 35659387

Similar Datasets

Project description: Importance: Socioeconomic disadvantage is associated with poor health outcomes. However, whether socioeconomic factors are associated with post-myocardial infarction (MI) outcomes in younger patient populations is unknown. Objective: To evaluate the association of neighborhood-level socioeconomic disadvantage with long-term outcomes among patients who experienced an MI at a young age. Design, setting, and participants: This cohort study analyzed patients in the Mass General Brigham YOUNG-MI Registry (at Brigham and Women's Hospital and Massachusetts General Hospital in Boston, Massachusetts) who experienced an MI at or before 50 years of age between January 1, 2000, and April 30, 2016. Each patient's home address was mapped to the Area Deprivation Index (ADI) to capture higher rates of socioeconomic disadvantage. The median follow-up duration was 11.3 years. The dates of analysis were May 1, 2020, to June 30, 2020. Exposures: Patients were assigned an ADI ranking according to their home address and then stratified into 3 groups (least disadvantaged group, middle group, and most disadvantaged group). Main outcomes and measures: The outcomes of interest were all-cause and cardiovascular mortality. Cause of death was adjudicated from national registries and electronic medical records. Cox proportional hazards regression modeling was used to evaluate the association of ADI with all-cause and cardiovascular mortality. Results: The cohort consisted of 2097 patients, of whom 2002 (95.5%) with an ADI ranking were included (median [interquartile range] age, 45 [42-48] years; 1607 male individuals [80.3%]). Patients in the most disadvantaged neighborhoods were more likely to be Black or Hispanic, have public insurance or no insurance, and have higher rates of traditional cardiovascular risk factors such as hypertension and diabetes. Among the 1964 patients who survived to hospital discharge, 74 (13.6%) in the most disadvantaged group compared with 88 (12.6%) in the middle group and 41 (5.7%) in the least disadvantaged group died. Even after adjusting for a comprehensive set of clinical covariates, higher neighborhood disadvantage was associated with a 32% higher all-cause mortality (hazard ratio, 1.32; 95% CI, 1.10-1.60; P = .004) and a 57% higher cardiovascular mortality (hazard ratio, 1.57; 95% CI, 1.17-2.10; P = .003). Conclusions and relevance: This study found that, among patients who experienced an MI at or before age 50 years, socioeconomic disadvantage was associated with higher all-cause and cardiovascular mortality even after adjusting for clinical comorbidities. These findings suggest that neighborhood and socioeconomic factors have an important role in long-term post-MI survival.

Project description: Modern machine learning algorithms are increasingly being used in neuroimaging studies, such as the prediction of Alzheimer's disease (AD) from structural MRI. However, finding a good representation for multivariate brain MRI features in which their essential structure is revealed and easily extractable has been difficult. We report a successful application of a machine learning framework that significantly improved the use of brain MRI for predictions. Specifically, we used the unsupervised learning algorithm of local linear embedding (LLE) to transform multivariate MRI data of regional brain volume and cortical thickness to a locally linear space with fewer dimensions, while also utilizing the global nonlinear data structure. The embedded brain features were then used to train a classifier for predicting future conversion to AD based on a baseline MRI. We tested the approach on 413 individuals from the Alzheimer's Disease Neuroimaging Initiative (ADNI) who had baseline MRI scans and complete clinical follow-ups over 3 years with the following diagnoses: cognitively normal (CN; n=137), stable mild cognitive impairment (s-MCI; n=93), MCI converters to AD (c-MCI; n=97), and AD (n=86). We found that classifications using embedded MRI features generally outperformed (p<0.05) classifications using the original features directly. Moreover, the improvement from LLE was not limited to a particular classifier but worked equally well for regularized logistic regressions, support vector machines, and linear discriminant analysis. Most strikingly, using LLE significantly improved (p=0.007) predictions of MCI subjects who converted to AD and those who remained stable (accuracy/sensitivity/specificity: 0.68/0.80/0.56). In contrast, predictions using the original features performed no better than chance (accuracy/sensitivity/specificity: 0.56/0.65/0.46).
In conclusion, LLE is a very effective tool for classification studies of AD using multivariate MRI data. The improvement in predicting conversion to AD in MCI could have important implications for health management and for powering therapeutic trials by targeting non-demented subjects who later convert to AD.
