Dataset Information


The Impact of Medical Big Data Anonymization on Early Acute Kidney Injury Risk Prediction.

ABSTRACT: Artificial intelligence-enabled medical big data analysis has the potential to revolutionize medical practice, from diagnosing and predicting complex diseases to making recommendations and resource allocation decisions in an evidence-based manner. However, big data comes with big disclosure risks. To preserve privacy, excessive data anonymization is often necessary, leading to significant loss of data utility. In this paper, we develop a systematic data scrubbing procedure for large datasets in which the key variables for re-identification risk assessment are uncertain. We then assess the trade-off between anonymizing electronic health record data for sharing in support of open science and the performance of machine learning models that use the data for early acute kidney injury risk prediction. Results demonstrate that our proposed data scrubbing procedure can maintain good feature diversity and moderate data utility, but it raises concerns regarding its impact on knowledge discovery capability.


PROVIDER: S-EPMC7233037 | BioStudies | 2020-01-01

REPOSITORIES: biostudies

Similar Datasets

2018-01-01 | S-EPMC6284146 | BioStudies
2020-01-01 | S-EPMC7729909 | BioStudies
2014-01-01 | S-EPMC4260994 | BioStudies
2019-01-01 | S-EPMC6658290 | BioStudies
2017-01-01 | S-EPMC7651952 | BioStudies
2016-01-01 | S-EPMC5130981 | BioStudies
2019-01-01 | S-EPMC6319275 | BioStudies
1000-01-01 | S-EPMC4947904 | BioStudies
2020-01-01 | S-EPMC7576981 | BioStudies
1000-01-01 | S-EPMC2778669 | BioStudies