Unknown

Dataset Information

0

Resilience of clinical text de-identified with "hiding in plain sight" to hostile reidentification attacks by human readers.


ABSTRACT:

Objective

Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this "residual PII problem." HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII.

Materials and methods

Using 2000 representative clinical documents from 2 healthcare settings (4000 total), we used a novel method to generate 2 de-identified 100-document corpora (200 documents total) in which PII tagged by a typical automated machine-learned tagger was replaced by HIPS-resynthesized content. Four readers conducted aggressive reidentification attacks to isolate leaked PII: 2 readers from within the originating institution and 2 external readers.

Results

Overall, mean recall of leaked PII was 26.8% and mean precision was 37.2%. Mean recall was 9% (mean precision = 37%) for patient ages, 32% (mean precision = 26%) for dates, 25% (mean precision = 37%) for doctor names, 45% (mean precision = 55%) for organization names, and 23% (mean precision = 57%) for patient names. Recall was 32% (precision = 40%) for internal and 22% (precision =33%) for external readers.

Discussion and conclusions

Approximately 70% of leaked PII "hiding" in a corpus de-identified with HIPS resynthesis is resilient to detection by human readers in a realistic, aggressive reidentification attack scenario-more than double the rate reported in previous studies but less than the rate reported for an attack assisted by machine learning methods.

SUBMITTER: Carrell DS 

PROVIDER: S-EPMC7647331 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature

altmetric image

Publications

Resilience of clinical text de-identified with "hiding in plain sight" to hostile reidentification attacks by human readers.

Carrell David S DS   Malin Bradley A BA   Cronkite David J DJ   Aberdeen John S JS   Clark Cheryl C   Li Muqun Rachel MR   Bastakoty Dikshya D   Nyemba Steve S   Hirschman Lynette L  

Journal of the American Medical Informatics Association : JAMIA 20200701 9


<h4>Objective</h4>Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this "residual PII problem." HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII.<h4>Materials and methods</h4>Using 2000 re  ...[more]

Similar Datasets

| S-EPMC7382342 | biostudies-literature
| S-EPMC2633103 | biostudies-literature
| S-EPMC3638183 | biostudies-literature
| S-EPMC8141065 | biostudies-literature
| S-EPMC7081726 | biostudies-literature
| S-EPMC7328255 | biostudies-literature
| S-EPMC6857511 | biostudies-literature
| S-EPMC8009683 | biostudies-literature
| S-EPMC8327880 | biostudies-literature
| S-EPMC5541158 | biostudies-literature