ABSTRACT:
Objective: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers.
Materials and methods: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics.
Results: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access.
Conclusions: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.
SUBMITTER: Haendel MA
PROVIDER: S-EPMC7454687 | biostudies-literature | 2021 Mar
REPOSITORIES: biostudies-literature
AUTHORS: Haendel Melissa A, Chute Christopher G, Bennett Tellen D, Eichmann David A, Guinney Justin, Kibbe Warren A, Payne Philip R O, Pfaff Emily R, Robinson Peter N, Saltz Joel H, Spratt Heidi, Suver Christine, Wilbanks John, Wilcox Adam B, Williams Andrew E, Wu Chunlei, Blacketer Clair, Bradford Robert L, Cimino James J, Clark Marshall, Colmenares Evan W, Francis Patricia A, Gabriel Davera, Graves Alexis, Hemadri Raju, Hong Stephanie S, Hripscak George, Jiao Dazhi, Klann Jeffrey G, Kostka Kristin, Lee Adam M, Lehmann Harold P, Lingrey Lora, Miller Robert T, Morris Michele, Murphy Shawn N, Natarajan Karthik, Palchuk Matvey B, Sheikh Usman, Solbrig Harold, Visweswaran Shyam, Walden Anita, Walters Kellie M, Weber Griffin M, Zhang Xiaohan Tanner, Zhu Richard L, Amor Benjamin, Girvin Andrew T, Manna Amin, Qureshi Nabeel, Kurilla Michael G, Michael Sam G, Portilla Lili M, Rutter Joni L, Austin Christopher P, Gersing Ken R
Journal of the American Medical Informatics Association : JAMIA, 2021-03-01, issue 3