
Dataset Information


Structured information extraction from scientific text with large language models.


ABSTRACT: Extracting structured knowledge from scientific text remains a challenging task for machine learning models. Here, we present a simple approach to joint named entity recognition and relation extraction and demonstrate how pretrained large language models (GPT-3, Llama-2) can be fine-tuned to extract useful records of complex scientific knowledge. We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as a list of JSON objects. This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured specialized scientific knowledge extracted from research papers.
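As an illustration of the "list of JSON objects" output mentioned in the abstract, the following is a minimal sketch of how such a completion might be parsed into dopant-host pairs. The example completion string, the keys "host" and "dopants", and the helper names parse_records and dopant_host_pairs are all hypothetical assumptions made for this sketch; they are not taken from the paper or its dataset.

```python
import json

# Hypothetical model completion for a single sentence. The keys "host" and
# "dopants" are illustrative assumptions, not the schema used in the paper.
completion = '[{"host": "ZnO", "dopants": ["Al", "Ga"]}]'


def parse_records(text):
    """Parse a completion into a list of dict records.

    Returns an empty list if the text is not valid JSON or is not a JSON list,
    since generated output is not guaranteed to be well-formed.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [record for record in data if isinstance(record, dict)]


def dopant_host_pairs(records):
    """Flatten records into (dopant, host) tuples."""
    pairs = []
    for record in records:
        host = record.get("host")
        for dopant in record.get("dopants", []):
            pairs.append((dopant, host))
    return pairs


records = parse_records(completion)
pairs = dopant_host_pairs(records)
print(pairs)
```

Returning an empty list on malformed output, rather than raising, is one possible design choice for batch processing of many passages; a stricter pipeline might log or re-query instead.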

SUBMITTER: Dagdelen J 

PROVIDER: S-EPMC10869356 | biostudies-literature | 2024 Feb

REPOSITORIES: biostudies-literature


Publications

Structured information extraction from scientific text with large language models.

John Dagdelen, Alexander Dunn, Sanghoon Lee, Nicholas Walker, Andrew S. Rosen, Gerbrand Ceder, Kristin A. Persson, Anubhav Jain

Nature Communications, 2024 Feb 15, issue 1



Similar Datasets

| S-EPMC11773977 | biostudies-literature
| S-EPMC11415382 | biostudies-literature
| S-EPMC11546091 | biostudies-literature
| S-EPMC11398444 | biostudies-literature
| S-EPMC10729196 | biostudies-literature
| S-EPMC11015372 | biostudies-literature
| S-EPMC10791738 | biostudies-literature
| S-EPMC11844613 | biostudies-literature
| S-EPMC11751965 | biostudies-literature
| S-EPMC3441580 | biostudies-literature