Dataset Information

Evaluation of large language models for discovery of gene set function.


ABSTRACT: Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI's GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions from its embedded biomedical knowledge. We created a GPT-4 pipeline to label gene sets with names that summarize their consensus functions, substantiated by analysis text and citations. When benchmarked against named gene sets in the Gene Ontology, GPT-4 generated very similar names in 50% of cases, while in most remaining cases it recovered the name of a more general concept. In gene sets discovered in 'omics data, GPT-4 names were more informative than gene set enrichment, with supporting statements and citations that were largely verified in human review. The ability to rapidly synthesize common gene functions positions LLMs as valuable functional genomics assistants.

SUBMITTER: Hu M 

PROVIDER: S-EPMC10508824 | biostudies-literature | 2023 Sep

REPOSITORIES: biostudies-literature

Publications

Evaluation of large language models for discovery of gene set function.

Hu Mengzhou, Alkhairy Sahar, Lee Ingoo, Pillich Rudolf T, Fong Dylan, Smith Kevin, Bachelder Robin, Ideker Trey, Pratt Dexter

arXiv, 2024-04-01


Gene set analysis is a mainstay of functional genomics, but it relies on curated databases of gene functions that are incomplete. Here we evaluate five Large Language Models (LLMs) for their ability to discover the common biological functions represented by a gene set, substantiated by supporting rationale, citations and a confidence assessment. When benchmarked against canonical gene sets from the Gene Ontology, GPT-4 confidently recovered the curated name or a more general concept (73% of cases), ...

Similar Datasets

| S-EPMC10246080 | biostudies-literature
| S-EPMC11225096 | biostudies-literature
| S-EPMC11446478 | biostudies-literature
| S-EPMC10775343 | biostudies-literature
| S-EPMC10580627 | biostudies-literature
| S-EPMC11618017 | biostudies-literature
| S-EPMC11222590 | biostudies-literature
| S-EPMC11501434 | biostudies-literature
| S-EPMC11654935 | biostudies-literature
| S-EPMC11261925 | biostudies-literature