Dataset Information

Fine-tuning protein language models boosts predictions across diverse tasks.


ABSTRACT: Prediction methods inputting embeddings from protein language models have reached or even surpassed state-of-the-art performance on many protein prediction tasks. In natural language processing, fine-tuning large language models has become the de facto standard. In contrast, most protein language model-based protein predictions do not back-propagate into the language model. Here, we compare fine-tuning three state-of-the-art models (ESM2, ProtT5, Ankh) on eight different tasks. Two results stand out. First, task-specific supervised fine-tuning almost always improves downstream predictions. Second, parameter-efficient fine-tuning can reach similar improvements while consuming substantially fewer resources, with up to 4.5-fold faster training than fine-tuning full models. Our results suggest always trying fine-tuning, in particular for problems with small datasets, such as fitness-landscape predictions for a single protein. For ease of adoption, we provide easy-to-use notebooks to fine-tune all models used in this work for per-protein (pooling) and per-residue prediction tasks.
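The resource savings of parameter-efficient fine-tuning mentioned in the abstract come from training only small low-rank adapters (e.g., LoRA) instead of every weight in the language model. A minimal sketch of the arithmetic, assuming each adapted d_model × d_model matrix receives a rank-r LoRA update (A: d_model × r, B: r × d_model) and four such matrices per transformer layer; the dimensions below are illustrative (loosely ESM2-650M-like) and are not the paper's exact configuration:

```python
def full_trainable_params(d_model: int, n_layers: int, mats_per_layer: int = 4) -> int:
    """Parameters updated when fine-tuning every adapted square matrix in full."""
    return n_layers * mats_per_layer * d_model * d_model

def lora_trainable_params(d_model: int, n_layers: int, rank: int,
                          mats_per_layer: int = 4) -> int:
    """Parameters updated under LoRA: each matrix gets A (d x r) + B (r x d)."""
    return n_layers * mats_per_layer * 2 * d_model * rank

# Hypothetical ESM2-650M-like shape: d_model=1280, 33 layers, LoRA rank 8.
full = full_trainable_params(1280, 33)
lora = lora_trainable_params(1280, 33, rank=8)
print(f"full: {full:,}  lora: {lora:,}  reduction: {full // lora}x")
# → full: 216,268,800  lora: 2,703,360  reduction: 80x
```

The roughly two-orders-of-magnitude drop in trainable parameters is what makes fine-tuning feasible on small task-specific datasets without a large GPU budget, consistent with the abstract's recommendation.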

SUBMITTER: Schmirler R 

PROVIDER: S-EPMC11358375 | biostudies-literature | 2024 Aug

REPOSITORIES: biostudies-literature

Publications

Fine-tuning protein language models boosts predictions across diverse tasks.

Robert Schmirler, Michael Heinzinger, Burkhard Rost

Nature Communications, 28 Aug 2024


Similar Datasets

| S-EPMC10659351 | biostudies-literature
| S-EPMC11529868 | biostudies-literature
| S-EPMC11339499 | biostudies-literature
| S-EPMC11875315 | biostudies-literature
| S-EPMC11535799 | biostudies-literature
| S-EPMC11223820 | biostudies-literature
| S-EPMC11293664 | biostudies-literature
| S-EPMC8530864 | biostudies-literature
| S-EPMC10400306 | biostudies-literature
| S-EPMC11659327 | biostudies-literature