Dataset Information

Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention.


ABSTRACT: The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models, training a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and show competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which in a certain limit recovers a Potts model, and use it to contrast Potts models and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases that is not captured by single-layer models. This raises the exciting possibility of developing powerful structured models of protein family databases.
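The abstract's core construction, a single attention layer whose queries and keys depend only on position so that it induces Potts-style pairwise couplings, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the array names (`Q`, `K`, `W_V`) and the head-averaged coupling formula are assumptions for exposition.

```python
import numpy as np

def factored_attention_couplings(Q, K, W_V):
    """Sketch of a factored (content-independent) attention layer.

    Q, K : (H, L, d)  per-head positional queries/keys (no sequence input)
    W_V  : (H, A, A)  per-head value matrices over the amino-acid alphabet

    Returns J : (L, L, A, A), a Potts-style coupling tensor where
    J[i, j, a, b] scores amino acids a at position i and b at position j.
    """
    H, L, d = Q.shape
    # Position-position attention per head, softmax over key positions.
    logits = Q @ K.transpose(0, 2, 1) / np.sqrt(d)        # (H, L, L)
    attn = np.exp(logits - logits.max(-1, keepdims=True)) # stable softmax
    attn /= attn.sum(-1, keepdims=True)
    # Combine heads: J[i, j] = sum_h attn[h, i, j] * W_V[h].
    J = np.einsum('hij,hab->ijab', attn, W_V)             # (L, L, A, A)
    return J
```

Because `Q` and `K` are functions of position alone, the coupling tensor `J` is fixed for the whole family, just as in a Potts model; with enough heads and full-rank value matrices, such a layer can express arbitrary pairwise couplings, which is the sense in which factored attention recovers a Potts model in the limit.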

SUBMITTER: Bhattacharya N 

PROVIDER: S-EPMC8752338 | biostudies-literature | 2022

REPOSITORIES: biostudies-literature


Publications

Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention.

Nicholas Bhattacharya, Neil Thomas, Roshan Rao, Justas Dauparas, Peter K. Koo, David Baker, Yun S. Song, Sergey Ovchinnikov

Pacific Symposium on Biocomputing, 2022


Similar Datasets

| S-EPMC9304350 | biostudies-literature
| S-EPMC5869684 | biostudies-literature
| S-EPMC7728182 | biostudies-literature
| S-EPMC9575930 | biostudies-literature
| S-EPMC10765783 | biostudies-literature
| S-EPMC7228904 | biostudies-literature
| S-EPMC11586890 | biostudies-literature
| S-EPMC5622342 | biostudies-literature
| S-EPMC7514434 | biostudies-literature
| S-EPMC6448275 | biostudies-literature