Dataset Information

Homologous Pairs of Low and High Temperature Originating Proteins Spanning the Known Prokaryotic Universe.


ABSTRACT: Stability of proteins at high temperature has been a topic of interest for many years, as this attribute is favourable for applications ranging from therapeutics to industrial chemical manufacturing. Our current understanding of high-temperature stability, and our methods for designing it into target proteins, are inadequate. To drive innovation in this space, we have curated a large dataset, learn2thermDB, of protein-temperature examples, totalling 24 million instances, and paired proteins across temperatures based on homology, yielding 69 million protein pairs, orders of magnitude more than the current largest such dataset. This pairing step allows high-temperature stability to be studied in a sequence-dependent manner in the big data era. The data pipeline is parameterized and open, allowing it to be tuned by downstream users. We further show that the data contains signal for deep learning. This data offers a new doorway towards thermal stability design models.

SUBMITTER: Komp E 

PROVIDER: S-EPMC10560248 | biostudies-literature | 2023 Oct

REPOSITORIES: biostudies-literature

Publications

Homologous Pairs of Low and High Temperature Originating Proteins Spanning the Known Prokaryotic Universe.

Evan Komp, Humood N. Alanzi, Ryan Francis, Chau Vuong, Logan Roberts, Amin Mosallanejad, David A. C. Beck

Scientific Data, 2023-10-07, issue 1



Similar Datasets

S-EPMC6299218 | biostudies-literature
S-EPMC4093959 | biostudies-literature
S-EPMC10584675 | biostudies-literature
S-EPMC3079058 | biostudies-literature
S-EPMC4102405 | biostudies-literature
S-EPMC2242362 | biostudies-literature
S-EPMC10515827 | biostudies-literature
S-EPMC10776241 | biostudies-literature
S-EPMC1615881 | biostudies-literature
PRJNA861369 | ENA