Dataset Information

DIPS-Plus: The enhanced database of interacting protein structures for interface prediction.


ABSTRACT: In this work, we expand on a dataset recently introduced for protein interface prediction (PIP), the Database of Interacting Protein Structures (DIPS), to present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for machine learning of protein interfaces. While the original DIPS dataset contains only the Cartesian coordinates for atoms contained in the protein complex along with their types, DIPS-Plus contains multiple residue-level features including surface proximities, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid, providing researchers a curated feature bank for training protein interface prediction methods. We demonstrate through rigorous benchmarks that training an existing state-of-the-art (SOTA) model for PIP on DIPS-Plus yields new SOTA results, surpassing the performance of some of the latest models trained on residue-level and atom-level encodings of protein complexes to date.

SUBMITTER: Morehead A 

PROVIDER: S-EPMC10400622 | biostudies-literature | 2023 Aug

REPOSITORIES: biostudies-literature

Publications

DIPS-Plus: The enhanced database of interacting protein structures for interface prediction.

Alex Morehead, Chen Chen, Ada Sedova, Jianlin Cheng

Scientific Data, 2023-08-03, 1


Similar Datasets

S-EPMC1635331 | biostudies-literature
S-EPMC5461544 | biostudies-literature
S-EPMC2885377 | biostudies-literature
S-EPMC5262449 | biostudies-literature
S-EPMC1236910 | biostudies-literature
S-EPMC3287255 | biostudies-literature
S-EPMC9316365 | biostudies-literature
S-EPMC1933225 | biostudies-literature
S-EPMC2909730 | biostudies-literature
S-EPMC4815308 | biostudies-literature