Dataset Information

Deep Confident Steps to New Pockets: Strategies for Docking Generalization.

ABSTRACT: Accurate blind docking has the potential to lead to new biological breakthroughs, but for this promise to be realized, docking methods must generalize well across the proteome. Existing benchmarks, however, fail to rigorously assess generalizability. Therefore, we develop DockGen, a new benchmark based on the ligand-binding domains of proteins, and we show that existing machine learning-based docking models have very weak generalization abilities. We carefully analyze the scaling laws of ML-based docking and show that, by scaling data and model size, as well as integrating synthetic data strategies, we are able to significantly increase the generalization capacity and set new state-of-the-art performance across benchmarks. Further, we propose Confidence Bootstrapping, a new training paradigm that solely relies on the interaction between diffusion and confidence models and exploits the multi-resolution generation process of diffusion models. We demonstrate that Confidence Bootstrapping significantly improves the ability of ML-based docking methods to dock to unseen protein classes, edging closer to accurate and generalizable blind docking methods.

SUBMITTER: Corso G

PROVIDER: S-EPMC10925391 | biostudies-literature | 2024 Feb

REPOSITORIES: biostudies-literature

Publications

Deep Confident Steps to New Pockets: Strategies for Docking Generalization.

Gabriele Corso, Arthur Deng, Benjamin Fry, Nicholas Polizzi, Regina Barzilay, Tommi Jaakkola

ArXiv, 2024 Feb 28

PMID: 38463508

Similar Datasets

Deep Generalization of Structured Low-Rank Algorithms (Deep-SLR).
| S-EPMC7731895 | biostudies-literature

DeepHistoClass: A Novel Strategy for Confident Classification of Immunohistochemistry Images Using Deep Learning.
| S-EPMC8476775 | biostudies-literature

KRAS G12C fragment screening renders new binding pockets.
| S-EPMC8923024 | biostudies-literature

Multiple fragment docking and linking in primary and secondary pockets of dopamine receptors.
| S-EPMC4160746 | biostudies-literature

Toward prediction of functional protein pockets using blind docking and pocket search algorithms.
| S-EPMC3125872 | biostudies-literature

Deep Generative Medical Image Harmonization for Improving Cross-Site Generalization in Deep Learning Predictors.
| S-EPMC8844038 | biostudies-literature

Oral microbiome of deep and shallow dental pockets in chronic periodontitis.
| S-EPMC3675156 | biostudies-literature

Microbiome composition and metabolic pathways in shallow and deep periodontal pockets.
| S-EPMC12000285 | biostudies-literature

Evaluating GPCR modeling and docking strategies in the era of deep learning-based protein structure prediction.
| S-EPMC9747351 | biostudies-literature

Deep Panning: steps towards probing the IgOme.
| S-EPMC3409857 | biostudies-literature