
Dataset Information


3-D substructure search by transitive closure in AlphaFold database.


ABSTRACT: Identifying structural relationships between proteins is crucial for understanding their functions and evolutionary histories. We present ISS_ProtSci, a Python package designed for structural similarity searches within the AlphaFold Database v2 (AFDB2). ISS_ProtSci incorporates DaliLite to identify geometrically similar structures and uses a transitive closure algorithm to iteratively explore neighboring shells of proteins. The precomputed all-against-all comparisons generated by Foldseek, chosen for its speed, are validated by DaliLite for precision. Search results are annotated with metadata from UniProtKB and Pfam protein family classifications, using hmmsearch to identify protein domains. Outputs, including Dali pairwise alignment data, are provided in TSV format for easy filtering and analysis. Our method offers a significant improvement in recall over existing tools like Foldseek, especially in detecting more distantly related proteins. This is particularly valuable in structurally diverse protein families where traditional sequence-based or fast structural methods struggle. ISS_ProtSci delivers practical runtimes and flexibility, allowing users to input a PDB file, define the minimum size of the common core, and evaluate results using Pfam clans. In evaluating our method across 12 test cases based on Pfam clans, we achieved over 99% recall of relevant proteins, even in challenging cases where Foldseek's recall dropped below 50%. ISS_ProtSci not only identifies closely related proteins but also uncovers previously unrecognized structural relationships, contributing to more accurate protein family classifications. The software can be downloaded from http://ekhidna2.biocenter.helsinki.fi/ISS_ProtSci/.
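The shell-by-shell expansion described in the abstract can be illustrated with a minimal sketch. This is not the ISS_ProtSci implementation: the function name, the `neighbors` dictionary (standing in for precomputed Foldseek all-against-all hits) and the `is_similar` callback (standing in for a DaliLite comparison against the query) are all hypothetical, and the toy identifiers are invented for illustration. It only shows the general idea of transitive closure, where each accepted hit contributes its own neighbors to the next shell.

```python
def transitive_closure_search(query, neighbors, is_similar, max_shells=None):
    """Expand outward from `query` one shell at a time.

    neighbors  : dict mapping a protein ID to an iterable of neighbor IDs
                 (assumed stand-in for precomputed fast-search hits).
    is_similar : callable(query, candidate) -> bool
                 (assumed stand-in for a slower, more precise validation).
    Returns a list of shells; each shell lists the newly accepted IDs.
    """
    seen = {query}        # every ID already evaluated, accepted or not
    frontier = [query]    # accepted IDs whose neighbors are explored next
    shells = []
    while frontier and (max_shells is None or len(shells) < max_shells):
        next_frontier = []
        for protein in frontier:
            for candidate in neighbors.get(protein, ()):
                if candidate in seen:
                    continue
                seen.add(candidate)
                # Only validated candidates seed the next shell.
                if is_similar(query, candidate):
                    next_frontier.append(candidate)
        if not next_frontier:
            break
        shells.append(next_frontier)
        frontier = next_frontier
    return shells


# Toy data: invented IDs, not real database entries.
toy_neighbors = {
    "Q": ["A", "B"],
    "A": ["C"],
    "B": ["X"],
    "C": ["D"],
    "X": ["Y"],
}
validated = {"A", "B", "C", "D", "Y"}
shells = transitive_closure_search(
    "Q", toy_neighbors, lambda q, c: c in validated
)
print(shells)
```

In this toy run, "Y" is never reached because its only link passes through "X", which fails validation. That is the trade-off in this kind of design: expanding only from validated hits limits the number of expensive comparisons, at the cost of missing anything reachable only through rejected intermediates.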

SUBMITTER: Liu H 

PROVIDER: S-EPMC12095923 | biostudies-literature | 2025 Jun

REPOSITORIES: biostudies-literature


Publications

3-D substructure search by transitive closure in AlphaFold database.

Liu Hao, Laiho Aleksi, Törönen Petri, Holm Liisa

Protein Science: a publication of the Protein Society, 2025-06-01, issue 6



Similar Datasets

| S-EPMC3138080 | biostudies-literature
| S-EPMC12343037 | biostudies-literature
| S-EPMC3573025 | biostudies-literature
| S-EPMC2739202 | biostudies-literature
| S-EPMC4684231 | biostudies-literature
| S-EPMC8983703 | biostudies-literature
| S-EPMC8783046 | biostudies-literature
2015-07-31 | GSE59956 | GEO
| S-EPMC9047035 | biostudies-literature