Dataset Information

Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks.


ABSTRACT: Protein complexes can be computationally identified from protein-interaction networks with community detection methods, suggesting new multi-protein assemblies. Most community detection algorithms tend to be un- or semi-supervised and assume that communities are dense network subgraphs, which is not always true, as protein complexes can exhibit diverse network topologies. The few existing supervised machine learning methods are serial; their accuracy could potentially be improved with better-suited machine learning models, and their scalability with parallel algorithms. Here, we present Super.Complex, a distributed supervised machine learning pipeline for community detection in networks. Super.Complex learns a community fitness function from known communities using an AutoML method and applies this fitness function to detect new communities. A heuristic local search algorithm finds maximally scoring communities with epsilon-greedy and pseudo-metropolis criteria, and an embarrassingly parallel implementation can be run on a computer cluster to scale to large networks. To evaluate Super.Complex, we propose three new measures for the still-outstanding issue of comparing sets of learned and known communities. On a yeast protein-interaction network, Super.Complex outperforms 6 other supervised and 4 unsupervised methods. Applying Super.Complex to a human protein-interaction network with ~8k nodes and ~60k edges yields 1,028 protein complexes, of which 234 are linked to SARS-CoV-2; 111 uncharacterized proteins are present in 103 learned complexes. Super.Complex is generalizable and can be used in different applications of community detection, with the ability to improve results by incorporating domain-specific features. Learned community characteristics can also be transferred from existing applications to detect communities in a new application with no known communities.
Code and interactive visualizations of learned human protein complexes are freely available at: https://sites.google.com/view/supercomplex/super-complex-v3-0.
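
The local search described in the abstract can be illustrated with a minimal sketch. This is not the Super.Complex implementation: the function names (`density`, `grow_community`), the parameters, and the toy network are all hypothetical, and plain edge density stands in for the community fitness function that Super.Complex learns with AutoML. Only the epsilon-greedy criterion is shown; the pseudo-metropolis criterion is omitted.

```python
import random


def density(nodes, adj):
    """Stand-in fitness (assumption): fraction of possible edges present among `nodes`."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each undirected edge is seen from both endpoints, so halve the count.
    edges = sum(1 for u in nodes for v in adj[u] if v in nodes) / 2
    return edges / (n * (n - 1) / 2)


def grow_community(seed, adj, fitness=density, epsilon=0.1, max_steps=20, rng=None):
    """Grow a community from `seed`, one neighbor at a time (epsilon-greedy)."""
    rng = rng or random.Random(0)
    community = {seed}
    score = fitness(community, adj)
    for _ in range(max_steps):
        # Candidate nodes: neighbors of the community that are not yet members.
        frontier = sorted({v for u in community for v in adj[u]} - community)
        if not frontier:
            break
        if rng.random() < epsilon:
            pick = rng.choice(frontier)  # explore: random neighbor
        else:
            # exploit: neighbor whose addition scores highest
            pick = max(frontier, key=lambda v: fitness(community | {v}, adj))
        new_score = fitness(community | {pick}, adj)
        if new_score < score:
            break  # stop once the score would drop
        community.add(pick)
        score = new_score
    return community


# Toy network (hypothetical): two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
result = grow_community("a", adj, epsilon=0.0)
print(sorted(result))  # ['a', 'b', 'c']
```

Because each seed is grown independently, a loop over seeds could be distributed across workers with no shared state, which is the sense in which such a search is embarrassingly parallel.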

SUBMITTER: Palukuri MV 

PROVIDER: S-EPMC8240683 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC8455138 | biostudies-literature
| S-EPMC7895977 | biostudies-literature
2022-08-14 | GSE184943 | GEO
| S-EPMC9200117 | biostudies-literature
| S-EPMC6160042 | biostudies-literature
| S-EPMC2739204 | biostudies-literature
| S-EPMC3521185 | biostudies-literature