
Dataset Information


M-Ionic: prediction of metal-ion-binding sites from sequence using residue embeddings.


ABSTRACT:

Motivation

Understanding metal-protein interactions can provide structural and functional insights into cellular processes. As the number of known protein sequences grows, fast yet precise computational approaches for predicting and annotating metal-binding sites become imperative. Embeddings from pre-trained protein language models (pLMs) are quick and resource-efficient to compute. They have been used successfully to predict binding sites from protein sequences, even without structural or evolutionary features (multiple sequence alignments). Using residue-level pLM embeddings, we have developed a sequence-based method (M-Ionic) that identifies metal-binding proteins and predicts the residues involved in metal binding.
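The abstract does not describe M-Ionic's architecture. What follows is therefore only a minimal sketch of the general idea of predicting binding from residue-level embeddings, under explicit assumptions. Each residue's pLM embedding is treated as a fixed-length vector. A hypothetical linear layer with a sigmoid maps that vector to an independent binding probability for each of the ten ions listed in the abstract. A protein is called metal-binding if any residue reaches a threshold. The function names `residue_probabilities` and `predict_protein`, the toy embedding size, the random weights and the max-over-residues rule are all illustrative assumptions. None of them is taken from the authors' code.

```python
import math
import random

# The ten ions listed in the abstract; the label order is an assumption.
IONS = ["Ca2+", "Mg2+", "Mn2+", "Zn2+", "Cu2+", "PO4 3-", "SO4 2-", "Fe2+", "Fe3+", "Co2+"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def residue_probabilities(embedding, weights, biases):
    """Map one residue embedding to an independent binding probability per ion.

    A sigmoid is used rather than a softmax because a residue may bind several
    ions, so the task is treated as multi-label (hypothetical linear head).
    """
    return {
        ion: sigmoid(sum(w * e for w, e in zip(weights[ion], embedding)) + biases[ion])
        for ion in IONS
    }


def predict_protein(residue_embeddings, weights, biases, threshold=0.5):
    """Score every residue, then call the protein metal-binding if any residue
    reaches the threshold for any ion (hypothetical aggregation rule)."""
    residue_probs = [residue_probabilities(e, weights, biases) for e in residue_embeddings]
    max_prob = max(p for probs in residue_probs for p in probs.values())
    return residue_probs, max_prob >= threshold


# Usage with random stand-ins for trained weights and real pLM embeddings.
rng = random.Random(0)
DIM = 8  # toy size; real pLM embeddings are much larger
weights = {ion: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for ion in IONS}
biases = {ion: 0.0 for ion in IONS}
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(5)]  # 5 residues

probs, is_binder = predict_protein(embeddings, weights, biases)
for i, r in enumerate(probs):
    top_ion = max(r, key=r.get)
    print(f"residue {i}: most likely {top_ion} (p = {r[top_ion]:.2f})")
print("metal-binding protein:", is_binder)
```

Because each residue is scored from its own embedding alone, a sketch like this relies on the claim in the results that a single residue's pLM embedding already carries information about its neighbours.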

Results

On independent validation with recent proteins, M-Ionic achieves an area under the receiver operating characteristic curve (AUROC) of 0.83 (recall = 84.6%) in distinguishing metal-binding from non-binding proteins. The next-best method achieves an AUROC of 0.74 (recall = 61.8%). M-Ionic performs comparably to the state-of-the-art method in identifying metal-binding residues (Ca²⁺, Mg²⁺, Mn²⁺, Zn²⁺). It also provides binding probabilities for six additional ions (Cu²⁺, PO₄³⁻, SO₄²⁻, Fe²⁺, Fe³⁺, Co²⁺). We show that the pLM embedding of a single residue contains sufficient information about its neighbours to predict its binding properties.
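The headline comparison is reported as AUROC and recall. The sketch below computes these two standard metrics as a reminder of what they measure. AUROC is the probability that a randomly chosen binder is scored above a randomly chosen non-binder, with ties counting one half. Recall is the fraction of true binders called positive at a score threshold. The 0.5 threshold and the toy labels and scores are assumptions for illustration. They are not the authors' data or evaluation code, and they do not reproduce the reported numbers.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall(labels, scores, threshold=0.5):
    """Fraction of true positives predicted positive (score >= threshold)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp / (tp + fn)


# Toy example: 1 = metal-binding protein, 0 = non-binding (made-up scores).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.3, 0.7, 0.4, 0.1]
print(f"AUROC  = {auroc(labels, scores):.3f}")   # 5.5 / 9 -> 0.611
print(f"recall = {recall(labels, scores):.3f}")  # 1 / 3 -> 0.333
```

The pairwise loop is quadratic in the number of proteins, which keeps the definition explicit. For large benchmarks, a rank-based implementation such as scikit-learn's `roc_auc_score` returns the same value.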

Availability and implementation

M-Ionic can be run on a protein of interest in a Google Colab notebook (https://bit.ly/40FrRbK). The GitHub repository (https://github.com/TeamSundar/m-ionic) contains all code and data.

SUBMITTER: Shenoy A 

PROVIDER: S-EPMC10792727 | biostudies-literature | 2024 Jan

REPOSITORIES: biostudies-literature


Publications

M-Ionic: prediction of metal-ion-binding sites from sequence using residue embeddings.

Aditi Shenoy, Yogesh Kalakoti, Durai Sundar, Arne Elofsson

Bioinformatics (Oxford, England) 20240101 1



Similar Datasets

| S-EPMC3377655 | biostudies-literature
| S-EPMC8652027 | biostudies-literature
| S-EPMC9547369 | biostudies-literature
| S-EPMC9851297 | biostudies-literature
| S-EPMC1184590 | biostudies-literature
| S-EPMC9110757 | biostudies-literature
| S-EPMC11750443 | biostudies-literature
| S-EPMC1993824 | biostudies-literature
| S-EPMC10103162 | biostudies-literature
| S-EPMC1347421 | biostudies-literature