
Dataset Information


Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes.


ABSTRACT:

Background

Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing because of its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitively costly, automated tools capable of accurately extracting these associations from biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward.

Results

In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised learning, and ensemble learning to classify human protein-phenotype co-mentions with the help of unlabeled data. The framework incorporates an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes alongside a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of the proposed framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach achieves new state-of-the-art performance in classifying human protein-phenotype co-mentions, outperforming its supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists.
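
The general idea of combining a small labeled set with a large unlabeled pool through an ensemble can be illustrated with a generic self-training (pseudo-labeling) loop. The sketch below is not the PPPredSS implementation, and the abstract does not state which semi-supervised strategy PPPredSS uses. It assumes binary labels, numeric feature vectors in place of sentence embeddings, and toy nearest-centroid models in place of the language-model, convolutional, and recurrent components. All function names, the confidence threshold, and the number of rounds are hypothetical.

```python
import math
import random


def fit_centroids(X, y):
    """Toy base learner (assumption): store the mean vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_proba(model, x):
    """Score for class 1: a logistic function of the centroid distance gap."""
    d0 = math.dist(x, model[0])
    d1 = math.dist(x, model[1])
    return 1.0 / (1.0 + math.exp(d1 - d0))


def bootstrap(X, y, rng):
    """Resample with replacement inside each class.

    Assumes the labeled data contains both classes 0 and 1.
    """
    Xb, yb = [], []
    for c in (0, 1):
        members = [x for x, label in zip(X, y) if label == c]
        for _ in members:
            Xb.append(rng.choice(members))
            yb.append(c)
    return Xb, yb


def train_ensemble(X, y, n_models, rng):
    """Train n_models base learners, each on its own bootstrap sample."""
    return [fit_centroids(*bootstrap(X, y, rng)) for _ in range(n_models)]


def ensemble_proba(models, x):
    """Average the class-1 scores of all ensemble members."""
    return sum(predict_proba(m, x) for m in models) / len(models)


def self_train(X_lab, y_lab, X_unlab, n_models=5, threshold=0.9,
               rounds=3, seed=0):
    """Grow the labeled set with confident ensemble pseudo-labels."""
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        models = train_ensemble(X_lab, y_lab, n_models, rng)
        remaining = []
        for x in pool:
            p = ensemble_proba(models, x)
            if p >= threshold:            # confident positive
                X_lab.append(x)
                y_lab.append(1)
            elif p <= 1.0 - threshold:    # confident negative
                X_lab.append(x)
                y_lab.append(0)
            else:                         # too uncertain, keep unlabeled
                remaining.append(x)
        if len(remaining) == len(pool):   # nothing new was labeled
            break
        pool = remaining
    # Final ensemble trained on original plus pseudo-labeled examples.
    return train_ensemble(X_lab, y_lab, n_models, rng), pool


def ensemble_predict(models, x):
    """Hard label from the averaged score."""
    return 1 if ensemble_proba(models, x) >= 0.5 else 0
```

Calling `self_train(X_lab, y_lab, X_unlab)` returns the final ensemble together with the unlabeled examples that never reached the confidence threshold. Leaving ambiguous examples out, rather than forcing a label on them, is a common way to limit the noise that pseudo-labels introduce.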

Conclusions

This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction.

SUBMITTER: Pourreza Shahri M 

PROVIDER: S-EPMC8520253 | biostudies-literature | 2021 Oct

REPOSITORIES: biostudies-literature


Publications

Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes.

Pourreza Shahri, Morteza; Kahanda, Indika

BMC Bioinformatics, 2021 Oct 16, issue 1



Similar Datasets

S-EPMC11522217 | biostudies-literature
GSE140262 | GEO | 2019-11-13
S-EPMC6857507 | biostudies-literature
S-EPMC8904133 | biostudies-literature
S-EPMC8570780 | biostudies-literature
S-EPMC6540576 | biostudies-literature
S-EPMC11394203 | biostudies-literature
PRJNA589061 | ENA
S-EPMC6550282 | biostudies-literature
S-EPMC5441628 | biostudies-literature