
Dataset Information


vHULK, a New Tool for Bacteriophage Host Prediction Based on Annotated Genomic Features and Neural Networks.


ABSTRACT:

Background

The experimental determination of a bacteriophage host is a laborious procedure. Thus, there is a pressing need for reliable computational predictions of bacteriophage hosts.

Materials and methods

We developed the program vHULK for phage host prediction based on 9504 phage genome features, which consider alignment significance scores between predicted proteins and a curated database of viral protein families. The features were fed to a neural network, and two models were trained to predict 77 host genera and 118 host species.
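To make the shapes in this description concrete, here is a minimal sketch of a 9504-dimensional feature vector passed through a small feed-forward network with 77 genus outputs. Only the numbers 9504 and 77 come from the abstract. The feature encoding (best score per protein family), the hidden size, the ReLU and softmax layers, the random weights, and all function names are assumptions for illustration and are not taken from vHULK itself.

```python
import math
import random

N_FEATURES = 9504  # number of genome features (from the abstract)
N_GENERA = 77      # number of host genera (from the abstract)
HIDDEN = 16        # assumption: the abstract gives no layer sizes


def build_feature_vector(hits, n_features=N_FEATURES):
    """Assumed encoding: keep the best alignment score per protein family.

    hits is an iterable of (family_index, score) pairs.
    """
    x = [0.0] * n_features
    for idx, score in hits:
        x[idx] = max(x[idx], score)
    return x


def dense(x, weights, bias):
    """Fully connected layer; weights[i][j] links input i to output j."""
    out = list(bias)
    for i, xi in enumerate(x):
        if xi != 0.0:  # skip zero inputs, the vector is mostly empty
            row = weights[i]
            for j in range(len(out)):
                out[j] += xi * row[j]
    return out


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a softmax over host genera."""
    h = [max(0.0, v) for v in dense(x, w1, b1)]
    return softmax(dense(h, w2, b2))


# Untrained random weights, used only to show the data flow.
rng = random.Random(0)
w1 = [[rng.gauss(0, 0.01) for _ in range(HIDDEN)] for _ in range(N_FEATURES)]
b1 = [0.0] * HIDDEN
w2 = [[rng.gauss(0, 0.01) for _ in range(N_GENERA)] for _ in range(HIDDEN)]
b2 = [0.0] * N_GENERA

# Hypothetical hits: two proteins match family 10, one matches family 4021.
x = build_feature_vector([(10, 35.2), (10, 50.1), (4021, 12.7)])
probs = predict(x, w1, b1, w2, b2)
```

Under these assumptions, a species-level model would differ only in its output width (118 instead of 77).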

Results

In controlled random test sets with 90% redundancy reduction in terms of protein similarity, vHULK obtained on average 83% precision and 79% recall at the genus level, and 71% precision and 67% recall at the species level. The performance of vHULK was compared against three other tools on a test data set with 2153 phage genomes. On this data set, vHULK achieved better performance at both the genus and the species levels than the other tools.
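For readers unfamiliar with the metrics, the following sketch shows how precision and recall can be computed for a multi-class host prediction task. The abstract does not state how the values were aggregated, so macro-averaging over classes is an assumption here, and the labels are made-up toy data, not results from the study.

```python
def macro_precision_recall(y_true, y_pred):
    """Per-class precision and recall, averaged with equal weight per class."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Convention assumed here: an undefined ratio counts as 0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n


# Toy genus-level labels for four hypothetical phages.
y_true = ["Escherichia", "Escherichia", "Salmonella", "Bacillus"]
y_pred = ["Escherichia", "Salmonella", "Salmonella", "Bacillus"]
precision, recall = macro_precision_recall(y_true, y_pred)
```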

Conclusions

Our results suggest that vHULK represents an advance on the state of the art in phage host prediction.

SUBMITTER: Amgarten D 

PROVIDER: S-EPMC9917316 | biostudies-literature | 2022 Dec

REPOSITORIES: biostudies-literature


Publications

vHULK, a New Tool for Bacteriophage Host Prediction Based on Annotated Genomic Features and Neural Networks.

Amgarten D, Iha BKV, Piroupo CM, da Silva AM, Setubal JC

PHAGE (New Rochelle, N.Y.), 2022 Dec 19, issue 4



Similar Datasets

| S-EPMC11215832 | biostudies-literature
| S-EPMC7689358 | biostudies-literature
| S-EPMC9100816 | biostudies-literature
| S-EPMC11540326 | biostudies-literature
| S-EPMC10691023 | biostudies-literature
| S-EPMC9547190 | biostudies-literature
| S-EPMC9729992 | biostudies-literature
| S-EPMC10977432 | biostudies-literature
| S-EPMC9016862 | biostudies-literature
| S-EPMC8407593 | biostudies-literature