
Dataset Information


PASS: Protein Annotation Surveillance Site for Protein Annotation Using Homologous Clusters, NLP, and Sequence Similarity Networks.


ABSTRACT: Advances in genome sequencing have accelerated the growth of sequenced genomes, but at a cost in the quality of genome annotation. At the same time, computational analysis is widely used for protein annotation, but a dearth of experimental verification has contributed to inaccurate annotation as well as to the propagation of annotation errors. Thus, a tool to help life scientists with accurate protein annotation would be useful. In this work we describe a website we have developed, the Protein Annotation Surveillance Site (PASS), which provides such a tool. This website consists of three major components: a database of homologous clusters of more than eight million protein sequences deduced from the representative genomes of bacteria, archaea, eukarya, and viruses, together with sequence information; a machine-learning software tool that periodically queries the UniProtKB database to determine whether protein function has been experimentally verified; and a queryable webpage that returns the FASTA headers of sequences from the cluster best matching an input sequence. The user can choose from these sequences to create a sequence similarity network to assist in annotation, or else use their expert knowledge to choose an annotation from the cluster sequences. Illustrations demonstrating use of this website are presented.
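The abstract mentions building a sequence similarity network (SSN) from the sequences in a cluster. As a rough, generic illustration only, and not the PASS implementation, the sketch below assumes pairwise similarity scores (for example, percent identity from an all-vs-all alignment) are already available. Each sequence becomes a node, an edge joins two sequences whose score meets a user-chosen threshold, and connected components group sequences that may share a function. The function name `build_ssn`, the sequence IDs, the scores, and the threshold are all hypothetical.

```python
from collections import defaultdict


def build_ssn(scores, threshold):
    """Build a sequence similarity network and return its connected components.

    scores: dict mapping (seq_id_a, seq_id_b) pairs to a similarity score
            (hypothetical input, e.g. percent identity from an all-vs-all alignment).
    threshold: minimum score required to draw an edge (hypothetical cutoff).
    """
    adjacency = defaultdict(set)
    nodes = set()
    for (a, b), score in scores.items():
        nodes.update((a, b))
        # Only pairs at or above the threshold become edges.
        if a != b and score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Depth-first search collects each connected component (a candidate functional group).
    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Usage with made-up scores for four sequences.
example_scores = {
    ("seqA", "seqB"): 92.0,
    ("seqA", "seqC"): 35.0,
    ("seqB", "seqC"): 40.0,
    ("seqC", "seqD"): 88.0,
}
print(build_ssn(example_scores, threshold=80.0))
# prints [['seqA', 'seqB'], ['seqC', 'seqD']]
```

Raising the threshold splits the network into smaller, more tightly related groups, while lowering it merges them; in practice the cutoff is a judgment call the user makes when exploring a cluster.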

SUBMITTER: Tao J 

PROVIDER: S-EPMC9581018 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature


Publications

PASS: Protein Annotation Surveillance Site for Protein Annotation Using Homologous Clusters, NLP, and Sequence Similarity Networks.

Jin Tao, Kelly A. Brayton, Shira L. Broschat

Frontiers in Bioinformatics, 29 September 2021



Similar Datasets

| S-EPMC3949165 | biostudies-literature
| S-EPMC1940026 | biostudies-literature
| S-EPMC4382900 | biostudies-literature
| S-EPMC2936402 | biostudies-literature
| S-EPMC11225810 | biostudies-literature
| S-EPMC4120521 | biostudies-literature
| S-EPMC3176917 | biostudies-literature
| S-EPMC3584929 | biostudies-literature
| S-EPMC11231046 | biostudies-literature
| S-EPMC1539024 | biostudies-literature