Dataset Information

A proteogenomic update to Yersinia: enhancing genome annotation.


ABSTRACT: Modern biomedical research depends on a complete and accurate proteome. With the widespread adoption of new sequencing technologies, genome sequences are generated at a near-exponential rate, diminishing the time and effort that can be invested in genome annotation. The resulting gene set contains numerous errors in even the most basic form of annotation: the primary structure of the proteins. The application of experimental proteomics data to genome annotation, called proteogenomics, can quickly and efficiently discover misannotations, yielding a more accurate and complete genome annotation. We present a comprehensive proteogenomic analysis of the plague bacterium, Yersinia pestis KIM. We discover non-annotated genes, correct protein boundaries, remove spuriously annotated ORFs, and make major advances towards accurate identification of signal peptides. Finally, we apply our data to 21 other Yersinia genomes, correcting and enhancing their annotations. In total, 141 gene models were altered and have been updated in RefSeq and GenBank, which can be accessed seamlessly through any NCBI tool (e.g. BLAST) or downloaded directly. Along with the improved gene models, we discover new, more accurate means of identifying signal peptides in proteomics data.

SUBMITTER: Payne SH 

PROVIDER: S-EPMC3091656 | biostudies-literature | 2010 Aug

REPOSITORIES: biostudies-literature

Publications

A proteogenomic update to Yersinia: enhancing genome annotation.

Payne SH, Huang ST, Pieper R

BMC Genomics, 2010-08-05



Similar Datasets

2012-07-26 | PRD000484 | Pride
| S-EPMC4762527 | biostudies-literature
| S-EPMC4924849 | biostudies-other
| S-EPMC5753331 | biostudies-literature
2019-03-20 | PXD008508 | Pride
| S-EPMC2826280 | biostudies-literature
2020-01-13 | GSE143486 | GEO
| S-EPMC8563848 | biostudies-literature