Genomics

Dataset Information

0

REPARATION: Ribosome Profiling Assisted (Re-) Annotation of Bacterial genomes


ABSTRACT: The delineation of genes in bacteria has remained an important challenge because prokaryotic genomes are often tightly packed frequently resulting in overlapping genes. We hereby present a de novo approach called REPARATION (RibosomeE Profiling Assisted (Re-)AnnotaTION) to delineate translated open reading frames (ORFs) in bacteria independent of (available) genome annotation. By deep sequencing of ribosome protected mRNA fragments (RPF) to map translating ribosomes across the entire genome, REPARATION takes advantage of the recently developed ribosome profiling (Ribo-seq) technique. REPARATION starts by traversing the entire genome to generate all possible ORFs and then collects their corresponding RPF signal information. Based on a growth curve model to estimate minimum ORF read density and Ribo-seq RPF coverage, thresholds indicative of translation is estimated. Finally, our algorithm applies a random forest model to build a classifier to classify putative protein coding ORFs. We evaluated the performance of REPARATION on 3 annotated bacterial species using in-house generated Ribo-seq data and matching N-terminal and shotgun proteomics data next to publically available Ribo-seq data. In all cases, about 80% of the ORFs predicted by REPARATION were previously annotated as protein coding. While 13-20% were variants of previously annotated ORFs and about 3-4% point to novel translated ORFs within intergenic or other regions previously annotated as non-coding. Without stringent length restrictions REPARATION was able to identify several small ORFs (sORFs). Multiple supportive evidence from matching MS data and sequence conservation analysis was obtained to validate predicted ORFs.

ORGANISM(S): Salmonella enterica subsp. enterica serovar Typhimurium str. SL1344

PROVIDER: GSE91066 | GEO | 2017/01/30

SECONDARY ACCESSION(S): PRJNA356772

REPOSITORIES: GEO

Similar Datasets

2017-08-03 | PXD005901 | Pride
2017-08-09 | PXD005844 | Pride
2019-07-05 | GSE124962 | GEO
2020-04-30 | GSE128320 | GEO
2023-02-01 | PXD031423 | Pride
2023-02-01 | PXD031460 | Pride
2020-07-18 | GSE125757 | GEO
2023-02-01 | PXD031691 | Pride
2019-12-31 | GSE116834 | GEO
2005-10-27 | E-SMDB-2508 | biostudies-arrayexpress