Large-scale proteogenomics characterization of the Mycobacterium tuberculosis hidden proteome
ABSTRACT: Traditional genome annotation methods exclude Open Reading Frames shorter than 300 codons (smORFs), which leaves a substantial portion of the proteome overlooked. Proteogenomics is a multi-omics approach that merges genomics, transcriptomics and proteomics to identify proteoforms and unannotated proteins from mass spectrometry data. Here, we employed our recently developed proteogenomics pipeline to aid genome annotation and identify hundreds of novel microproteins encoded by smORFs in the genome of Mycobacterium tuberculosis (Mtb). To overcome sensitivity limitations, we took a large-scale approach and analyzed 680 mass spectrometry experiments, which let us classify the findings by degree of confidence using our machine learning model. After integrating the results with RNA-Seq datasets, we explored the biological relevance of the novel sequences and showed that they are differentially expressed upon starvation and antibiotic treatment, and are co-expressed with many annotated genes that are vital for bacterial virulence. Moreover, some smORFs are located inside essential genomic segments and could be attractive targets for the development of new drugs. Altogether, our results should improve the current annotation of the Mtb proteome and guide future studies that characterize these microproteins in depth.
INSTRUMENT(S): LTQ Orbitrap
ORGANISM(S): Mycobacterium tuberculosis H37Rv
SUBMITTER: Eduardo Vieira de Souza
LAB HEAD: Cristiano Valim Bizarro
PROVIDER: PXD042958 | Pride | 2025-05-06
REPOSITORIES: Pride