Proteomics

Dataset Information


Large-scale proteogenomics characterization of the Mycobacterium tuberculosis hidden proteome


ABSTRACT: Traditional genome annotation methods exclude Open Reading Frames shorter than 100 codons (smORFs), which leaves a substantial portion of the proteome overlooked. Proteogenomics is a multi-omics approach that merges genomics, transcriptomics and proteomics to identify proteoforms and unannotated proteins from Mass Spectrometry data. Here, we employed our recently developed proteogenomics pipeline to aid genome annotation and identify hundreds of novel microproteins encoded by smORFs in the genome of Mycobacterium tuberculosis (Mtb). To overcome sensitivity limitations, we analyzed 680 Mass Spectrometry experiments in a large-scale approach, which allowed us to classify the findings by degree of confidence using our machine learning model. After integrating the results with RNA-Seq datasets, we explored the biological relevance of the novel sequences and show that they are differentially expressed upon starvation and antibiotic treatment, and are co-expressed with many annotated genes that are vital for bacterial virulence. Moreover, some smORFs are located inside essential genomic segments and could be attractive targets for the development of new drugs. Altogether, our results should improve the current annotation of the Mtb proteome and guide follow-up studies that characterize these microproteins in depth.

INSTRUMENT(S): LTQ Orbitrap

ORGANISM(S): Mycobacterium tuberculosis H37Rv

SUBMITTER: Eduardo Vieira de Souza  

LAB HEAD: Cristiano Valim Bizarro

PROVIDER: PXD042958 | PRIDE | 2025-05-06

REPOSITORIES: PRIDE


Publications

Large-scale proteogenomics characterization of microproteins in Mycobacterium tuberculosis.

de Souza EV, Dalberto PF, Miranda AC, Saghatelian A, Pinto AM, Basso LA, Machado P, Bizarro CV

Scientific Reports, 2024-12-28, issue 1


Tuberculosis remains a burden to this day, due to the rise of multidrug- and extensively drug-resistant bacterial strains. The genome of Mycobacterium tuberculosis (Mtb) strain H37Rv underwent an annotation process that excluded small Open Reading Frames (smORFs), which encode a class of peptides and small proteins collectively known as microproteins. As a result, an overlooked part of its proteome is a rich source of potentially essential, druggable molecular targets. Here, we employed ...

Similar Datasets

2017-12-20 | GSE92659 | GEO
2024-10-23 | GSE269371 | GEO
2024-10-23 | GSE269370 | GEO
2024-10-23 | GSE269374 | GEO
2024-10-23 | GSE269372 | GEO
2019-07-03 | GSE125218 | GEO
2022-12-13 | GSE198107 | GEO
2022-12-13 | GSE197909 | GEO
2022-04-21 | PXD025604 | jPOST Repository
2021-08-05 | PXD027697 |