Investigating function roles of hypothetical proteins encoded by the Mycobacterium tuberculosis H37Rv genome.
ABSTRACT: BACKGROUND:Mycobacterium tuberculosis (MTB) is a common bacterium that causes tuberculosis and remains a major cause of mortality. Although the MTB genome has been extensively explored for two decades, the functions of 27% (1051/3906) of the encoded proteins have yet to be determined, and these proteins are annotated as hypothetical proteins. METHODS:We assigned functions to these hypothetical proteins using SSEalign, a newly designed algorithm that utilizes structural information. A set of rigorous criteria was applied to these annotations to examine whether they were supported by each parameter. Virulence factors and potential drug targets were also screened among the annotated proteins. RESULTS:For 78% (823/1051) of the hypothetical proteins, we could identify homologs in Escherichia coli and Salmonella typhimurium by using SSEalign. Functional classification analysis indicated that 62.2% (512/823) of these annotated proteins were enzymes with catalytic activities, and most of these annotations were supported by at least two other independent parameters. A relatively high proportion of transporters was identified in the MTB genome, suggesting frequent transport activity in MTB for absorbing essential metabolites and excreting toxic materials. Twelve virulence factors and ten vaccine candidates were identified within these MTB hypothetical proteins, including two genes (rpoS and pspA) related to stress response to the host immune system. Furthermore, we have identified six novel drug target candidates among our annotated proteins, including Rv0817 and Rv2927c, which could be used for treating MTB infection. CONCLUSIONS:Our annotation of the MTB hypothetical proteins will probably serve as a useful dataset for future MTB studies.
Project description:Computational prediction of protein function is frequently error-prone and incomplete. In Mycobacterium tuberculosis (Mtb), ~25% of all genes have no predicted function and are annotated as hypothetical proteins, severely limiting our understanding of Mtb pathogenicity. Here, we utilize a high-throughput quantitative activity-based protein profiling (ABPP) platform to probe, annotate, and validate ATP-binding proteins in Mtb. We experimentally validate prior in silico predictions of >240 proteins and identify 72 hypothetical proteins as ATP binders. ATP interacts with proteins with diverse and unrelated sequences, providing an expanded view of adenosine nucleotide binding in Mtb. Several hypothetical ATP binders are essential or taxonomically limited, suggesting specialized functions in mycobacterial physiology and pathogenicity.
Project description:Development of an effective vaccine against drug-resistant Mycobacterium tuberculosis (Mtb) is crucial for preventing millions of premature deaths every year due to tuberculosis. This paper describes a web portal developed to assist researchers in designing vaccines against emerging Mtb strains using traditional and modern approaches. Firstly, we annotated 59 genomes of Mycobacterium species to understand similarity/dissimilarity between tuberculoid, non-tuberculoid and vaccine strains at the genome level. Secondly, antigen-based vaccine candidates were predicted in each Mtb strain. Thirdly, epitope-based vaccine candidates that can stimulate all arms of the immune system were predicted within the above antigen-based vaccine candidates. Finally, a database of predicted vaccine candidates at the epitope as well as the antigen level was developed for the above strains. To design a vaccine against a newly sequenced genome of an Mtb strain, the server integrates three modules for identification of strain-, antigen- and epitope-specific vaccine candidates. We observed that 103,522 unique peptides (9mers) had the potential to induce an antibody response and/or bind promiscuously to MHC alleles and/or stimulate T lymphocytes. In summary, this web portal will be useful for researchers working on designing vaccines against Mtb, including drug-resistant strains. AVAILABILITY:The database is freely available at http://crdd.osdd.net/raghava/mtbveb/.
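The count of unique 9mers mentioned above depends on how candidate peptides are enumerated from antigen sequences. A minimal sketch of one plausible enumeration step is given here, assuming a simple sliding window over each protein; the function name `unique_kmers` and the toy sequences are hypothetical and are not taken from the portal, and the prediction steps (antibody response, MHC binding, T-cell stimulation) are not modeled.

```python
from typing import Iterable, Set


def unique_kmers(proteins: Iterable[str], k: int = 9) -> Set[str]:
    """Collect every distinct length-k peptide found in the given protein sequences."""
    peptides: Set[str] = set()
    for seq in proteins:
        seq = seq.strip().upper()
        # Slide a window of width k along the sequence; sequences shorter
        # than k contribute nothing because the range is empty.
        for start in range(len(seq) - k + 1):
            peptides.add(seq[start:start + k])
    return peptides


if __name__ == "__main__":
    # Toy input only (hypothetical sequences, not real Mtb antigens).
    toy_proteins = ["MTEQQWNFAGIEAAA", "QQWNFAGIEAAASAI"]
    print(len(unique_kmers(toy_proteins)))
```

Using a set removes peptides shared between proteins or strains, so each candidate is scored once regardless of how many antigens contain it.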
Project description:High-throughput biology technologies have yielded complete genome sequences and functional genomics data for several organisms, including crucial microbial pathogens of humans, animals and plants. However, up to 50% of genes within a genome are often labeled "unknown", "uncharacterized" or "hypothetical", limiting our understanding of virulence and pathogenicity of these organisms. Even though biological functions of proteins encoded by these genes are not known, many of them have been predicted to be involved in key processes in these organisms. In particular, for Mycobacterium tuberculosis, some of these "hypothetical" proteins, for example those belonging to the Pro-Glu or Pro-Pro-Glu (PE/PPE) family, have been suspected to play a crucial role in the intracellular lifestyle of this pathogen, and may contribute to its survival in different environments. We have generated a functional interaction network for Mycobacterium tuberculosis proteins and used this to predict functions for many of its hypothetical proteins. Here we performed functional enrichment analysis of these proteins based on their predicted biological functions to identify annotations that are statistically relevant, and analysed and compared network properties of hypothetical proteins to the known proteins. From the statistically significant annotations and network information, we have tried to derive biologically meaningful annotations related to infection and disease. This quantitative analysis provides an overview of the functional contributions of Mycobacterium tuberculosis "hypothetical" proteins to many basic cellular functions, including its adaptability in the host system and its ability to evade the host immune response.
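The abstract above does not state which statistical test underlies the functional enrichment analysis. As an illustration only, the sketch below uses a one-sided hypergeometric test, a common choice for this kind of analysis; the function name `hypergeom_enrichment_p` and its parameters are assumptions, not the authors' method.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """Return P(X >= hits) for a hypergeometric draw.

    population: total number of proteins considered
    annotated:  proteins in the population that carry the functional term
    sample:     size of the protein set being tested (e.g. hypothetical proteins)
    hits:       proteins in the sample that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers only: 4 proteins, 2 carry the term, 2 sampled, at least 1 hit.
    print(hypergeom_enrichment_p(4, 2, 2, 1))
```

In practice the resulting p-values would be corrected for multiple testing across all terms before any annotation is called statistically significant.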
Project description:BACKGROUND: The genome data of Streptococcus pyogenes SF370 has been widely used by many researchers and has yielded a vast array of interesting findings. Nevertheless, approximately 40% of genes remain classified as hypothetical proteins, and several coding sequences (CDSs) have gone unrecognized. In this study, we attempted a shotgun proteomic analysis with a six-frame database that was independent of genome annotation. RESULTS: Nine proteins encoded by novel ORFs were found by shotgun proteomic analysis, and their specific mRNAs were verified by reverse transcription PCR (RT-PCR). We also provided functional annotations for hypothetical genes using proteomic analysis from three different culture conditions that were separated into three fractions: supernatant, soluble, and insoluble. Consequently, we identified 567 proteins on re-evaluation of the proteomic data using an in-house database comprising 1,697 annotated and nine non-annotated CDSs. We provided functional annotations for 126 hypothetical proteins (18.9% of the 668 hypothetical proteins) based on their cellular fractions and expression profiles under different culture conditions. CONCLUSIONS: The list of amino acid sequences that were annotated by genome analysis contains outdated information and unrecognized protein-coding sequences. We suggest that the six-frame database derived from actual DNA sequences be used for reliable proteomic analysis. In addition, the experimental evidence from functional proteomic analysis is useful for the re-evaluation of previously sequenced genomes.
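A six-frame database of the kind described above is built by translating the DNA sequence in all three reading frames on both strands, so that peptide matching does not depend on existing gene calls. The following is a minimal sketch under stated assumptions: it uses the standard codon assignments, the helper names (`reverse_complement`, `translate`, `six_frame_translations`) are hypothetical, and it does not reproduce the authors' actual pipeline, which would also need to split translations at stop codons and apply length filters.

```python
from itertools import product
from typing import Dict

# Standard codon assignments, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> Dict[str, str]:
    """Translate a DNA sequence in frames +1..+3 and -1..-3."""
    dna = dna.upper()
    frames: Dict[str, str] = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence only; '*' marks a stop codon.
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Stop codons are kept as `*` here so that a downstream step can cut each frame into candidate ORF peptides.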
Project description:A decade after the Mycobacterium tuberculosis (Mtb) genome sequence became available, no promising drug has seen the light of day. This not only indicates the challenges in discovering new drugs but also suggests a gap in our current understanding of Mtb biology. We attempt to bridge this gap by carrying out extensive re-annotation and constructing a systems-level protein interaction map of Mtb with the objective of finding novel drug target candidates. Towards this, we combined crowd sourcing and social networking methods through an initiative, 'Connect to Decode' (C2D), to generate the first and largest manually curated interactome of Mtb, termed 'interactome pathway' (IPW), encompassing a total of 1434 proteins connected through 2575 functional relationships. Interactions leading to gene regulation, signal transduction, metabolism and structural complex formation have been catalogued. In the process, we have functionally annotated 87% of the Mtb genome in the context of gene products. We further combine IPW with a STRING-based network to report central proteins, which may be assessed as potential drug targets for the development of drugs with the least possible side effects. The fact that five of the 17 predicted drug targets are already experimentally validated, either genetically or biochemically, lends credence to our approach.
Project description:Research advancing our understanding of Mycobacterium tuberculosis (Mtb) biology and complex host-Mtb interactions requires consistent and precise quantitative measurements of Mtb proteins. We describe the generation and validation of a compendium of assays to quantify 97% of the 4,012 annotated Mtb proteins by the targeted mass spectrometric method selected reaction monitoring (SRM). Furthermore, we estimate the absolute abundance for 55% of all Mtb proteins, revealing a dynamic range within the Mtb proteome of over four orders of magnitude, and identify previously unannotated proteins. As an example of the assay library utility, we monitored the entire Mtb dormancy survival regulon (DosR), which is linked to anaerobic survival and Mtb persistence, and show its dynamic protein-level regulation during hypoxia. In conclusion, we present a publicly available research resource that supports the sensitive, precise, and reproducible quantification of virtually any Mtb protein by a robust and widely accessible mass spectrometric method.
Project description:Genes belonging to the same operon are transcribed as a single mRNA molecule in all prokaryotes. The genes of the same operon are presumed to be involved in similar metabolic and physiological processes. Hence, computational analysis of constituent proteins could provide important clues to the functional relationships within the operonic genes. This tends to be more fruitful in the case of Mycobacterium tuberculosis (Mtb), considering the number of hypothetical genes with unknown functions and interacting partners. Dramatic advances in the past decade have increased our knowledge of the mechanisms that tubercle bacilli employ to survive within the host. But the phenomenon of Mtb latency continues to baffle all. Rv2031c, which belongs to the dormancy regulon of Mtb, is predominantly expressed during latency and has myriad immunological roles. Thus we attempted to analyze the operon comprising the Rv2031c protein to gain insights into its role during latency. In the current study, we have carried out computational analysis of proteins encoded by genes known to be a part of this operon. Our study includes phylogenetic analysis, modeling of protein 3D structures, and protein interaction network analysis. We describe the mechanistic role of these proteins in the establishment of latency and the regulation of the DevS-DevR two-component system. Additionally, we have identified the probable role of these proteins in carbohydrate metabolism, erythromycin tolerance, and nucleotide synthesis. Hence, these proteins can modulate the metabolism of Mtb inside the host cells and can be important for its survival in latency. The functional characterization and interactome of this important operon can give insight into its role during latency, along with the exploitation of constituent proteins as drug targets and vaccine candidates.
Project description:To identify Mycobacterium tuberculosis (Mtb) antigens as candidates for a subunit vaccine against tuberculosis (TB), we have employed a CD4+ T-cell expression screening method. Mtb-specific CD4+ T-cell lines from nine healthy PPD positive donors were stimulated with different antigenic substrates including autologous dendritic cells (DC) infected with Mtb, or cultured with culture filtrate proteins (CFP), and purified protein derivative of Mtb (PPD). These lines were used to screen a genomic Mtb library expressed in Escherichia coli and processed and presented by autologous DC. This screening led to the recovery of numerous T-cell antigens, including both novel and previously described antigens. One of these novel antigens, referred to as Mtb9.8 (Rv0287), was recognized by multiple T-cell lines, stimulated with either Mtb-infected DC or CFP. Using the mouse and guinea pig models of TB, high levels of IFN-gamma were produced, and solid protection from Mtb challenge was observed following immunization with Mtb9.8 formulated in either AS02A or AS01B Adjuvant Systems. These results demonstrate that T-cell screening of the Mtb genome can be used to identify CD4+ T-cell antigens that are candidates for vaccine development.
Project description:BACKGROUND: While genomic annotations of diverse lineages of the Mycobacterium tuberculosis complex are available, divergences between gene prediction methods are still a challenge for unbiased protein dataset generation. M. tuberculosis gene annotation is an example, where the most used datasets from two independent institutions (the Sanger Institute and The Institute for Genomic Research, TIGR) differ by up to 12% in the number of annotated open reading frames, and 46% of the genes contained in both annotations have different start codons. Such differences emphasize the importance of identifying the sequence of protein products to validate each gene annotation, including its coding region. RESULTS: With this objective, we submitted a culture filtrate sample from M. tuberculosis to high-accuracy LTQ-Orbitrap mass spectrometry analysis and applied refined N-terminal prediction to compare the two gene annotations. From a total of 449 proteins identified from the MS data, we validated 35 tryptic peptides that were specific to one of the two datasets, representing 24 different proteins. Of those, 5 proteins were annotated only in the Sanger database. In the remaining proteins, the observed differences were due to differences in the annotation of transcriptional start sites. CONCLUSION: Our results indicate that, even in a less complex sample likely to represent only 10% of the bacterial proteome, we were still able to detect major differences between different gene annotation approaches. This gives hope that high-throughput proteomics techniques can be used to improve and validate gene annotations, in particular to verify high-throughput, automatic gene annotations.
Project description:As β-lactams are reconsidered for the treatment of tuberculosis (TB), their targets are assumed to be peptidoglycan transpeptidases, as verified by adduct formation and kinetic inhibition of Mycobacterium tuberculosis (Mtb) transpeptidases by carbapenems active against replicating Mtb. Here, we investigated the targets of recently described cephalosporins that are selectively active against non-replicating (NR) Mtb. NR-active cephalosporins failed to inhibit recombinant Mtb transpeptidases. Accordingly, we used alkyne analogs of NR-active cephalosporins to pull down potential targets through unbiased activity-based protein profiling and identified over 30 protein binders. None was a transpeptidase. Several of the target candidates are plausibly related to Mtb's survival in an NR state. However, biochemical tests and studies of loss-of-function mutants did not identify a unique target that accounts for the bactericidal activity of these β-lactams against NR Mtb. Instead, NR-active cephalosporins appear to kill Mtb by collective action on multiple targets. These results highlight the ability of these β-lactams to target diverse classes of proteins.