Project description: Prokaryotes are, due to their moderate complexity, particularly amenable to the comprehensive identification of the protein repertoire expressed under different conditions. We applied a generic strategy to identify a complete expressed prokaryotic proteome, based on the analysis of RNA and proteins extracted from matched samples. Saturated transcriptome profiling by RNA-seq provided an endpoint estimate of the protein-coding genes expressed under two conditions that mimic the interaction of Bartonella henselae with its mammalian host. Directed shotgun proteomics experiments were carried out on four subcellular fractions. By specifically targeting proteins that are short, basic, of low abundance and membrane-localized, we could eliminate their initial under-representation compared to the estimated endpoint. A total of 1,250 proteins were identified with an estimated false discovery rate below 1%. This represents 85% of all distinct annotated proteins and around 90% of the expressed protein-coding genes. Genes whose transcripts were detected but whose protein products were not were highly enriched in several genomic islands. Additionally, genes that lacked an ortholog and a functional annotation were not detected at the protein level and possibly include over-predicted genes in genome annotations. Furthermore, a dramatic membrane proteome re-organization was observed, including differential regulation of autotransporters, adhesins and hemin-binding proteins. Particularly noteworthy was the complete membrane proteome coverage, which included expression of all members of the VirB/D4 type IV secretion system, a key virulence factor. Transcriptome and proteome analysis of B. henselae under two conditions, each in duplicate: uninduced and induced for host invasion.
Project description: Using a discovery proteomics approach, the expressed proteome of Bartonella henselae, a Gram-negative prokaryotic model organism, was exhaustively studied under two conditions that mimic those encountered in different hosts. Using the analysis-driven experimentation (ADE) feedback-loop strategy, we were able to virtually eliminate the biases against commonly under-represented short, basic, and particularly lower-abundance and membrane protein classes, all of which are experimentally tractable. Based on a very stringent FDR at the PSM level, we identified 85% of all distinct annotated proteins and above 90% of the protein-coding genes expressed in the two conditions. Several lines of evidence indicated that this is very close to all proteins that can be identified by a discovery proteomics approach with current technology. Our experimental strategy relied on four elements. First, we combined sub-cellular fractionation (Cyt, TM, IM, OM) with additional biochemical fractionation regimens (gel filtration, OGEpep, OGEprot, ProteoMiner) to reduce the overall sample complexity. Second, an exclusion list approach was applied, in which each sample is measured twice in the mass spectrometer. Third, we relied on the ADE strategy to target typically under-represented areas of the proteome. Finally, we used chymotrypsin as an enzyme in addition to trypsin for all membrane-derived fractions. For computational analysis, we combined results from two database search engines (Mascot and MS-GF+). Mass spectra were searched against a protein sequence database containing 1488 NCBI RefSeq annotated B. henselae proteins (NC_005956.1), 3336 sheep proteins, a positive control (myc-gfp), and the protein sequences of 256 common contaminants. Spectra were searched against this database using the target/decoy option with either Mascot (version 2.3.0, Matrix Science) or MS-GF+ (MS-GFDB v7747, kindly provided by Dr. Sangtae Kim, UCSD, USA), using the following parameters: carbamidomethylation was set as a fixed modification on all cysteines; oxidation of methionines, deamidation of asparagines and glutamines, and cyclization of N-terminal glutamines were considered as optional modifications. Spectra were searched for matches to fully tryptic and semi-tryptic peptides with up to two missed cleavage sites. Precursor ion mass tolerance was set to 5 ppm, fragment ion mass tolerance was set to 0.8 Da, and the automatic decoy search option was enabled. For Mascot, data were further post-processed with Percolator (Brosch et al. 2009). Additional RNA-seq data are available at GEO under the accession GSE44564.
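As a rough illustration of two quantities named in the search description, the following Python sketch shows how a 5 ppm precursor tolerance translates into an m/z window and how a target/decoy search can be turned into a simple PSM-level FDR cutoff. It is a minimal sketch and not the procedure used in the study: the function names `ppm_window` and `accept_at_fdr` are hypothetical, the FDR estimate is the plain decoys/targets ratio, and Mascot with Percolator and MS-GF+ apply their own statistical scoring.

```python
from typing import List, Optional, Tuple


def ppm_window(mz: float, tol_ppm: float = 5.0) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds of a symmetric ppm tolerance window."""
    delta = mz * tol_ppm / 1e6  # 1 ppm is one millionth of the measured m/z
    return mz - delta, mz + delta


def accept_at_fdr(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Tuple[Optional[float], int]:
    """Find the most permissive score cutoff whose estimated FDR is <= max_fdr.

    Each PSM is (score, is_decoy); higher scores are assumed to be better.
    The FDR at a cutoff is estimated as decoys / targets among all PSMs
    scoring at or above it (a simplifying assumption for this sketch).
    Returns (cutoff score, number of accepted target PSMs), or (None, 0)
    if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff: Optional[float] = None
    accepted = 0
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted set while the estimate stays in bounds.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
            accepted = targets
    return cutoff, accepted


if __name__ == "__main__":
    low, high = ppm_window(1000.0)  # 5 ppm around m/z 1000
    print(f"5 ppm window at m/z 1000: {low:.3f} to {high:.3f}")

    # Toy PSM list (invented scores, for illustration only).
    toy = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
    print(accept_at_fdr(toy, max_fdr=0.5))
```

At m/z 1000 the 5 ppm window is only 0.005 wide on either side, which shows how much tighter the precursor tolerance is than the 0.8 Da fragment tolerance quoted above.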