Detection of protein complexes from affinity purification/mass spectrometry data.
ABSTRACT: BACKGROUND: Recent advances in molecular biology have led to the accumulation of large amounts of data on protein-protein interaction networks in different species. An important challenge for the analysis of these data is to extract functional modules such as protein complexes and biological processes from networks that are characterised by the presence of a significant number of false positives. Various computational techniques have been applied in recent years. However, most of them treat protein interactions as binary, and co-complex relations derived from affinity purification/mass spectrometry (AP-MS) experiments have been largely ignored. METHODS: This paper presents a new algorithm for detecting protein complexes from AP-MS data. The algorithm aims to detect groups of prey proteins that are significantly co-associated with the same set of bait proteins. We first represent AP-MS data as a bipartite network, in which one set of nodes consists of bait proteins and the other consists of prey proteins. We then calculate pairwise similarities of bait proteins based on the number of neighbours they share. A hierarchical clustering algorithm is applied to these similarities to group the bait proteins, yielding a set of 'seed' clusters. Starting from these seed clusters, an expansion process identifies prey proteins that are significantly associated with the same set of bait proteins, from which a set of complete protein complexes is derived. In application to two real AP-MS datasets, we validate the biological significance of the predicted protein complexes using curated protein complexes and well-characterized cellular component annotations from Gene Ontology (GO), evaluated with several statistical metrics. RESULTS: Experimental results show that the proposed algorithm achieves a significant improvement in detecting protein complexes from AP-MS data.
In comparison to the well-known MCL algorithm, our algorithm improves the accuracy rate by about 20% in detecting protein complexes in both networks and increases the F-Measure value by about 50% in the Krogan_2006 network. Greater precision and better accuracy have been achieved, and the identified complexes are shown to match well with existing curated protein complexes. CONCLUSIONS: Our study highlights the significance of taking co-complex relations into account when extracting protein complexes from AP-MS data. The algorithm proposed in this paper can easily be extended to the analysis of other biological networks that can be conveniently represented as bipartite graphs, such as drug-target networks.
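The seed-and-expand procedure described in the METHODS can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: bait similarity is the raw count of shared prey neighbours (as the abstract states), the hierarchical clustering is assumed to be single-linkage with a stopping threshold `min_shared`, and the "significantly associated" test in the expansion step is assumed to be a one-sided hypergeometric test with a hypothetical `alpha` and a hypothetical minimum of two supporting baits. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import comb


def shared_prey_similarity(bait_to_preys):
    """Similarity of two baits = number of prey neighbours they share."""
    return {
        (a, b): len(bait_to_preys[a] & bait_to_preys[b])
        for a, b in combinations(sorted(bait_to_preys), 2)
    }


def seed_clusters(bait_to_preys, min_shared):
    """Single-linkage agglomerative clustering of baits; stop once the most
    similar pair of clusters shares fewer than `min_shared` preys."""
    sim = shared_prey_similarity(bait_to_preys)
    clusters = [{b} for b in sorted(bait_to_preys)]

    def link(c1, c2):
        # Single linkage: similarity of the closest bait pair across clusters.
        return max(sim[tuple(sorted((a, b)))] for a in c1 for b in c2)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < min_shared:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


def hypergeom_sf(x, total, successes, draws):
    """P(X >= x) for X ~ Hypergeometric(total, successes, draws)."""
    top = min(successes, draws)
    hits = sum(comb(successes, i) * comb(total - successes, draws - i)
               for i in range(x, top + 1))
    return hits / comb(total, draws)


def expand(seed, bait_to_preys, alpha, min_hits=2):
    """Attach preys pulled down by the seed baits more often than chance."""
    n_baits = len(bait_to_preys)
    members = set()
    for prey in sorted(set().union(*bait_to_preys.values())):
        k = sum(prey in preys for preys in bait_to_preys.values())  # all baits hitting prey
        x = sum(prey in bait_to_preys[b] for b in seed)             # seed baits hitting prey
        if x >= min_hits and hypergeom_sf(x, n_baits, k, len(seed)) <= alpha:
            members.add(prey)
    return members


# Toy bipartite AP-MS data (hypothetical): bait -> set of detected preys.
data = {
    "B1": {"P1", "P2", "P3", "P9"},
    "B2": {"P1", "P2", "P3"},
    "B3": {"P1", "P2", "P3", "P4"},
    "B4": {"P7", "P8", "P9"},
    "B5": {"P7", "P8"},
    "B6": {"P4", "P9"},
}
seeds = seed_clusters(data, min_shared=2)
for seed in seeds:
    print(sorted(seed), sorted(expand(seed, data, alpha=0.1)))
```

On the toy data the baits split into {B1, B2, B3}, {B4, B5} and {B6}; the first two seeds expand to the prey sets {P1, P2, P3} and {P7, P8}, while the noisy preys P4 and P9, each hit by only one seed bait, are left out.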
Project description: Large-scale protein-protein interaction data sets have been generated for several species, including yeast and human, and have enabled the identification, quantification, and prediction of cellular molecular networks. Affinity purification-mass spectrometry (AP-MS) is the preeminent methodology for large-scale analysis of protein complexes, performed by immunopurifying a specific "bait" protein and its associated "prey" proteins. The analysis and interpretation of AP-MS data sets is, however, not straightforward. In addition, although yeast AP-MS data sets are relatively comprehensive, current human AP-MS data sets only sparsely cover the human interactome. Here we develop a framework for the analysis of AP-MS data sets that addresses the issues of noise, missing data, and sparsity of coverage in the context of a current, real-world human AP-MS data set. Our goal is to extend and increase the density of the known human interactome by integrating bait-prey and co-complexed prey (prey-prey) associations into networks. Our framework incorporates a score for each identified protein, as well as elements of signal processing, to improve the confidence of identified protein-protein interactions. We identify many protein networks enriched in known biological processes and functions. In addition, we show that integrated bait-prey and prey-prey interactions can be used to refine network topology and extend known protein networks.
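The idea of integrating bait-prey edges with prey-prey (co-complexed prey) associations can be illustrated with a small sketch. It does not reproduce the framework's protein score or signal-processing steps. Instead it assumes, hypothetically, that two preys are linked when they are co-purified by at least `min_shared` baits and the Jaccard overlap of the bait sets that recovered them reaches `min_jaccard`. The function name, thresholds and data are illustrative only.

```python
from itertools import combinations


def integrated_network(purifications, min_shared=2, min_jaccard=0.5):
    """Combine observed bait-prey edges with inferred prey-prey edges.

    purifications: dict mapping each bait to the set of preys it pulled down.
    Returns (bait_prey_edges, prey_prey_edges), where prey_prey_edges maps a
    sorted prey pair to the Jaccard overlap of the baits that recovered them.
    """
    bait_prey = {(bait, prey) for bait, preys in purifications.items() for prey in preys}

    # For each prey, the set of baits whose purification contained it.
    seen_in = {}
    for bait, preys in purifications.items():
        for prey in preys:
            seen_in.setdefault(prey, set()).add(bait)

    prey_prey = {}
    for a, b in combinations(sorted(seen_in), 2):
        shared = len(seen_in[a] & seen_in[b])
        jac = shared / len(seen_in[a] | seen_in[b])
        # Candidate co-complexed preys: repeatedly co-purified by the same baits.
        if shared >= min_shared and jac >= min_jaccard:
            prey_prey[(a, b)] = jac
    return bait_prey, prey_prey


# Hypothetical purifications: bait -> preys identified.
data = {
    "BaitA": {"X", "Y", "Z"},
    "BaitB": {"X", "Y"},
    "BaitC": {"X", "Y", "W"},
    "BaitD": {"Z", "W"},
}
bait_prey, prey_prey = integrated_network(data)
print(len(bait_prey), prey_prey)
```

Here X and Y are always recovered together and gain a prey-prey edge that no single bait-prey observation provides directly, which is the kind of densification the framework aims for.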
Project description: Affinity purification followed by mass spectrometry (AP-MS) has become a common approach for identifying protein-protein interactions (PPIs) and complexes. However, data analysis and visualization often rely on generic approaches that do not take advantage of the quantitative nature of AP-MS. We present a novel computational method, nested clustering, for biclustering of label-free quantitative AP-MS data. Our approach forms bait clusters based on the similarity of quantitative interaction profiles and identifies submatrices of prey proteins showing consistent quantitative association within bait clusters. In doing so, nested clustering effectively addresses the overrepresentation of interactions involving bait proteins compared with proteins identified only as preys. The method does not require the number of bait clusters to be specified, which is an advantage over existing model-based clustering methods. We illustrate the performance of the algorithm using two published intermediate-scale human PPI data sets, which are representative of the AP-MS data generated from mammalian cells. We also discuss general challenges of analyzing and interpreting clustering results in the context of AP-MS data.
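The two-level structure of nested clustering (bait clusters first, then a prey submatrix within each bait cluster) can be illustrated with a simple heuristic stand-in. This is not the published model: baits are grouped as connected components of a graph whose edges join baits with Pearson correlation at least `r_min` (so no cluster count is specified), and a prey joins a cluster's submatrix when its count reaches a hypothetical `min_count` in every bait of that cluster. Names and spectral counts are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def bait_clusters(profiles, r_min):
    """Connected components of the graph linking baits whose quantitative
    prey profiles correlate with r >= r_min (no cluster count needed)."""
    baits = sorted(profiles)
    links = {b: set() for b in baits}
    for a, b in combinations(baits, 2):
        if pearson(profiles[a], profiles[b]) >= r_min:
            links[a].add(b)
            links[b].add(a)
    clusters, assigned = [], set()
    for start in baits:
        if start in assigned:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(links[node] - component)
        assigned |= component
        clusters.append(component)
    return clusters


def consistent_preys(cluster, profiles, prey_names, min_count):
    """Prey submatrix: preys quantified at >= min_count in every bait of the cluster."""
    return {
        prey for i, prey in enumerate(prey_names)
        if all(profiles[b][i] >= min_count for b in cluster)
    }


# Hypothetical spectral counts; columns follow prey_names.
prey_names = ["P1", "P2", "P3", "P4"]
profiles = {
    "B1": [10, 8, 0, 1],
    "B2": [12, 9, 1, 0],
    "B3": [0, 1, 9, 11],
    "B4": [1, 0, 8, 12],
}
for cluster in bait_clusters(profiles, r_min=0.8):
    print(sorted(cluster), sorted(consistent_preys(cluster, profiles, prey_names, 5)))
```

The correlation threshold plays the role that a fixed cluster count would play in a model-based method; the real method avoids both, so `r_min` should be read as a simplification.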
Project description: Affinity-purification mass spectrometry (AP-MS) is the preeminent technique for identification of eukaryotic protein complexes in vivo. AP-MS workflows typically express epitope-tagged bait proteins, immunopurify them, and then identify associated protein complexes using mass spectrometry. However, challenges of existing strategies include the construction of expression vectors for large open reading frames and the possibility that overexpression produces nonphysiological levels of the bait protein, with concomitant perturbation of endogenous protein complexes. To address these issues, we use human cell lines with epitope-tagged endogenous genes as AP-MS substrates to develop a platform that we call "knock-in AP-MS", thereby avoiding the challenges of expression vector construction and ensuring that expression of tagged proteins is driven by endogenous regulatory mechanisms. Using three different bait genes (MRE11A, DNMT1 and APC), we show that cell lines expressing epitope-tagged endogenous genes make good substrates for sensitive and reproducible identification of protein interactions using AP-MS. In particular, we identify novel interactors of the important oncoprotein Adenomatous Polyposis Coli (APC), including an interaction with Flightless-1 homologue (FLII) that is enriched in nuclear fractions.
Project description: BACKGROUND: Affinity-Purification Mass-Spectrometry (AP-MS) provides a powerful means of identifying protein complexes and interactions. Several important challenges exist in interpreting the results of AP-MS experiments. First, the reproducibility of AP-MS experimental replicates can be low, due both to technical variability and to the dynamic nature of protein interactions in the cell. Second, the identification of true protein-protein interactions in AP-MS experiments is subject to inaccuracy due to high false-negative and false-positive rates. Several experimental approaches can be used to mitigate these drawbacks, including the use of replicated and control experiments and relative quantification to sensitively distinguish true interacting proteins from false ones. METHODS: To address the issues of reproducibility and accuracy of protein-protein interactions, we introduce a two-step method, called ROCS, which uses Indicator Prey Proteins to select reproducible AP-MS experiments and Confidence Scores to select specific protein-protein interactions. The Indicator Prey Proteins account for measures of protein identifiability as well as protein reproducibility, effectively allowing removal of outlier experiments that contribute noise and affect downstream inferences. The filtered set of experiments is then used in the Protein-Protein Interaction (PPI) scoring step. Prey protein scoring is done by computing a Confidence Score, which accounts for the probability of occurrence of prey proteins in the bait experiments relative to the control experiment; the significance cutoff parameter is estimated by simultaneously controlling false positives and false negatives against metrics of false discovery rate and biological coherence, respectively.
In summary, the ROCS method relies on automatic, objective criteria for parameter estimation and on error-controlled procedures. RESULTS: We illustrate the performance of our method by applying it to five previously published AP-MS experiments, each containing well-characterized protein interactions, allowing for systematic benchmarking of ROCS. We show that our method may be used on its own to accurately identify specific, biologically relevant protein-protein interactions, or in combination with other AP-MS scoring methods to significantly improve inferences. CONCLUSIONS: Our method addresses important issues encountered in AP-MS datasets, making ROCS a very promising tool for this purpose, either on its own or in conjunction with other methods. We anticipate that our methodology may be used more generally in proteomics studies and databases where experimental reproducibility issues arise. The method is implemented in R and is freely available as the R package "ROCS" from the CRAN repository http://cran.r-project.org/.
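The two-step logic of ROCS (filter replicates for reproducibility, then score preys against controls) can be sketched in a simplified form. This does not use the ROCS R package and does not reproduce its Indicator Prey Proteins or its FDR-based cutoff estimation. As hypothetical stand-ins, a replicate is kept when its mean Jaccard overlap with the other replicates reaches `min_mean_jaccard`, and a prey's score is its detection frequency across bait replicates minus its frequency across control replicates, thresholded at a fixed value.

```python
def jaccard(a, b):
    """Overlap of two prey sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def reproducible_replicates(replicates, min_mean_jaccard):
    """Step 1 (stand-in): drop outlier replicates whose mean overlap with
    the other replicates falls below the threshold."""
    kept = []
    for i, rep in enumerate(replicates):
        overlaps = [jaccard(rep, other) for j, other in enumerate(replicates) if j != i]
        if overlaps and sum(overlaps) / len(overlaps) >= min_mean_jaccard:
            kept.append(rep)
    return kept


def confidence_scores(bait_reps, control_reps):
    """Step 2 (stand-in): frequency of each prey across bait replicates minus
    its frequency across control replicates (both lists must be non-empty)."""
    def freq(prey, reps):
        return sum(prey in r for r in reps) / len(reps)

    preys = set().union(*bait_reps)
    return {p: freq(p, bait_reps) - freq(p, control_reps) for p in preys}


# Hypothetical replicate purifications of one bait, plus two controls.
bait_reps = [{"A", "B", "C"}, {"A", "B", "C", "D"}, {"A", "B"}, {"X", "Y", "Z"}]
controls = [{"C"}, {"C", "D"}]
kept = reproducible_replicates(bait_reps, min_mean_jaccard=0.3)  # drops the 4th replicate
scores = confidence_scores(kept, controls)
print(sorted(p for p, s in scores.items() if s >= 0.5))
```

The outlier replicate is removed before scoring so that it cannot dilute the frequencies of reproducible preys; C and D score negatively because they appear at least as often in the controls.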
Project description: It remains a significant challenge to define individual protein associations within networks where a protein can directly interact with other proteins and/or be part of large complexes that contain functional modules. Here we demonstrate the topological scoring (TopS) algorithm for the analysis of quantitative proteomic datasets from affinity purifications. Data are analyzed in parallel: a prey protein is scored in an individual affinity purification by aggregating information from the entire dataset. Topological scores span a broad range of values, indicating the enrichment of an individual protein in every bait protein purification. TopS is applied to interaction networks derived from human DNA repair proteins and yeast chromatin remodeling complexes. TopS highlights potential direct protein interactions and modules within complexes. TopS is a rapid method for the efficient and informative computational analysis of datasets, is complementary to existing analysis pipelines, and provides important insights into protein interaction networks.
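To illustrate what "scoring a prey in one purification by aggregating information from the entire dataset" can mean, the sketch below compares each bait-prey spectral count with the value expected from the bait's row total and the prey's column total, as in a contingency-table analysis. It is a hypothetical enrichment score in the spirit of the description above and should not be read as the exact TopS definition; the function name and counts are illustrative.

```python
from math import log


def enrichment_scores(counts):
    """counts[bait][prey] = spectral count in that bait's purification.

    Each (bait, prey) cell is compared with the value expected from the bait's
    row total and the prey's column total over the whole dataset:
        score = ln(observed / (row_total * col_total / grand_total))
    """
    bait_total = {b: sum(row.values()) for b, row in counts.items()}
    prey_total = {}
    for row in counts.values():
        for prey, c in row.items():
            prey_total[prey] = prey_total.get(prey, 0) + c
    grand = sum(bait_total.values())
    return {
        (b, prey): log(c / (bait_total[b] * prey_total[prey] / grand))
        for b, row in counts.items()
        for prey, c in row.items()
        if c > 0
    }


# Hypothetical spectral counts from two purifications.
counts = {
    "B1": {"P1": 10, "P2": 10},
    "B2": {"P2": 10, "P3": 10},
}
for key, score in sorted(enrichment_scores(counts).items()):
    print(key, round(score, 3))
```

P2, recovered equally by both baits, scores 0 in each purification, whereas P1 and P3, each specific to one bait, score ln 2. A score like this spans a range of values rather than giving a binary call, which is the property the abstract emphasises.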
Project description: Although a variety of affinity purification mass spectrometry (AP-MS) strategies have been used to investigate complex interactions, many of these are susceptible to artifacts because of substantial overexpression of the exogenously expressed bait protein. Here we present a logical and systematic workflow that uses the multifunctional Halo tag to assess the correct localization and behavior of tagged subunits of the Sin3 histone deacetylase complex prior to further AP-MS analysis. Using this workflow, we modified our tagging/expression strategy for 21.7% of the tagged bait proteins that we constructed, allowing us to quickly develop validated reagents. Specifically, we apply the workflow to map interactions between stably expressed versions of the Sin3 subunits SUDS3, SAP30, or SAP30L and other cellular proteins. We show that the SAP30 and SAP30L paralogues strongly associate with the core Sin3 complex, but SAP30L has unique associations with the proteasome and the myelin sheath. Next, we demonstrate an extension of the complex NSAF (cNSAF) approach, in which normalization to the scaffold protein SIN3A accounts for variations in the proportion of each bait capturing Sin3 complexes and allows a comparison among different baits capturing the same protein complex. This analysis reveals that although the Sin3 subunit SUDS3 appears to be used in both SIN3A- and SIN3B-based complexes, the SAP30 subunit is not used in SIN3B-based complexes. Intriguingly, we do not detect the Sin3 subunits SAP18 and SAP25 among the 128 high-confidence interactions identified, suggesting that these subunits may not be common to all versions of the Sin3 complex in human cells. This workflow provides a framework for building validated reagents to assemble quantitative interaction networks for chromatin remodeling complexes and provides novel insights into focused protein interaction networks.
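The normalized spectral abundance factor underlying the cNSAF approach is NSAF_i = (SpC_i / L_i) / Σ_j (SpC_j / L_j), where SpC is a protein's spectral count and L its length. The sketch below computes NSAF within one purification and then expresses every value relative to the scaffold SIN3A. That second step is a hypothetical simplification of the scaffold normalization described above (a plain ratio), not the published cNSAF procedure, and the counts and lengths are toy values rather than real sequence lengths.

```python
def nsaf(spectral_counts, lengths):
    """NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j) within one purification."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def scaffold_relative(spectral_counts, lengths, scaffold="SIN3A"):
    """Hypothetical simplification of cNSAF: each NSAF divided by the NSAF of
    the scaffold protein captured in the same purification."""
    values = nsaf(spectral_counts, lengths)
    return {p: value / values[scaffold] for p, value in values.items()}


# Toy spectral counts and lengths for one purification (illustrative values only).
counts = {"SIN3A": 120, "SAP30": 10, "HDAC1": 40}
lengths = {"SIN3A": 1200, "SAP30": 200, "HDAC1": 400}
print({p: round(v, 3) for p, v in nsaf(counts, lengths).items()})
print({p: round(v, 3) for p, v in scaffold_relative(counts, lengths).items()})
```

Dividing by the scaffold's own abundance removes differences in how much Sin3 complex each bait captures, which is what makes values comparable across different baits.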
Project description: BACKGROUND: A typical affinity purification coupled to mass spectrometry (AP-MS) experiment includes the purification of a target protein (bait) using an antibody and subsequent mass spectrometry analysis of all proteins co-purifying with the bait (also known as prey proteins). Like other systems biology approaches, AP-MS experiments generate large amounts of data, and visualization has been challenging, especially when integrating AP-MS experiments with orthogonal datasets. RESULTS: We present Circular Interaction Graph for Proteomics (CIG-P), which generates circular diagrams for visually appealing final representations of AP-MS data. Through a Java-based GUI, the user inputs experimental and reference data as files in CSV format. The resulting circular representation can be manipulated live within the GUI before the diagram is exported as a vector graphic in PDF format. The strength of CIG-P is its ability to integrate orthogonal datasets with each other, e.g. affinity purification data of the kinase PRPF4B in relation to the functional components of the spliceosome. Further, various AP-MS experiments can be compared with each other. CONCLUSIONS: CIG-P helps present AP-MS data to a wider audience, and we envision that the tool will find other applications too, e.g. kinase-substrate relationships as a function of perturbation. CIG-P is available at: http://sourceforge.net/projects/cig-p/
Project description: Affinity purification technologies, together with mass spectrometric analyses of the purified protein mixtures (AP-MS), have been used both to identify new protein-protein interactions and to define the subunit composition of protein complexes. Transcription factor protein interactions, however, have not been systematically analyzed using these approaches. Here, we have investigated whether ectopic expression of an affinity-tagged transcription factor as bait in AP-MS experiments perturbs gene expression in cells, resulting in false-positive identification of bait-associated proteins when typical experimental controls are used. Using quantitative proteomics and RNA-Seq, we determined that the increase in the abundance of a set of proteins caused by overexpression of the transcription factor RelA is not sufficient for these proteins to then copurify non-specifically and be misidentified as bait-associated proteins. Therefore, typical controls should be sufficient, and a number of different baits can be compared with a common set of controls. This is of practical interest when identifying bait interactors from a large number of different baits. As expected, we found several known RelA interactors enriched in our RelA purifications (NFκB1, NFκB2, Rel, RelB, IκBα, IκBβ and IκBε). We also found several proteins not previously described in association with RelA, including the small mitochondrial chaperone Tim13. Using a variety of biochemical approaches, we further investigated the nature of the association between Tim13 and NFκB family transcription factors. The work here therefore provides a conceptual and experimental framework for analyzing transcription factor protein interactions. Overall design: Gene expression profiles were assayed in triplicate from HEK293 cells expressing Halo-RelA, Halo-NFkB1, or the Halo tag alone.
Project description: MOTIVATION: Protein complexes are of great importance for unraveling the secrets of cellular organization and function. The AP-MS technique provides effective high-throughput screening that directly measures the co-complex relationships among multiple proteins, but its performance suffers from both false positives and false negatives. To computationally predict complexes from AP-MS data, most existing approaches either require additional knowledge from known complexes (supervised learning) or have numerous parameters to tune. METHOD: In this article, we propose a novel unsupervised approach that does not rely on knowledge of existing complexes. Our method probabilistically calculates the affinity between two proteins, expressed as a co-complexed score (C2S). In particular, our method measures the log-likelihood ratio of two proteins being co-complexed versus being drawn randomly, and we then predict protein complexes by applying a hierarchical clustering algorithm to the C2S score matrix. RESULTS: Compared with existing approaches, our approach is computationally efficient and easy to implement. It has just one parameter to set, and its value has little effect on the results. It can be applied to different species as long as AP-MS data are available. Despite its simplicity, it is competitive with or superior to state-of-the-art predictions made by supervised or unsupervised approaches in many respects.