Architecture of the human interactome defines protein communities and disease networks.
ABSTRACT: The physiology of a cell can be viewed as the product of thousands of proteins acting in concert to shape the cellular response. Coordination is achieved in part through networks of protein-protein interactions that assemble functionally related proteins into complexes, organelles, and signal transduction pathways. Understanding the architecture of the human proteome has the potential to inform cellular, structural, and evolutionary mechanisms and is critical to elucidating how genome variation contributes to disease. Here we present BioPlex 2.0 (Biophysical Interactions of ORFeome-derived complexes), which uses robust affinity purification-mass spectrometry methodology to elucidate protein interaction networks and co-complexes nucleated by more than 25% of protein-coding genes from the human genome, and constitutes, to our knowledge, the largest such network so far. With more than 56,000 candidate interactions, BioPlex 2.0 contains more than 29,000 previously unknown co-associations and provides functional insights into hundreds of poorly characterized proteins while enhancing network-based analyses of domain associations, subcellular localization, and co-complex formation. Unsupervised Markov clustering of interacting proteins identified more than 1,300 protein communities representing diverse cellular activities. Genes essential for cell fitness are enriched within 53 communities representing central cellular functions. Moreover, we identified 442 communities associated with more than 2,000 disease annotations, placing numerous candidate disease genes into a cellular framework. BioPlex 2.0 exceeds previous experimentally derived interaction networks in depth and breadth, and will be a valuable resource for exploring the biology of incompletely characterized proteins and for elucidating larger-scale patterns of proteome organization.
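The Markov clustering step mentioned in the abstract above can be illustrated with a minimal sketch. This is the generic MCL procedure (alternating expansion and inflation of a column-stochastic matrix), not the BioPlex pipeline itself; the function name `markov_cluster`, the inflation value of 2.0, and the toy edge list are assumptions made for illustration only.

```python
def markov_cluster(edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering (MCL) on an undirected, unweighted edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)

    # Adjacency matrix with self-loops, as is customary for MCL.
    m = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        m[idx[a]][idx[b]] = 1.0
        m[idx[b]][idx[a]] = 1.0
    for i in range(n):
        m[i][i] = 1.0

    def normalize(mat):
        # Rescale every column so that it sums to 1 (column-stochastic).
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            if total > 0:
                for i in range(n):
                    mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (two steps of the random walk).
        sq = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        nxt = normalize([[v ** inflation for v in row] for row in sq])
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break

    # Each row with surviving mass defines a cluster of the columns it attracts.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted((sorted(c) for c in clusters), key=lambda c: c[0])


# Two disconnected triangles of hypothetical proteins form two communities.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("D", "F")]
print(markov_cluster(toy_edges))  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

A dense-matrix, pure-Python version like this is only practical for toy inputs; a network with tens of thousands of interactions would need a sparse implementation, and the inflation parameter controls how finely the network is split.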
Project description:Protein interactions form a network whose structure drives cellular function and whose organization informs biological inquiry. Using high-throughput affinity-purification mass spectrometry, we identify interacting partners for 2,594 human proteins in HEK293T cells. The resulting network (BioPlex) contains 23,744 interactions among 7,668 proteins with 86% previously undocumented. BioPlex accurately depicts known complexes, attaining 80%-100% coverage for most CORUM complexes. The network readily subdivides into communities that correspond to complexes or clusters of functionally related proteins. More generally, network architecture reflects cellular localization, biological process, and molecular function, enabling functional characterization of thousands of proteins. Network structure also reveals associations among thousands of protein domains, suggesting a basis for examining structurally related proteins. Finally, BioPlex, in combination with other approaches, can be used to reveal interactions of biological or clinical significance. For example, mutations in the membrane protein VAPB implicated in familial amyotrophic lateral sclerosis perturb a defined community of interactors.
Project description:The development of large-scale data sets requires new means to display and disseminate research studies to large audiences. Knowledge of protein-protein interaction (PPI) networks has become a principal interest of many groups within the field of proteomics. At the confluence of technologies such as cross-linking mass spectrometry, yeast two-hybrid, protein cofractionation, and affinity purification mass spectrometry (AP-MS), detection of PPIs can uncover novel biological inferences at high throughput. Thus, new platforms to provide community access to large data sets are necessary. To this end, we have developed a web application that enables exploration and dissemination of the growing BioPlex interaction network. BioPlex is a large-scale interactome data set based on AP-MS of baits from the human ORFeome. The latest BioPlex data set release (BioPlex 2.0) contains 56,553 interactions from 5891 AP-MS experiments. To improve community access to this vast compendium of interactions, we developed BioPlex Display, which integrates individual protein querying, access to empirical data, and on-the-fly annotation of networks within an easy-to-use and mobile web application. BioPlex Display enables rapid acquisition of data from BioPlex and development of hypotheses based on protein interactions.
Project description:Despite tremendous efforts in the genomics, transcriptomics, and proteomics communities, there are still no comprehensive data about the exact number of protein-coding genes, translated proteoforms, and their functions. In addition, we still lack functional annotation for 1193 genes whose expression has been confirmed at the proteomic level (uPE1 proteins). We re-analyzed results of AP-MS experiments from the BioPlex 2.0 database to predict functions of uPE1 proteins and their splice forms. By building a protein-protein interaction network for 12 thousand identified proteins encoded by 11 thousand genes, we were able to predict Gene Ontology categories for a total of 387 uPE1 genes. We predicted different functions for canonical and alternatively spliced forms for four uPE1 genes. In total, functional differences were revealed for 62 proteoforms encoded by 31 genes. Based on these results, it can be cautiously concluded that the dynamics and versatility of the interactome are ensured by changing the dominant splice form. Overall, we propose that analysis of large-scale AP-MS experiments performed for various cell lines and under various conditions is a key to understanding the full potential of the roles of genes in cellular processes.
Project description:Protein-protein interactions are essential biologic processes that occur at inter- and intracellular levels. To gain insight into the various complex cellular functions of these interactions, it is necessary to assess them under physiologic conditions. Recent advances in various proteomic technologies allow the investigation of protein-protein interaction networks in living cells. The combination of proximity-dependent labelling and chemical cross-linking will greatly enhance our understanding of multi-protein complexes that are difficult to prepare, such as organelle-bound membrane proteins. In this review, we describe our current understanding of mass spectrometry-based proteomics mapping methods for elucidating organelle-bound membrane protein complexes in living cells, with a focus on protein-protein interactions in mitochondrial subcellular compartments.
Project description:Affinity purification (AP) coupled to mass spectrometry (MS) has been successful in elucidating protein molecular networks of mammalian cells. These approaches have dramatically increased the knowledge of the interconnectivity present among proteins and highlighted biological functions within different protein complexes. Despite significant technical improvements reached in the past years, it is still challenging to identify the interaction networks and the subsequent associated functions of nuclear proteins such as transcription factors (TFs). A straightforward and robust methodology is therefore required to obtain unbiased and reproducible interaction data. Here we present a new approach for TF AP-MS, exemplified with the CCAAT/enhancer binding protein alpha (C/EBPalpha). Utilizing the advantages of a double tag and three different MS strategies, we conducted a total of six independent AP-MS strategies to analyze the protein-protein interactions of C/EBPalpha. The resultant data were combined to produce a cohesive C/EBPalpha interactome. Our study describes a new methodology that robustly identifies specific molecular complexes associated with transcription factors. Moreover, it emphasizes the existence of TFs as protein complexes essential for cellular biological functions and not as single, static entities.
Project description:Modularity is an attribute of a system that can be decomposed into a set of cohesive entities that are loosely coupled. Many cellular networks can be decomposed into functional modules-each functionally separable from the other modules. The protein complexes in physical protein interaction networks are a good example of this, and here we focus on their origins and evolution. We investigate the emergence of protein complexes and physical interactions between proteins by duplication, and review other mechanisms. We dissect the dataset of protein complexes of known three-dimensional structure, and show that roughly 90% of these complexes contain contacts between identical proteins within the same complex. Proteins that are shared across different complexes occur frequently, and they tend to be essential genes more often than members of a single protein complex. We also provide a perspective on the evolutionary mechanisms driving the growth of other modular cellular networks such as transcriptional regulatory and metabolic networks.
Project description:Protein-protein interaction (PPI) networks, providing a comprehensive landscape of protein interaction patterns, enable us to explore biological processes and cellular components at multiple resolutions. For a biological process, a number of proteins need to work together to perform a job. Proteins densely interact with each other, forming large molecular machines or cellular building blocks. Identification of such densely interconnected clusters or protein complexes from PPI networks enables us to obtain a better understanding of the hierarchy and organization of biological processes and cellular components. However, most existing graph clustering algorithms on PPI networks often cannot effectively detect densely connected subgraphs and overlapped subgraphs. In this article, we formulate the problem of complex detection as diversified dense subgraph mining and introduce a novel approximation algorithm to efficiently enumerate putative protein complexes from biological networks. The key insight of our algorithm is that instead of enumerating all dense subgraphs, we only need to find a small diverse subset of subgraphs that cover as many proteins as possible. The problem is modeled as finding a diverse set of maximal dense subgraphs where we develop highly effective pruning techniques to guarantee efficiency. To scale up to large networks, we devise a divide-and-conquer approach to speed up the algorithm in a distributed manner. By comparing with existing clustering and dense subgraph-based algorithms on several yeast and human PPI networks, we demonstrate that our method can detect more putative protein complexes and achieves better prediction accuracy.
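To make the notion of a dense subgraph in the abstract above concrete, here is a minimal sketch of the classic greedy peeling heuristic for a single densest subgraph. It is explicitly not the diversified, distributed algorithm the abstract describes; it only illustrates the density objective (edges divided by nodes) on a toy graph. The function name and the example proteins are hypothetical.

```python
def greedy_densest_subgraph(edges):
    """Greedy peeling: repeatedly drop the lowest-degree node and keep the
    densest intermediate node set, with density = |edges| / |nodes|."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    remaining = set(adj)
    num_edges = sum(len(neigh) for neigh in adj.values()) // 2
    best_nodes = set(remaining)
    best_density = num_edges / len(remaining) if remaining else 0.0

    while len(remaining) > 1:
        # Pick the node with the fewest remaining neighbors (ties by name).
        v = min(remaining, key=lambda u: (len(adj[u]), u))
        for w in adj[v]:
            adj[w].discard(v)
        num_edges -= len(adj[v])
        adj[v] = set()
        remaining.discard(v)

        density = num_edges / len(remaining)
        if density > best_density:
            best_density, best_nodes = density, set(remaining)

    return best_nodes, best_density


# A 4-protein clique (A-D) with a two-protein tail (E, F).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
             ("C", "D"), ("D", "E"), ("E", "F")]
nodes, density = greedy_densest_subgraph(toy_edges)
print(sorted(nodes), density)  # ['A', 'B', 'C', 'D'] 1.5
```

Peeling returns one subgraph and therefore cannot report overlapping complexes on its own, which is the limitation that motivates enumerating a diverse set of dense subgraphs instead.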
Project description:Revealing functional units in protein-protein interaction (PPI) networks is important for understanding cellular functional organization. Current algorithms for identifying functional units mainly focus on cohesive protein complexes, which have more internal interactions than external interactions. Most of these approaches do not handle overlaps among complexes since they usually allow a protein to belong to only one complex. Moreover, recent studies have shown that other non-cohesive structural functional units beyond complexes also exist in PPI networks. Thus, previous algorithms that focus only on non-overlapping cohesive complexes are not able to present the biological reality fully. Here, we develop a new regularized sparse random graph model (RSRGM) to explore overlapping and various structural functional units in PPI networks. RSRGM is principally governed by two model parameters. One is used to define the functional units as groups of proteins that have similar patterns of connections to others, which allows RSRGM to detect non-cohesive structural functional units. The other is used to represent the degree to which proteins belong to the units, which supports a protein belonging to more than one revealed unit. We also propose a regularizer to control the smoothness between the estimators of these two parameters. Experimental results on four S. cerevisiae PPI networks show that the performance of RSRGM on detecting cohesive complexes and overlapping complexes is superior to that of previous competing algorithms. Moreover, RSRGM has the ability to discover biologically significant functional units besides complexes.
Project description:BACKGROUND: Proteins in organisms, rather than acting alone, usually form protein complexes to perform cellular functions. We analyze the topological network structure of protein complexes and their component proteins in the budding yeast in terms of the bipartite network and its projections, where the complexes and proteins are its two distinct components. Compared to conventional protein-protein interaction networks, the networks from the protein complexes show more homogeneous structures than those of the binary protein interactions, implying that the formation of complexes leads to a relatively more uniform number of interaction partners. In addition, we suggest a new optimization method to determine the abundance and function of protein complexes, based on the information of their global organization. Estimating abundance and biological functions is of great importance for many studies, as it provides a quantitative description of cell behaviors instead of just a "catalogue" of lists of protein interactions. RESULTS: With our new optimization method, we present genome-wide assignments of abundance and biological functions for complexes, as well as previously unknown abundance and functions of proteins, which can provide significant information for further investigations in proteomics. It is strongly supported by a number of biologically relevant examples, such as the relationship between the cytoskeleton proteins and signal transduction and the metabolic enzyme Eno2's involvement in the cell division process. CONCLUSIONS: We believe that our methods and findings are applicable not only to the specific area of proteomics, but also to much broader areas of systems biology through the concept of an optimization principle.
Project description:Systems biology aims to understand biological phenomena in terms of complex biological and molecular interactions, and thus proteomics plays an important role in elucidating protein networks. However, many proteomic methods have suffered from high variability, so that they yield little more than the names of altered proteins. Here, we propose a strategy for elucidating cellular protein networks based on an FD-LC-MS/MS proteomic method. The strategy permits reproducible relative quantitation of differences in protein levels between different cell populations and allows for integration of the data with those obtained through other methods. We demonstrate the validity of the approach through a comparison of differential protein expression in normal and conditional superoxide dismutase 1 gene knockout cells and believe that beginning with an FD-LC-MS/MS proteomic approach will enable researchers to elucidate protein networks more easily and comprehensively.
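The bipartite complex-protein network and its two projections described in the abstract above can be sketched as follows. This is a minimal illustration under assumed inputs, not the authors' code: the function name `project_bipartite` and the complex and protein identifiers (`C1`, `P1`, and so on) are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations


def project_bipartite(membership):
    """Project a complex -> proteins membership map onto each node type.

    Returns two dicts keyed by sorted pairs:
      * protein pairs weighted by the number of complexes they share
      * complex pairs weighted by the number of proteins they share
    """
    protein_links = Counter()
    for members in membership.values():
        # Every pair of proteins in the same complex gains one shared complex.
        for a, b in combinations(sorted(set(members)), 2):
            protein_links[(a, b)] += 1

    complex_links = {}
    for c1, c2 in combinations(sorted(membership), 2):
        shared = len(set(membership[c1]) & set(membership[c2]))
        if shared:
            complex_links[(c1, c2)] = shared

    return dict(protein_links), complex_links


# Hypothetical membership table: three complexes, four proteins.
membership = {
    "C1": ["P1", "P2", "P3"],
    "C2": ["P2", "P3"],
    "C3": ["P4"],
}
proteins, complexes = project_bipartite(membership)
print(proteins)   # {('P1', 'P2'): 1, ('P1', 'P3'): 1, ('P2', 'P3'): 2}
print(complexes)  # {('C1', 'C2'): 2}
```

In the protein projection every complex becomes a fully connected group, which is one intuitive reason why complex-derived networks can show more uniform partner counts than binary interaction maps.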