Prediction of scaffold proteins based on protein interaction and domain architectures.
ABSTRACT: Scaffold proteins are crucial regulators of diverse cellular functions: they assemble multiple proteins involved in signaling and metabolic pathways. Identifying scaffold proteins and studying their molecular mechanisms can open a new perspective on systemic cellular regulation, and the results can be applied in medicine and engineering. Although the regulatory roles of dozens of scaffold proteins have been highlighted, only one computational approach has so far been used to find scaffold proteins in interactomes. That approach was limited in finding diverse types of scaffold proteins because its criteria were restricted to classical scaffold proteins. In this paper, we propose a systematic approach to predict scaffold proteins from interactomes on a large scale and to characterize their roles comprehensively. We classified a total of 10,419 basic scaffold protein candidates from protein interactomes into three classes according to structural evidence for scaffolding, such as domain architectures, domain interactions and protein complexes. We finally defined 2716 highly reliable scaffold protein candidates and characterized their functional features. To assess the accuracy of our prediction, we constructed gold standard positive and negative data sets: 158 gold standard positives and 844 gold standard negatives, based on functional information from the Gene Ontology consortium. The precision, sensitivity and specificity of our testing were 80.3, 51.0 and 98.5 %, respectively. Through function enrichment analysis of the highly reliable scaffold proteins, we confirmed significantly enriched functions related to scaffold protein binding. We also identified functional associations between scaffold proteins and their recruited proteins. Furthermore, we found that scaffold proteins have a higher disease association than kinases. In conclusion, we predicted a larger set of scaffold proteins and analyzed their functional characteristics. The deeper understanding of the roles of scaffold proteins gained from this study will improve the chances of finding therapeutic or engineering applications that exploit their functional characteristics.
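The three evaluation measures quoted in the abstract follow from a standard confusion matrix. The sketch below shows the definitions only; the abstract does not report the underlying true/false positive counts, so the counts used here are hypothetical and chosen purely for illustration, and the function name is ours.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, sensitivity, specificity) as fractions.

    tp/fn are counted over the gold standard positives,
    tn/fp over the gold standard negatives.
    """
    precision = tp / (tp + fp)      # predicted scaffolds that are true scaffolds
    sensitivity = tp / (tp + fn)    # gold positives that were recovered
    specificity = tn / (tn + fp)    # gold negatives correctly rejected
    return precision, sensitivity, specificity


# Hypothetical counts, NOT taken from the paper.
precision, sensitivity, specificity = classification_metrics(tp=40, fp=10, tn=90, fn=60)
print(f"precision={precision:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

A high specificity combined with a moderate sensitivity, as reported above, corresponds to a conservative predictor: few gold negatives are called scaffolds, at the cost of missing some gold positives.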
Project description:BACKGROUND:The assembly of reliable and complete protein-protein interaction (PPI) maps remains one of the significant challenges in systems biology. Computational methods which integrate and prioritize interaction data can greatly aid in approaching this goal. RESULTS:We developed a Bayesian inference framework which uses phylogenetic relationships to guide the integration of PPI evidence across multiple datasets and species, providing more accurate predictions. We apply our framework to reconcile seven eukaryotic interactomes: H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. cerevisiae and A. thaliana. Comprehensive GO-based quality assessment indicates a 5% to 44% score increase in predicted interactomes compared to the input data. Further support is provided by gold-standard MIPS, CYC2008 and HPRD datasets. We demonstrate the ability to recover known PPIs in well-characterized yeast and human complexes (26S proteasome, endosome and exosome) and suggest possible new partners interacting with the putative SWI/SNF chromatin remodeling complex in A. thaliana. CONCLUSION:Our phylogeny-guided approach compares favorably to two standard methods for mapping PPIs across species. Detailed analysis of predictions in selected functional modules uncovers specific PPI profiles among homologous proteins, establishing interaction-based partitioning of protein families. Provided evidence also suggests that interactions within core complex subunits are in general more conserved and easier to transfer accurately to other organisms, than interactions between these subunits.
Project description:Salmonellosis caused by Salmonella bacteria is a food-borne disease and a worldwide health threat causing millions of infections and thousands of deaths every year. This pathogen infects an unusually broad range of host organisms, including humans and plants. A better understanding of the mechanisms of communication between Salmonella and its hosts requires identifying the interactions between Salmonella and host proteins. Protein-protein interactions (PPIs) are the fundamental building blocks of communication. Here, we utilize the prediction platform BIANA to obtain the putative Salmonella-human and Salmonella-Arabidopsis interactomes based on sequence and domain similarity to known PPIs. A gold standard list of Salmonella-host PPIs served to validate the quality of the human model. 24,726 and 10,926 PPIs comprising interactions between 38 and 33 Salmonella effectors and virulence factors with 9,740 human and 4,676 Arabidopsis proteins, respectively, were predicted. Putative hub proteins could be identified, and parallels between the two interactomes were discovered. This approach can provide insight into possible biological functions of so far uncharacterized proteins. The predicted interactions are available via a web interface at http://sbi.imim.es/web/SHIPREC.php, which allows filtering of the database according to user-provided parameters to narrow down the list of suspected interactions.
Project description:BACKGROUND:Databases of literature-curated protein-protein interactions (PPIs) are often used to interpret high-throughput interactome mapping studies and estimate error rates. These databases combine interactions across thousands of published studies and experimental techniques. Because the tendency for two proteins to interact depends on the local conditions, this heterogeneity of conditions means that only a subset of database PPIs are interacting during any given experiment. A typical use of these databases as gold standards in interactome mapping projects, however, assumes that PPIs included in the database are indeed interacting under the experimental conditions of the study. RESULTS:Using raw data from 20 co-fractionation experiments and six published interactomes, we demonstrate that this assumption is often false, with up to 55% of purported gold standard interactions showing no evidence of interaction, on average. We identify a subset of CORUM database complexes that do show consistent evidence of interaction in co-fractionation studies, and we use this subset as gold standards to dramatically improve interactome mapping as judged by the number of predicted interactions at a given error rate. CONCLUSIONS:We recommend using this CORUM subset as the gold standard set in future co-fractionation studies. More generally, we recommend using the subset of literature-curated PPIs that are specific to the experimental context whenever possible.
Project description:MOTIVATION:Protein-protein interactions (PPIs) are usually modeled as networks. These networks have extensively been studied using graphlets, small induced subgraphs capturing the local wiring patterns around nodes in networks. They revealed that proteins involved in similar functions tend to be similarly wired. However, such simple models can only represent pairwise relationships and cannot fully capture the higher-order organization of protein interactomes, including protein complexes. RESULTS:To model the multi-scale organization of these complex biological systems, we utilize simplicial complexes from computational geometry. The question is how to mine these new representations of protein interactomes to reveal additional biological information. To address this, we define simplets, a generalization of graphlets to simplicial complexes. By using simplets, we define a sensitive measure of similarity between simplicial complex representations that allows for clustering them according to their data types better than clustering them by using other state-of-the-art measures, e.g. spectral distance, or facet distribution distance. We model human and baker's yeast protein interactomes as simplicial complexes that capture PPIs and protein complexes as simplices. On these models, we show that our newly introduced simplet-based methods cluster proteins by function better than the clustering methods that use the standard PPI networks, uncovering the new underlying functional organization of the cell. We demonstrate the existence of the functional geometry in the protein interactome data and the superiority of our simplet-based methods to effectively mine for new biological information hidden in the complexity of the higher-order organization of protein interactomes. AVAILABILITY AND IMPLEMENTATION:Codes and datasets are freely available at http://www0.cs.ucl.ac.uk/staff/natasa/Simplets/. 
SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
Project description:The availability of large-scale protein-protein interaction networks for numerous organisms provides an opportunity to comprehensively analyze whether simple properties of proteins are predictive of the roles they play in the functional organization of the cell. We begin by re-examining an influential but controversial characterization of the dynamic modularity of the S. cerevisiae interactome that incorporated gene expression data into network analysis. We analyse the protein-protein interaction networks of five organisms, S. cerevisiae, H. sapiens, D. melanogaster, A. thaliana, and E. coli, and confirm significant and consistent functional and structural differences between hub proteins that are co-expressed with their interacting partners and those that are not, and support the view that the former tend to be intramodular whereas the latter tend to be intermodular. However, we also demonstrate that in each of these organisms, simple topological measures are significantly correlated with the average co-expression of a hub with its partners, independent of any classification, and therefore also reflect protein intra- and inter- modularity. Further, cross-interactomic analysis demonstrates that these simple topological characteristics of hub proteins tend to be conserved across organisms. Overall, we give evidence that purely topological features of static interaction networks reflect aspects of the dynamics and modularity of interactomes as well as previous measures incorporating expression data, and are a powerful means for understanding the dynamic roles of hubs in interactomes.
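The "average co-expression of a hub with its partners" mentioned above can be sketched as the mean correlation between a hub's expression profile and those of its interaction partners. This is a minimal sketch that assumes Pearson correlation as the co-expression measure (the abstract does not name one); the protein names and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_coexpression(hub, partners, expression):
    """Mean correlation between a hub and each of its interaction partners."""
    values = [pearson(expression[hub], expression[p]) for p in partners]
    return sum(values) / len(values)


# Hypothetical expression profiles across four conditions.
expression = {
    "HUB": [1.0, 2.0, 3.0, 4.0],
    "P1": [2.0, 4.0, 6.0, 8.0],   # perfectly co-expressed with the hub
    "P2": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with the hub
}
avg = average_coexpression("HUB", ["P1", "P2"], expression)
print(f"average co-expression: {avg:.3f}")
```

Under this reading, a hub with a high average would be a candidate intramodular hub and one with a low average a candidate intermodular hub; any threshold separating the two is a modeling choice not specified here.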
Project description:RNA-protein interactions are integral to the regulation of gene expression. RNAs have diverse functions and the protein interactomes of individual RNAs vary temporally, spatially, and with physiological context. These factors make the global acquisition of individual RNA-protein interactomes an essential endeavor. Although techniques have been reported for discovery of the protein interactomes of specific RNAs, they are largely laborious, costly, and accomplished singly in individual experiments. We developed HyPR-MS for the discovery and analysis of the protein interactomes of multiple RNAs in a single experiment while also reducing design time and improving efficiencies. Presented here is the application of HyPR-MS to simultaneously and selectively isolate the interactomes of lncRNAs MALAT1, NEAT1, and NORAD. Our analysis features the proteins that potentially contribute to both known and previously undiscovered roles of each lncRNA. This platform provides a powerful new multiplexing tool for the efficient and cost-effective elucidation of specific RNA-protein interactomes.
Project description:Recent molecular genetic studies have identified hundreds of risk genes for various neurodevelopmental and neuropsychiatric disorders. As the number of risk genes increases, it is becoming clear that different mutations of a single gene could cause different types of disorders. One of the best examples of such a gene is SHANK3, which encodes a core scaffold protein of the neuronal excitatory post-synapse. Deletions, duplications, and point mutations of SHANK3 are associated with autism spectrum disorders, intellectual disability, schizophrenia, bipolar disorder, and attention deficit hyperactivity disorder. Nevertheless, how the different mutations of SHANK3 can lead to such phenotypic diversity remains largely unknown. In this study, we investigated whether Shank3 could form protein complexes in a brain region-specific manner, which might contribute to the heterogeneity of neuronal pathophysiology caused by SHANK3 mutations. To test this, we generated a medial prefrontal cortex (mPFC) Shank3 in vivo interactome consisting of 211 proteins, and compared this protein list with a Shank3 interactome previously generated from mixed hippocampal and striatal (HP+STR) tissues. Unexpectedly, we found that only 47 proteins (about 20%) were common between the two interactomes, while 164 and 208 proteins were specifically identified in the mPFC and HP+STR interactomes, respectively. Each of the mPFC- and HP+STR-specific Shank3 interactomes represents a highly interconnected network. Upon comparing the brain region-enriched proteomes, we found that the large difference between the mPFC and HP+STR Shank3 interactomes could not be explained by differential protein expression profiles among the brain regions.
Importantly, bioinformatic pathway analysis revealed that the representative biological functions of the mPFC- and HP+STR-specific Shank3 interactomes were different, suggesting that these interactors could mediate the brain region-specific functions of Shank3. Meanwhile, the same analysis on the common Shank3 interactors, including Homer and GKAP/SAPAP proteins, suggested that they could mainly function as scaffolding proteins at the post-synaptic density. Lastly, we found that the mPFC- and HP+STR-specific Shank3 interactomes contained a significant number of proteins associated with neurodevelopmental and neuropsychiatric disorders. These results suggest that Shank3 can form protein complexes in a brain region-specific manner, which might contribute to the pathophysiological and phenotypic diversity of disorders related to SHANK3 mutations.
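The overlap figures in this abstract (47 shared, 164 mPFC-specific, 208 HP+STR-specific) can be reproduced with plain set arithmetic. The sketch below uses placeholder identifiers sized to match those counts; the real protein lists are not given in the abstract, and which denominator the authors used for "about 20%" is not stated, so several overlap fractions are shown.

```python
def overlap_summary(a, b):
    """Summarize the overlap between two interactomes given as sets of protein IDs."""
    common = a & b
    return {
        "common": len(common),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "frac_of_a": len(common) / len(a),   # shared fraction of the first interactome
        "frac_of_b": len(common) / len(b),   # shared fraction of the second interactome
        "jaccard": len(common) / len(a | b), # shared fraction of the union
    }


# Placeholder IDs (hypothetical); only the set sizes follow the abstract.
shared = {f"shared_{i}" for i in range(47)}
mpfc = shared | {f"mpfc_{i}" for i in range(164)}
hpstr = shared | {f"hpstr_{i}" for i in range(208)}

summary = overlap_summary(mpfc, hpstr)
print(summary)
```

With these sizes the shared proteins make up roughly 22% of the mPFC interactome and roughly 18% of the HP+STR interactome, both consistent with the "about 20%" quoted above.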
Project description:The postsynaptic density (PSD) contains a collection of scaffold proteins used for assembling synaptic signaling complexes. However, it is not known how the core-scaffold machinery associates in protein-interaction networks or how proteins encoded by genes involved in complex brain disorders are distributed through spatiotemporal protein complexes. Here using immunopurification, proteomics and bioinformatics, we isolated 2,876 proteins across 41 in vivo interactomes and determined their protein domain composition, correlation to gene expression levels and developmental integration to the PSD. We defined clusters for enrichment of schizophrenia, autism spectrum disorders, developmental delay and intellectual disability risk factors at embryonic day 14 and adult PSD in mice. Mutations in highly connected nodes alter protein-protein interactions modulating macromolecular complexes enriched in disease risk candidates. These results were integrated into a software platform, Synaptic Protein/Pathways Resource (SyPPRes), enabling the prioritization of disease risk factors and their placement within synaptic protein interaction networks.
Project description:BACKGROUND:Radiologically-confirmed pneumonia (RCP) is a specific end-point used in trials of Pneumococcal Conjugate Vaccine (PCV) to estimate vaccine efficacy. However, chest radiograph (CXR) interpretation varies within and between readers. We measured the repeatability and reliability of paediatric CXR interpretation using percent agreement and Cohen's Kappa, and the validity of field readings against expert review, in a study of the impact of PCV on pneumonia. METHODS:CXRs were obtained from 2716 children admitted between 2006 and 2014 to Kilifi County Hospital, Kilifi, Kenya, with clinically-defined severe or very-severe pneumonia. Five clinicians and radiologists attended a three-day training course on CXR interpretation using a WHO standard. All CXRs were read once by two local primary readers. Discordant readings and 13% of concordant readings were arbitrated by a panel of three expert radiologists. To assess repeatability, a 5% median random sample was presented twice. Sensitivity and specificity of the primary readers' interpretations were estimated against the 'gold-standard' of the arbitrators' results. RESULTS:Of 2716 CXRs, 2 were uninterpretable and 159 were evaluated twice. The percent agreement and Kappa for RCP were 89% and 0.68 and ranged between 84-97% and 0.19-0.68, respectively, for all pathological findings. Intra-observer repeatability was similar to inter-observer reliability. Sensitivities of the primary readers to detect RCP were 69% and 73%; specificities were 96% and 95%. CONCLUSION:Intra- and inter-observer agreements on interpretations of radiologically-confirmed pneumonia are fair to good. Reasonable sensitivity and high specificity make radiologically-confirmed pneumonia, determined in the field, a suitable measure of relative vaccine effectiveness.
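Percent agreement and Cohen's Kappa, the two reliability measures used in this study, can be computed from two readers' labels as sketched below. The readings are hypothetical toy data (1 = RCP present, 0 = absent), not values from the study, and the function names are ours.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which the two readers gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each reader's marginal label frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / n ** 2
    if p_expected == 1:
        return 1.0  # both readers used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical readings for ten radiographs.
reader_a = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
reader_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

agreement = percent_agreement(reader_a, reader_b)
kappa = cohens_kappa(reader_a, reader_b)
print(f"agreement={agreement:.2f} kappa={kappa:.2f}")
```

Kappa is lower than raw agreement because part of the agreement is expected by chance when one label dominates, which is why the abstract can report agreement of 84-97% alongside Kappa values as low as 0.19 for some findings.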
Project description:Bacterial outer membrane proteins, along with a filling lipid molecule, can be modified to form stable self-assembled monolayers on gold. The transmembrane domain of Escherichia coli outer membrane protein A has been engineered to create a scaffold protein to which functional motifs can be fused. In earlier work we described the assembly and structure of an antibody-binding array where the Z domain of Staphylococcus aureus protein A was fused to the scaffold protein. Whilst the binding of rabbit polyclonal immunoglobulin G (IgG) to the array is very strong, mouse monoclonal IgG dissociates from the array easily. This is a problem since many immunodiagnostic tests rely upon the use of mouse monoclonal antibodies. Here we describe a strategy to develop an antibody-binding array that will bind mouse monoclonal IgG with lowered dissociation from the array. A novel protein consisting of the scaffold protein fused to two pairs of Z domains separated by a long flexible linker was manufactured. Using surface plasmon resonance, the self-assembly of the new protein on gold and the improved binding of mouse monoclonal IgG were demonstrated.