Predicting PDZ domain-peptide interactions from primary sequences.
ABSTRACT: PDZ domains constitute one of the largest families of interaction domains and function by binding the C termini of their target proteins. Using Bayesian estimation, we constructed a three-dimensional extension of a position-specific scoring matrix that predicts to which peptides a PDZ domain will bind, given the primary sequences of the PDZ domain and the peptides. The model, which was trained using interaction data from 82 PDZ domains and 93 peptides encoded in the mouse genome, successfully predicts interactions involving other mouse PDZ domains, as well as PDZ domains from Drosophila melanogaster and, to a lesser extent, PDZ domains from Caenorhabditis elegans. The model also predicts the differential effects of point mutations in peptide ligands on their PDZ domain-binding affinities. Overall, we show that our approach captures, in a single model, the binding selectivity of the PDZ domain family.
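The "three-dimensional extension of a position-specific scoring matrix" in the abstract above can be pictured as a lookup table indexed by a domain-peptide position pair, the residue at the domain position, and the residue at the peptide position. The following is only a minimal sketch under that assumption. The names `theta`, `pairs`, `score_interaction` and `predict_binding`, the additive form of the score, and the threshold are illustrative and are not taken from the paper; the Bayesian estimation of the parameters is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter table: (pair index, domain residue, peptide residue) -> weight.
# In the paper such parameters are estimated from interaction data; here they are given.
Theta = Dict[Tuple[int, str, str], float]


def score_interaction(theta: Theta,
                      pairs: List[Tuple[int, int]],
                      domain: str,
                      peptide: str) -> float:
    """Sum the weights of every (domain position, peptide position) pair.

    pairs[k] = (i, j) means that residue i of the domain sequence is assumed
    to contact residue j of the peptide. Unseen residue combinations score 0.
    """
    total = 0.0
    for k, (i, j) in enumerate(pairs):
        total += theta.get((k, domain[i], peptide[j]), 0.0)
    return total


def predict_binding(theta: Theta,
                    pairs: List[Tuple[int, int]],
                    domain: str,
                    peptide: str,
                    threshold: float = 0.0) -> bool:
    """Call an interaction when the summed score exceeds an assumed threshold."""
    return score_interaction(theta, pairs, domain, peptide) > threshold
```

Because the same table is shared by all domains, a single set of parameters can in principle score any domain-peptide pair, which is the property the abstract emphasizes.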
Project description:PDZ domains are abundant modular domains that recognize short peptide sequences to mediate protein-protein interactions. To decipher the binding specificity of PDZ domains, we analyzed the interactions between 11 mouse PDZ domains and 217 peptides using a method called MIEC-SVM, which energetically characterizes the domain-peptide interaction using molecular interaction energy components (MIECs) and predicts binding specificity using a support vector machine (SVM). Cross-validation and leave-one-domain-out tests showed that the MIEC-SVM using all 44 PDZ-peptide residue pairs at the interaction interface outperformed the sequence-based methods in the literature. A further feature (residue pair) selection procedure illustrated that 16 residue pairs were uninformative to the binding specificity, even though they contributed significantly (~50%) to the binding energy. When only the 28 informative residue pairs were used, the performance of the MIEC-SVM in predicting PDZ binding specificity improved significantly. This analysis suggests that the informative and uninformative residue interactions between the PDZ domain and the peptide may represent those contributing to binding specificity and affinity, respectively. We performed additional structural and energetic analyses to shed light on how PDZ-peptide recognition is established. The success of the MIEC-SVM method on PDZ domains in this study and SH3 domains in our previous studies illustrates its generality in characterizing protein-peptide interactions and understanding protein recognition from a structural and energetic viewpoint.
Project description:BACKGROUND: The PDZ domain is a well-conserved structural protein domain found in hundreds of signaling proteins that are otherwise unrelated. PDZ domains can bind to the C-terminal peptides of different proteins and act as glue, clustering different protein complexes together, targeting specific proteins and routing these proteins in signaling pathways. These domains are classified into classes I, II and III, depending on their binding partners and the nature of the bonds formed. Knowing the binding specificities of PDZ domains is crucial for understanding the complexity of signaling pathways. It is still an open question how these domains recognize and bind their partners. RESULTS: The focus of the current study is twofold: 1) predicting to which peptides a PDZ domain will bind and 2) classifying PDZ domains as Class I, II or I-II, given the primary sequences of the PDZ domains. Trigram and bigram amino acid frequencies are used as features in machine learning methods. Using 85 PDZ domains and 181 peptides, our model reaches high prediction accuracy (91.4%) for binary interaction prediction, which outperforms previously investigated similar methods. We can also predict classes of PDZ domains with an accuracy of 90.7%. We propose three critical amino acid sequence motifs that could have important roles in the specificity patterns of PDZ domains. CONCLUSIONS: Our model on the PDZ interaction dataset shows that our approach produces encouraging results. The method can be further used as a virtual screening technique to reduce the search space for putative candidate target proteins and drug-like molecules of PDZ domains.
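The bigram and trigram amino acid frequency features mentioned in the abstract above could be computed along the following lines. This is a minimal sketch that assumes overlapping n-grams over the standard 20-letter alphabet and a fixed feature ordering; the abstract does not state the exact encoding, and the function names are illustrative.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard 20 residues (assumed alphabet)


def ngram_frequencies(sequence: str, n: int) -> Dict[str, float]:
    """Relative frequency of each overlapping n-gram in the sequence."""
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def ngram_feature_vector(sequence: str, n: int) -> List[float]:
    """Fixed-length vector with one entry per possible n-gram (20**n entries).

    N-grams containing non-standard residues are ignored in the vector.
    """
    freqs = ngram_frequencies(sequence, n)
    return [freqs.get("".join(gram), 0.0)
            for gram in product(AMINO_ACIDS, repeat=n)]


def bigram_trigram_features(sequence: str) -> List[float]:
    """Concatenate bigram (400) and trigram (8000) frequency features."""
    return ngram_feature_vector(sequence, 2) + ngram_feature_vector(sequence, 3)
```

Vectors of this kind can then be passed to any standard classifier; which learning method the authors used is not specified beyond "machine learning methods".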
Project description:BACKGROUND: PDZ domains are one of the most promiscuous protein recognition modules that bind short linear peptides and play an important role in cellular signaling. Recently, a few high-throughput techniques (e.g. protein microarray screen, phage display) have been applied to determine the in vitro binding specificity of PDZ domains. Currently, many computational methods are available to predict PDZ-peptide interactions, but they often provide domain-specific models and/or have limited domain coverage. RESULTS: Here, we composed the largest set of PDZ domains derived from human, mouse, fly and worm proteomes and defined binding models for PDZ domain families to improve the domain coverage and prediction specificity. For that purpose, we first identified a novel set of 138 PDZ families, comprising 548 PDZ domains from the aforementioned organisms, based on efficient clustering according to their sequence identity. For 43 PDZ families, covering 226 PDZ domains with available interaction data, we built specialized models using a support vector machine approach. The advantage of family-wise models is that they can also be used to determine the binding specificity of a newly characterized PDZ domain with sufficient sequence identity to the known families. Since most current experimental approaches provide only positive data, we have to cope with the class imbalance problem. Thus, to enrich the negative class, we introduced a powerful semi-supervised technique to generate high-confidence non-interaction data. We report competitive predictive performance with respect to state-of-the-art approaches. CONCLUSIONS: Our approach makes several contributions. First, we show that domain coverage can be increased by applying an accurate clustering technique. Second, we developed an approach based on a semi-supervised strategy to obtain high-confidence negative data. Third, we allowed high-order correlations between the amino acid positions in the binding peptides. Fourth, our method is general enough to be readily applicable to other peptide recognition modules such as SH2 domains. Finally, we performed a genome-wide prediction for 101 human and 102 mouse PDZ domains and uncovered novel interactions with biological relevance. We make all the predictive models and genome-wide predictions freely available to the scientific community.
Project description:PDZ domains are protein-protein interaction modules that recognize specific C-terminal sequences to assemble protein complexes in multicellular organisms. By scanning billions of random peptides, we accurately map binding specificity for approximately half of the over 330 PDZ domains in the human and Caenorhabditis elegans proteomes. The domains recognize features of the last seven ligand positions, and we find 16 distinct specificity classes conserved from worm to human, significantly extending the canonical two-class system based on position -2. Thus, most PDZ domains are not promiscuous, but rather are fine-tuned for specific interactions. Specificity profiling of 91 point mutants of a model PDZ domain reveals that the binding site is highly robust, as all mutants were able to recognize C-terminal peptides. However, many mutations altered specificity for ligand positions both close and far from the mutated position, suggesting that binding specificity can evolve rapidly under mutational pressure. Our specificity map enables the prediction and prioritization of natural protein interactions, which can be used to guide PDZ domain cell biology experiments. Using this approach, we predicted and validated several viral ligands for the PDZ domains of the SCRIB polarity protein. These findings indicate that many viruses produce PDZ ligands that disrupt host protein complexes for their own benefit, and that highly pathogenic strains target PDZ domains involved in cell polarity and growth.
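For contrast with the 16 specificity classes reported above, the canonical two-class system based on ligand position -2 can be sketched as a simple rule. The sketch assumes the common convention that position 0 is the C-terminal residue, that class I ligands carry Ser or Thr at position -2, and that class II ligands carry a hydrophobic residue there. The hydrophobic residue set and the function name are assumptions made for illustration, not definitions from the study.

```python
# Assumed set of hydrophobic residues; published definitions vary.
HYDROPHOBIC = set("AVILMFWYC")


def canonical_class(peptide: str) -> str:
    """Assign a C-terminal peptide to the canonical class by position -2.

    The peptide is written N- to C-terminus, so position 0 is peptide[-1]
    and position -2 is peptide[-3].
    """
    if len(peptide) < 3:
        raise ValueError("peptide must contain at least three residues")
    residue = peptide[-3]
    if residue in "ST":
        return "class I"
    if residue in HYDROPHOBIC:
        return "class II"
    return "other"
```

A rule of this kind inspects a single position, whereas the study above reports that PDZ domains recognize features of the last seven ligand positions, which is why a finer classification is possible.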
Project description:Predicting protein interactions involving peptide recognition domains is essential for understanding the many important biological processes they mediate. It is important to consider the binding strength of these interactions to help us construct more biologically relevant protein interaction networks that consider cellular context and competition between potential binders. We developed a novel regression framework that considers both positive (quantitative) and negative (qualitative) interaction data available for mouse PDZ domains to quantitatively predict interactions between PDZ domains, a large peptide recognition domain family, and their peptide ligands using primary sequence information. First, we show that it is possible to learn from existing quantitative and negative interaction data to infer the relative binding strength of interactions involving previously unseen PDZ domains and/or peptides given their primary sequence. Performance was measured using cross-validated hold out testing and testing with previously unseen PDZ domain-peptide interactions. Second, we find that incorporating negative data improves quantitative interaction prediction. Third, we show that sequence similarity is an important prediction performance determinant, which suggests that experimentally collecting additional quantitative interaction data for underrepresented PDZ domain subfamilies will improve prediction. The Matlab code for our SemiSVR predictor and all data used here are available at http://baderlab.org/Data/PDZAffinity.
Project description:In molecular recognition, it is often the case that ligand binding is coupled to conformational change in one or both of the binding partners. Two hypotheses describe the limiting cases involved; the first is the induced fit and the second is the conformational selection model. The conformational selection model requires that the protein adopts conformations that are similar to the ligand-bound conformation in the absence of ligand, whilst the induced-fit model predicts that the ligand-bound conformation of the protein is only accessible when the ligand is actually bound. The flexibility of the apo protein clearly plays a major role in these interpretations. For many proteins involved in signaling pathways there is the added complication that they are often promiscuous in that they are capable of binding to different ligand partners. The relationship between protein flexibility and promiscuity is an area of active research and is perhaps best exemplified by the PDZ domain family of proteins. In this study we use molecular dynamics simulations to examine the relationship between flexibility and promiscuity in five PDZ domains: the human Dvl2 (Dishevelled-2) PDZ domain, the human Erbin PDZ domain, the PDZ1 domain of InaD (inactivation no after-potential D protein) from fruit fly, the PDZ7 domain of GRIP1 (glutamate receptor interacting protein 1) from rat and the PDZ2 domain of PTP-BL (protein tyrosine phosphatase) from mouse. We show that despite their high structural similarity, the PDZ binding sites have significantly different dynamics. Importantly, the degree of binding pocket flexibility was found to be closely related to the various characteristics of peptide binding specificity and promiscuity of the five PDZ domains. Our findings suggest that the intrinsic motions of the apo structures play a key role in distinguishing functional properties of different PDZ domains and allow us to make predictions that can be experimentally tested.
Project description:PDZ domains mediate protein-protein interactions involved in important biological processes through the recognition of short linear motifs in their target proteins. Two recent independent studies have used protein microarray or phage display technology to detect PDZ domain interactions with peptide ligands on a large scale. Several computational predictors of PDZ domain interactions have been developed; however, they are trained using only protein microarray data and focus on limited subsets of PDZ domains. An accurate predictor of genomic PDZ domain interactions would allow the proteomes of organisms to be scanned for potential binders. Such an application would require an accurate and precise predictor to avoid generating too many false positive hits given the large number of possible interactors in a given proteome. Once validated, these predictions will help to increase the coverage of current PDZ domain interaction networks and further our understanding of the roles that PDZ domains play in a variety of biological processes. We developed a PDZ domain interaction predictor using a support vector machine (SVM) trained with both protein microarray and phage display data. In order to use the phage display data for training, which only contains positive interactions, we developed a method to generate artificial negative interactions. Using cross-validation and a series of independent tests, we showed that our SVM successfully predicts interactions in different organisms. We then used the SVM to scan the proteomes of human, worm and fly to predict binders for several PDZ domains. Predictions were validated using known genomic interactions and published protein microarray experiments. Based on our results, new protein interactions potentially associated with Usher and Bardet-Biedl syndromes were predicted. A comparison of performance measures (F1 measure and FPR) for the SVM and published predictors demonstrated our SVM's improved accuracy and precision at proteome scanning. We built an SVM using mouse and human experimental training data to predict PDZ domain interactions. We showed that it correctly predicts known interactions from proteomes of different organisms and is more accurate and precise at proteome scanning compared with published state-of-the-art predictors.
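The two performance measures named in the abstract above, the F1 measure and the false positive rate (FPR), follow directly from the confusion matrix of a binary predictor. The sketch below uses the standard definitions, with 1 for an interaction and 0 for a non-interaction; the function names and the convention of returning 0.0 for an undefined ratio are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels, where 1 means 'interacts'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_and_fpr(y_true: Sequence[int],
               y_pred: Sequence[int]) -> Tuple[float, float]:
    """F1 = 2PR / (P + R) and FPR = FP / (FP + TN); undefined ratios become 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr
```

In proteome scanning the negatives vastly outnumber the positives, so even a small FPR can translate into many false hits, which is why the abstract stresses precision alongside accuracy.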
Project description:Binding selectivity and cross-reactivity within one of the largest and most abundant interaction domain families, the PDZ family, has long been enigmatic. The complete human PDZ domain complement (the PDZome) consists of 267 domains and we applied here a Bayesian selectivity model to predict hundreds of human PDZ domain interactions, using target sequences of 22,997 non-redundant proteins. Subsequent analysis of these binding scores shows that PDZs can be divided into two genome-wide clusters that coincide well with the division between canonical class 1 and 2 PDZs. Within the class 1 PDZs we observed binding overlap at unprecedented levels, mediated by two residues at positions 1 and 5 of the second α-helix of the binding pocket. Eight PDZ domains were subsequently selected for experimental binding studies and to verify the basics of our predictions. Overall, the PDZ domain class 1 cross-reactivity identified here implies that auxiliary mechanisms must be in place to overcome this inherent functional overlap and to minimize cross-selectivity within the living cell. Indeed, when we superimpose PDZ domain binding affinities with gene ontologies, network topology data and the domain position within a PDZ superfamily protein, functional overlap is minimized and PDZ domains position optimally in the binding space. We therefore propose that PDZ domain selectivity is achieved through cellular context rather than inherent binding specificity.
Project description:Guanine nucleotide exchange factor proteins of the Tiam family are activators of the Rho GTPase Rac1 and critical for cell morphology, adhesion, migration, and polarity. These proteins are modular and contain a variety of interaction domains, including a single post-synaptic density-95/discs large/zonula occludens-1 (PDZ) domain. Previous studies suggest that the specificities of the Tiam1 and Tiam2 PDZ domains are distinct. Here, we sought to conclusively define these specificities and determine their molecular origin. Using a combinatorial peptide library, we identified a consensus binding sequence for each PDZ domain. Analysis of these consensus sequences and binding assays with peptides derived from native proteins indicated that these two PDZ domains have overlapping but distinct specificities. We also identified residues in two regions (S(0) and S(-2) pockets) of the Tiam1 PDZ domain that are important determinants of ligand specificity. Site-directed mutagenesis of four nonconserved residues in these two regions along with peptide binding analyses confirmed that these residues are crucial for ligand affinity and specificity. Furthermore, double mutant cycle analysis of each region revealed energetic couplings that were dependent on the ligand being investigated. Remarkably, a Tiam1 PDZ domain quadruple mutant had the same specificity as the Tiam2 PDZ domain. Finally, analysis of Tiam family PDZ domain sequences indicated that the PDZ domains segregate into four distinct families based on the residues studied here. Collectively, our data suggest that Tiam family proteins have highly evolved PDZ domain-ligand interfaces with distinct specificities and that they have disparate PDZ domain-dependent biological functions.
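The double mutant cycle analysis mentioned in the abstract above quantifies energetic coupling between two sites by comparing the effect of the double mutant with the sum of the single-mutant effects. The sketch below uses the textbook form, assuming binding free energies are obtained from dissociation constants as ΔG = RT ln(Kd). The function names, units, default temperature and sign convention are assumptions made for illustration; the abstract does not report how the authors computed their values.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal / (mol K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


def coupling_energy(kd_wt: float, kd_mut_a: float, kd_mut_b: float,
                    kd_double: float, temperature_k: float = 298.15) -> float:
    """Coupling energy of a double mutant cycle, in kcal/mol.

    Computed as dG(double) - dG(A) - dG(B) + dG(WT). A value of zero means
    the two mutations act additively (no energetic coupling).
    """
    return (binding_free_energy(kd_double, temperature_k)
            - binding_free_energy(kd_mut_a, temperature_k)
            - binding_free_energy(kd_mut_b, temperature_k)
            + binding_free_energy(kd_wt, temperature_k))
```

Repeating the calculation with dissociation constants measured for different peptides would show whether the coupling depends on the ligand, as the abstract reports for the Tiam1 PDZ domain.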
Project description:PDZ domains are independently folded modules that typically mediate protein-protein interactions by binding to the C termini of their target proteins. However, in a few instances, PDZ domains have been reported to dimerize with other PDZ domains. To investigate this noncanonical-binding mode further, we used protein microarrays comprising virtually every mouse PDZ domain to systematically query all possible PDZ-PDZ pairs. We then used fluorescence polarization to retest and quantify interactions and coaffinity purification to test biophysically validated interactions in the context of their full-length proteins. Overall, we discovered 37 PDZ-PDZ interactions involving 46 PDZ domains (~30% of all PDZ domains tested), revealing that dimerization is a more frequently used binding mode than was previously appreciated. This suggests that many PDZ domains evolved to form multiprotein complexes by simultaneously interacting with more than one ligand.