A structure-based classification and analysis of protein domain family binding sites and their interactions.
ABSTRACT: While the number of solved 3D protein structures continues to grow rapidly, the structural rules that distinguish protein-protein interactions between different structural families are still not clear. Here, we classify and analyse the secondary structural features and promiscuity of a comprehensive non-redundant set of domain family binding sites (DFBSs) and hetero domain-domain interactions (DDIs) extracted from our updated KBDOCK resource. We have partitioned 4001 DFBSs into five classes using their propensities for three types of secondary structural elements ("?" for helices, "?" for strands, and "?" for irregular structure) and we have analysed how frequently these classes occur in DDIs. Our results show that ? elements are not highly represented in DFBSs compared to ? and ? elements. At the DDI level, all classes of binding sites tend to preferentially bind to the same class of binding sites and ?/? contacts are significantly disfavored. Very few DFBSs are promiscuous: 80% of them interact with just one Pfam domain. About 50% of our Pfam domains bear only one single-partner DFBS and are therefore monogamous in their interactions with other domains. Conversely, promiscuous Pfam domains bear several DFBSs among which one or two are promiscuous, thereby multiplying the promiscuity of the concerned protein.
Project description:Comparing, classifying and modelling protein structural interactions can enrich our understanding of many biomolecular processes. This contribution describes Kbdock (http://kbdock.loria.fr/), a database system that combines the Pfam domain classification with coordinate data from the PDB to analyse and model 3D domain-domain interactions (DDIs). Kbdock can be queried using Pfam domain identifiers, protein sequences or 3D protein structures. For a given query domain or pair of domains, Kbdock retrieves and displays a non-redundant list of homologous DDIs or domain-peptide interactions in a common coordinate frame. Kbdock may also be used to search for and visualize interactions involving different, but structurally similar, Pfam families. Thus, structural DDI templates may be proposed even when there is little or no sequence similarity to the query domains.
Project description:Numerous eukaryotic proteins contain multiple domains. Certain domains show a tendency to occur in diverse domain architectures and can be considered "promiscuous." These promiscuous domains are, typically, involved in protein-protein interactions and play crucial roles in interaction networks, particularly those that contribute to signal transduction. A systematic comparative-genomic analysis of promiscuous domains in eukaryotes is described. Two quantitative measures of domain promiscuity are introduced and applied to the analysis of 28 genomes of diverse eukaryotes. Altogether, 215 domains are identified as strongly promiscuous. The fraction of promiscuous domains in animals is shown to be significantly greater than that in fungi or plants. Evolutionary reconstructions indicate that domain promiscuity is a volatile, relatively fast-changing feature of eukaryotic proteins, with few domains remaining promiscuous throughout the evolution of eukaryotes. Some domains appear to have attained promiscuity independently in different lineages, for example, animals and plants. It is proposed that promiscuous domains persist within a relatively small pool of evolutionarily stable domain combinations from which numerous rare architectures emerge during evolution. Domain promiscuity positively correlates with the number of experimentally detected domain interactions and with the strength of purifying selection affecting a domain. Thus, evolution of promiscuous domains seems to be constrained by the diversity of their interaction partners. The set of promiscuous domains is enriched for domains mediating protein-protein interactions that are involved in various forms of signal transduction, especially in the ubiquitin system and in chromatin. Thus, a limited repertoire of promiscuous domains makes a major contribution to the diversity and evolvability of eukaryotic proteomes and signaling networks.
Project description:DOMINE is a comprehensive collection of known and predicted domain-domain interactions (DDIs) compiled from 15 different sources. The updated DOMINE includes 2285 new domain-domain interactions (DDIs) inferred from experimentally characterized high-resolution three-dimensional structures, and about 3500 novel predictions by five computational approaches published over the last 3 years. These additions bring the total number of unique DDIs in the updated version to 26,219 among 5140 unique Pfam domains, a 23% increase compared to 20,513 unique DDIs among 4346 unique domains in the previous version. The updated version now contains 6634 known DDIs, and features a new classification scheme to assign confidence levels to predicted DDIs. DOMINE will serve as a valuable resource to those studying protein and domain interactions. Most importantly, DOMINE will not only serve as an excellent reference to bench scientists testing for new interactions but also to bioinformaticans seeking to predict novel protein-protein interactions based on the DDIs. The contents of the DOMINE are available at http://domine.utdallas.edu.
Project description:In this work, we describe a systematic comparative genomic analysis of promiscuous domains in genomes of Bacteria and Archaea. A quantitative measure of domain promiscuity, the weighted domain architecture score (WDAS), was used and applied to 1317 domains in 1320 genomes of Bacteria and Archaea. A functional analysis associated with the WDAS per genome showed that 18 of 50 functional categories were identified as significantly enriched in the promiscuous domains; in particular, small-molecule binding domains, transferases domains, DNA binding domains (transcription factors), and signal transduction domains were identified as promiscuous. In contrast, non-promiscuous domains were identified as associated with 6 of 50 functional categories, and the category Function unknown was enriched. In addition, the WDASs of 52 domains correlated with genome size, i.e., WDAS values decreased as the genome size increased, suggesting that the number of combinations at larger domains increases, including domains in the superfamilies Winged helix-turn-helix and P-loop-containing nucleoside triphosphate hydrolases. Finally, based on classification of the domains according to their ancestry, we determined that the set of 52 promiscuous domains are also ancient and abundant among all the genomes, in contrast to the non-promiscuous domains. In summary, we consider that the association between these two classes of protein domains (promiscuous and non-promiscuous) provides bacterial and archaeal cells with the ability to respond to diverse environmental challenges.
Project description:Identifying reliable domain-domain interactions will increase our ability to predict novel protein-protein interactions, to unravel interactions in protein complexes, and thus gain more information about the function and behavior of genes. One of the challenges of identifying reliable domain-domain interactions is domain promiscuity. Promiscuous domains are domains that can occur in many domain architectures and are therefore found in many proteins. This becomes a problem for a method where the score of a domain-pair is the ratio between observed and expected frequencies because the protein-protein interaction network is sparse. As such, many protein-pairs will be non-interacting and domain-pairs with promiscuous domains will be penalized. This domain promiscuity challenge to the problem of inferring reliable domain-domain interactions from protein-protein interactions has been recognized, and a number of work-arounds have been proposed. This paper reports on an application of Formal Concept Analysis to this problem. It is found that the relationship between formal concepts provides a natural way for rare domains to elevate the rank of promiscuous domain-pairs and enrich highly ranked domain-pairs with reliable domain-domain interactions. This piggybacking of promiscuous domain-pairs onto less promiscuous domain-pairs is possible only with concept lattices whose attribute-labels are not reduced and is enhanced by the presence of proteins that comprise both promiscuous and rare domains.
Project description:Most eukaryotic proteins are composed of two or more domains. These assemble in a modular manner to create new proteins usually by the acquisition of one or more domains to an existing protein. Promiscuous domains which are found embedded in a variety of proteins and co-exist with many other domains are of particular interest and were shown to have roles in signaling pathways and mediating network communication. The evolution of domain promiscuity is still an open problem, mostly due to the lack of sequenced ancestral genomes. Here we use inferred domain architectures of ancestral genomes to trace the evolution of domain promiscuity in eukaryotic genomes. We find an increase in average promiscuity along many branches of the eukaryotic tree. Moreover, domain promiscuity can proceed at almost a steady rate over long evolutionary time or exhibit lineage-specific acceleration. We also observe that many signaling and regulatory domains gained domain promiscuity around the Bilateria divergence. In addition we show that those domains that played a role in the creation of two body axes and existed before the divergence of the bilaterians from fungi/metazoan achieve a boost in their promiscuities during the bilaterian evolution.
Project description:The metabolic pathways that carry out the biochemical transformations sustaining life depend on the efficiency of their associated enzymes. In recent years, it has become clear that promiscuous enzymes have played an important role in the function and evolution of metabolism. In this work we analyze the repertoire of promiscuous enzymes in 89 non-redundant genomes of the Archaea cellular domain. Promiscuous enzymes are defined as those proteins with two or more different Enzyme Commission (E.C.) numbers, according the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. From this analysis, it was found that the fraction of promiscuous enzymes is lower in Archaea than in Bacteria. A greater diversity of superfamily domains is associated with promiscuous enzymes compared to specialized enzymes, both in Archaea and Bacteria, and there is an enrichment of substrate promiscuity rather than catalytic promiscuity in the archaeal enzymes. Finally, the presence of promiscuous enzymes in the metabolic pathways was found to be heterogeneously distributed at the domain level and in the phyla that make up the Archaea. These analyses increase our understanding of promiscuous enzymes and provide additional clues to the evolution of metabolism in Archaea.
Project description:Since proteins evolve by divergent evolution, proteins with distant homology to each other may or may not bear similar functions. Improved computational approaches are required to recognize distant homologues that are functionally similar. One of the methods of assigning function to sequences is to use profiles derived from sequences of known structure. We describe an update of the Genomic Distribution of protein structural domain Superfamilies (GenDiS) database, namely GenDiS+, which provides a projection of SCOP superfamily members on the sequence space (NR database, NCBI). The sequences are validated using structure-based sequence alignment profiles and domain and full-length sequence alignments. GenDiS+ is a `tour de force' for detecting homologues within around 160 000 taxonomic identifiers, starting from nearly 11 000 domains of known structure. Features, like full-sequence alignment and phylogeny, domain sequence alignment and phylogeny, list of associated structural and sequence domains with strength of interactions, links to databases like Pfam, UniProt and ModBase and list of sequences with a PDB structure, are provided.
Project description:BACKGROUND: Conserved domains are recognized as the building blocks of eukaryotic proteins. Domains showing a tendency to occur in diverse combinations ('promiscuous' domains) are involved in versatile architectures in proteins with different functions. Current models, based on global-level analyses of domain combinations in multiple genomes, have suggested that the propensity of some domains to associate with other domains in high-level architectures increases with organismal complexity. Alternative models using domain-based phylogenetic trees propose that domains have become promiscuous independently in different lineages through convergent evolution and are, thus, random with no functional or structural preferences. Here we test whether complex protein architectures have occurred by accretion from simpler systems and whether the appearance of multidomain combinations parallels organismal complexity. As a model, we analyze the modular evolution of the PWWP domain and ask whether its appearance in combinations with other domains into multidomain architectures is linked with the occurrence of more complex life-forms. Whether high-level combinations of domains are conserved and transmitted as stable units (cassettes) through evolution is examined in the genomes of plant or metazoan species selected for their established position in the evolution of the respective lineages. RESULTS: Using the domain-tree approach, we analyze the evolutionary origins and distribution patterns of the promiscuous PWWP domain to understand the principles of its modular evolution and its existence in combination with other domains in higher-level protein architectures. We found that as a single module the PWWP domain occurs only in proteins with a limited, mainly, species-specific distribution. Earlier, it was suggested that domain promiscuity is a fast-changing (volatile) feature shaped by natural selection and that only a few domains retain their promiscuity status throughout evolution. In contrast, our data show that most of the multidomain PWWP combinations in extant multicellular organisms (humans or land plants) are present in their unicellular ancestral relatives suggesting they have been transmitted through evolution as conserved linear arrangements ('cassettes'). Among the most interesting biologically relevant results is the finding that the genes of the two plant Trithorax family subgroups (ATX1/2 and ATX3/4/5) have different phylogenetic origins. The two subgroups occur together in the earliest land plants Physcomitrella patens and Selaginella moellendorffii. CONCLUSION: Gain/loss of a single PWWP domain is observed throughout evolution reflecting dynamic lineage- or species-specific events. In contrast, higher-level protein architectures involving the PWWP domain have survived as stable arrangements driven by evolutionary descent. The association of PWWP domains with the DNA methyltransferases in O. tauri and in the metazoan lineage seems to have occurred independently consistent with convergent evolution. Our results do not support models wherein more complex protein architectures involving the PWWP domain occur with the appearance of more evolutionarily advanced life forms.
Project description:In molecular recognition, it is often the case that ligand binding is coupled to conformational change in one or both of the binding partners. Two hypotheses describe the limiting cases involved; the first is the induced fit and the second is the conformational selection model. The conformational selection model requires that the protein adopts conformations that are similar to the ligand-bound conformation in the absence of ligand, whilst the induced-fit model predicts that the ligand-bound conformation of the protein is only accessible when the ligand is actually bound. The flexibility of the apo protein clearly plays a major role in these interpretations. For many proteins involved in signaling pathways there is the added complication that they are often promiscuous in that they are capable of binding to different ligand partners. The relationship between protein flexibility and promiscuity is an area of active research and is perhaps best exemplified by the PDZ domain family of proteins. In this study we use molecular dynamics simulations to examine the relationship between flexibility and promiscuity in five PDZ domains: the human Dvl2 (Dishevelled-2) PDZ domain, the human Erbin PDZ domain, the PDZ1 domain of InaD (inactivation no after-potential D protein) from fruit fly, the PDZ7 domain of GRIP1 (glutamate receptor interacting protein 1) from rat and the PDZ2 domain of PTP-BL (protein tyrosine phosphatase) from mouse. We show that despite their high structural similarity, the PDZ binding sites have significantly different dynamics. Importantly, the degree of binding pocket flexibility was found to be closely related to the various characteristics of peptide binding specificity and promiscuity of the five PDZ domains. Our findings suggest that the intrinsic motions of the apo structures play a key role in distinguishing functional properties of different PDZ domains and allow us to make predictions that can be experimentally tested.