Distance geometry generates native-like folds for small helical proteins using the consensus distances of predicted protein structures.
ABSTRACT: For successful ab initio protein structure prediction, a method is needed to identify native-like structures from a set containing both native and non-native protein-like conformations. In this regard, the use of distance geometry has shown promise when accurate inter-residue distances are available. We describe a method by which distance geometry restraints are culled from sets of 500 protein-like conformations for four small helical proteins generated by the method of Simons et al. (1997). A consensus-based approach was applied in which every inter-Cα distance was measured, and the most frequently occurring distances were used as input restraints for distance geometry. For each protein, a structure with lower coordinate root-mean-square (RMS) error than the mean of the original set was constructed; in three cases the topology of the fold resembled that of the native protein. When the fold sets were filtered for the best scoring conformations with respect to an all-atom knowledge-based scoring function, the remaining subset of 50 structures yielded restraints of higher accuracy. A second round of distance geometry using these restraints resulted in an average coordinate RMS error of 4.38 Å.
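The consensus step described in the abstract above can be illustrated with a minimal sketch. It assumes each conformation is a list of Cα coordinates and that "most frequently occurring distance" means the mean of the most populated histogram bin; the function name `consensus_distances` and the 1 Å bin width are hypothetical choices, not details taken from the paper.

```python
import math
from collections import Counter


def consensus_distances(conformations, bin_width=1.0):
    """Map each residue pair (i, j) to the mean of its most populated distance bin.

    conformations: list of structures, each a list of (x, y, z) C-alpha coordinates
    bin_width: histogram bin size in angstroms (hypothetical default)
    """
    n_residues = len(conformations[0])
    consensus = {}
    for i in range(n_residues):
        for j in range(i + 1, n_residues):
            dists = [math.dist(conf[i], conf[j]) for conf in conformations]
            bins = Counter(int(d // bin_width) for d in dists)
            # On ties, most_common returns the bin that was seen first.
            top_bin = bins.most_common(1)[0][0]
            members = [d for d in dists if int(d // bin_width) == top_bin]
            consensus[(i, j)] = sum(members) / len(members)
    return consensus


# Toy ensemble: two residues, three conformations (assumed data).
ensemble = [
    [(0.0, 0.0, 0.0), (3.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.3, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
]
restraints = consensus_distances(ensemble)
```

In this toy ensemble the outlying 9.0 Å distance falls in a minority bin and does not contribute, which is the property a consensus restraint relies on. The resulting dictionary would then be passed to a distance geometry program as target distances.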
Project description:BACKGROUND: Protein-protein interactions are fundamental for the majority of cellular processes and their study is of enormous biotechnological and therapeutic interest. In recent years, a variety of computational approaches to the protein-protein docking problem have been reported, with encouraging results. Most of the currently available protein-protein docking algorithms are composed of two clearly defined parts: the sampling of the rotational and translational space of the interacting molecules, and the scoring and clustering of the resulting orientations. Although this kind of strategy has produced some of the most successful results in the CAPRI blind test (http://www.ebi.ac.uk/msd-srv/capri), more effort is needed. The sampling protocol should generate a pool of conformations that includes a sufficient number of near-native ones, while the scoring function should discriminate between near-native and non-near-native proposed conformations. In addition, protocols to efficiently include full flexibility in the protein structures are increasingly needed. RESULTS: In this work we present new computational tools for protein-protein docking. We describe the RotBUS (Rotation-Based Uniform Sampling) method to generate uniformly distributed sets of rigid-body docking poses, with a new fast calculation of the optimal contacting distance between molecules. We have tested the method on a standard benchmark of unbound structures and find near-native solutions in 100% of the cases. After applying a new fast filtering scheme based on residue-based desolvation, in combination with FTDock plus pyDock scoring, near-native solutions are found with rank ≤ 50 in 39% of the cases. Knowledge-based experimental restraints can be easily included to reduce computational times during sampling and improve success rates, and the method can be extended in the future to include flexibility of the side-chains.
CONCLUSIONS: This new sampling algorithm has the advantage of high speed, achieved by fast computation of the intermolecular distance based on a coarse representation of the interacting surfaces. In addition, fast desolvation scoring permits the screening of millions of conformations at low computational cost, without compromising accuracy. The protocol presented here can be used as a framework to include restraints, flexibility and ensemble docking approaches.
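The abstract above does not say how RotBUS constructs its rotation set, so the following is only a generic illustration of what uniform sampling of rigid-body orientations means. It draws random unit quaternions with Shoemake's method, a standard technique swapped in here for illustration; the function names are hypothetical.

```python
import math
import random


def random_unit_quaternion(rng):
    """Draw a rotation uniformly from SO(3) as a unit quaternion (x, y, z, w)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (
        a * math.sin(2.0 * math.pi * u2),
        a * math.cos(2.0 * math.pi * u2),
        b * math.sin(2.0 * math.pi * u3),
        b * math.cos(2.0 * math.pi * u3),
    )


def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q = (x, y, z, w)."""
    x, y, z, w = q
    vx, vy, vz = v
    # t = 2 * (q_vec x v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + q_vec x t
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


rng = random.Random(0)  # fixed seed so the sketch is reproducible
orientations = [random_unit_quaternion(rng) for _ in range(1000)]
```

Each quaternion would be applied to the ligand coordinates with `rotate` before translating the ligand to its contacting distance from the receptor. A deterministic, evenly spaced rotation set would serve the same purpose with less redundancy than random draws.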
Project description:BACKGROUND: The function of oligomeric proteins is inherently linked to their quaternary structure. In the absence of high-resolution data, low-resolution information in the form of spatial restraints can significantly contribute to the precision and accuracy of structural models obtained using computational approaches. To obtain such restraints, chemical cross-linking coupled with mass spectrometry (XL-MS) is commonly used. However, the use of XL-MS in the modeling of protein complexes composed of identical subunits (homo-oligomers) is often hindered by the inherent ambiguity of intra- and inter-subunit connection assignment. RESULTS: We present a comprehensive evaluation of (1) different methods for inter-residue distance calculations, and (2) different approaches for the scoring of spatial restraints. Our results show that using Solvent Accessible Surface distances (SASDs) instead of Euclidean distances (EUCs) greatly reduces the assignment ambiguity and delivers better modeling precision. Furthermore, ambiguous connections should be considered as inter-subunit only when the intra-subunit alternative exceeds the distance threshold. Modeling performance can also be improved if symmetry, characteristic for most homo-oligomers, is explicitly defined in the scoring function. CONCLUSIONS: Our findings provide guidelines for proper evaluation of chemical cross-linking-based spatial restraints in modeling homo-oligomeric protein complexes, which could facilitate structural characterization of this important group of proteins.
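The assignment rule stated in the abstract above (treat an ambiguous connection as inter-subunit only when the intra-subunit alternative exceeds the distance threshold) can be written as a short decision function. This is a sketch under assumptions: the function name, the 30 Å default threshold, and the extra "violated" outcome are hypothetical and not taken from the paper.

```python
def classify_ambiguous_crosslink(intra_distance, inter_distance, threshold=30.0):
    """Assign an ambiguous homo-oligomer cross-link to a subunit pairing.

    intra_distance: distance (angstroms) between the two residues within one subunit
    inter_distance: distance (angstroms) between the residues on different subunits
    threshold: assumed maximum distance the cross-linker can span
    """
    # Prefer the intra-subunit interpretation whenever it is feasible.
    if intra_distance <= threshold:
        return "intra"
    # Only fall back to inter-subunit when intra-subunit is impossible.
    if inter_distance <= threshold:
        return "inter"
    # Neither interpretation fits the model (assumed extra category).
    return "violated"
```

For example, `classify_ambiguous_crosslink(42.0, 18.5)` returns `"inter"`, while a link with a feasible intra-subunit distance is never counted as evidence for the interface. The distances may be Euclidean or solvent accessible surface distances, depending on which the modeling protocol uses.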
Project description:ClusPro is a heavily used protein-protein docking server based on the fast Fourier transform (FFT) correlation approach. While FFT enables global docking, accounting for pairwise distance restraints using penalty terms in the scoring function is computationally expensive. We use a different approach and directly select low energy solutions that also satisfy the given restraints. As expected, accounting for restraints generally improves the rank of near native predictions, while retaining or even improving the numerical efficiency of FFT based docking. The software is freely available as part of the ClusPro web-based server at http://cluspro.org/nousername.php
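The idea of selecting low energy solutions that also satisfy the restraints, rather than adding a penalty to the score, can be sketched as a post-filter. This is not the ClusPro implementation; the pose dictionary layout, the restraint tuple format, and the function names are assumptions made for illustration.

```python
import math


def count_satisfied(coords, restraints):
    """Count restraints (atom_a, atom_b, d_min, d_max) met by a pose's coordinates."""
    satisfied = 0
    for atom_a, atom_b, d_min, d_max in restraints:
        d = math.dist(coords[atom_a], coords[atom_b])
        if d_min <= d <= d_max:
            satisfied += 1
    return satisfied


def select_poses(poses, restraints, min_satisfied, top_n):
    """Keep poses meeting at least min_satisfied restraints, lowest energy first."""
    kept = [p for p in poses if count_satisfied(p["coords"], restraints) >= min_satisfied]
    return sorted(kept, key=lambda p: p["energy"])[:top_n]


# Toy data: each pose maps atom labels to coordinates (assumed layout).
poses = [
    {"name": "A", "energy": -50.0, "coords": {"r1": (0.0, 0.0, 0.0), "l1": (20.0, 0.0, 0.0)}},
    {"name": "B", "energy": -40.0, "coords": {"r1": (0.0, 0.0, 0.0), "l1": (6.0, 0.0, 0.0)}},
    {"name": "C", "energy": -45.0, "coords": {"r1": (0.0, 0.0, 0.0), "l1": (7.5, 0.0, 0.0)}},
]
restraints = [("r1", "l1", 0.0, 8.0)]
selected = select_poses(poses, restraints, min_satisfied=1, top_n=2)
```

Because the energy function is left untouched, the FFT correlation step does not need to be rerun per restraint; only the list of candidate poses is filtered.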
Project description:The high-resolution three-dimensional solution structure of the plant toxin hordothionin-alpha obtained from Korean barley was determined by using two-dimensional NMR techniques combined with distance geometry and restrained molecular dynamics. Experimentally derived restraints including 292 interproton distances from nuclear Overhauser effect measurements, 16 hydrogen bond restraints together with four disulphide bridge restraints were used as input to calculations of distance geometry and restrained molecular dynamics. Also included in the calculations were 36 phi and 17 chi 1 torsion angles obtained from 3JHN alpha and 3J alpha beta coupling constants in double quantum filtered COSY and primitive exclusive COSY experiments, respectively. The overall protein fold is similar to crambin and purothionin-alpha 1. Two alpha-helices running in opposite directions are found on the basis of 3JHN alpha and 3J alpha beta and deuterium exchange rates for backbone NH protons, and encompass residues 7-18 and 22-28. These two helices are connected by a turn and form a 'helix-turn-helix' motif. A short stretch of an anti-parallel beta-sheet exists between residues 1-4 and 31-34. The two protein termini of hordothionin-alpha are 'well-anchored'; the N-terminus of the protein is immobilized by this short beta-sheet whereas the C-terminus is 'pasted' to the carbonyl group of Cys-4 by a very stable hydrogen bond. The average root-mean-square differences for the backbone and heavy atoms after the restrained molecular dynamics calculations are 0.62 and 1.16 Å, respectively. These numbers represent a significant improvement over the corresponding values for the previous NMR structures of other thionins. The distance violation from the experimental interproton distances for the final structures is 0.14 for all atoms.
Project description:The elastic network model (ENM) is a widely used method to study native protein dynamics by normal mode analysis (NMA). In ENM we need information about all pairwise distances, and the distance between contacting atoms is restrained to the native value. Therefore ENM requires O(N^2) information to realize its dynamics for a protein consisting of N amino acid residues. To see if (or to what extent) such a large amount of specific structural information is required to realize native protein dynamics, here we introduce a novel model based on only O(N) restraints. This model, named the 'contact number diffusion' model (CND), includes specific distance restraints for only local (along the amino acid sequence) atom pairs, and semi-specific non-local restraints imposed on each atom, rather than atom pairs. The semi-specific non-local restraints are defined in terms of the non-local contact numbers of atoms. The CND model exhibits dynamic characteristics comparable to those of ENM and correlates better with explicit-solvent molecular dynamics simulation than ENM does. Moreover, unrealistic surface fluctuations often observed in ENM were suppressed in CND. On the other hand, in some ligand-bound structures CND showed larger fluctuations of buried protein atoms interacting with the ligand compared to ENM. In addition, fluctuations from CND and ENM show comparable correlations with the experimental B-factor. Although there are some indications of the importance of some specific non-local interactions, the semi-specific non-local interactions are mostly sufficient for reproducing the native protein dynamics.
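To make the O(N) bookkeeping in the abstract above concrete, the sketch below computes one non-local contact number per atom from a set of coordinates. The abstract does not give the functional form of the CND restraints, so this only shows the per-atom quantity they are defined on; the cutoff, the sequence-separation window, and the function name are hypothetical.

```python
import math


def nonlocal_contact_numbers(coords, cutoff=8.0, min_separation=3):
    """Return, for each atom, the number of sequence-distant atoms within cutoff.

    coords: list of (x, y, z) positions ordered along the sequence
    cutoff: contact distance in angstroms (assumed value)
    min_separation: pairs closer than this in sequence count as local and are skipped
    """
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


# Toy hairpin: six atoms on two antiparallel strands 4 angstroms apart.
hairpin = [
    (0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0),
    (8.0, 4.0, 0.0), (4.0, 4.0, 0.0), (0.0, 4.0, 0.0),
]
contact_numbers = nonlocal_contact_numbers(hairpin)
```

The result holds N numbers, one per atom, whereas an elastic network stores a native distance for every contacting pair. Computing the contact numbers still visits all pairs, but the information retained by the model scales linearly with N.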
Project description:Molecular dynamics (MD) simulation is a well-established tool for the computational study of protein structure and dynamics, but its application to the important problem of protein structure prediction remains challenging, in part because extremely long timescales can be required to reach the native structure. Here, we examine the extent to which the use of low-resolution information in the form of residue-residue contacts, which can often be inferred from bioinformatics or experimental studies, can accelerate the determination of protein structure in simulation. We incorporated sets of 62, 31, or 15 contact-based restraints in MD simulations of ubiquitin, a benchmark system known to fold to the native state on the millisecond timescale in unrestrained simulations. One-third of the restrained simulations folded to the native state within a few tens of microseconds, a speedup of over an order of magnitude compared with unrestrained simulations and a demonstration of the potential for limited amounts of structural information to accelerate structure determination. Almost all of the remaining ubiquitin simulations reached near-native conformations within a few tens of microseconds, but remained trapped there, apparently due to the restraints. We discuss potential methodological improvements that would facilitate escape from these near-native traps and allow more simulations to quickly reach the native state. Finally, using a target from the Critical Assessment of protein Structure Prediction (CASP) experiment, we show that distance restraints can improve simulation accuracy: In our simulations, restraints stabilized the native state of the protein, enabling a reasonable structural model to be inferred.
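The abstract above does not specify the functional form of its contact-based restraints. One common choice in restrained MD is a flat-bottom harmonic term, sketched here purely as an assumption: the energy is zero while the two residues are within the contact distance and grows quadratically beyond it. The function name and default parameters are hypothetical.

```python
def contact_restraint_energy(distance, contact_distance=8.0, force_constant=1.0):
    """Flat-bottom harmonic penalty for a residue-residue contact restraint.

    distance: current separation of the two residues (angstroms)
    contact_distance: separation below which the contact counts as formed (assumed)
    force_constant: stiffness of the penalty, in energy per square angstrom (assumed)
    """
    # No penalty while the contact is formed; quadratic penalty once it is broken.
    excess = max(0.0, distance - contact_distance)
    return 0.5 * force_constant * excess ** 2
```

With the defaults, `contact_restraint_energy(7.0)` is `0.0` and `contact_restraint_energy(10.0)` is `2.0`. A term of this shape pulls contacting residues together without dictating their exact separation, although strong restraints of any form can hold a simulation in a near-native trap of the kind the abstract reports.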
Project description:Structures of horse liver alcohol dehydrogenase complexed with NAD(+) and unreactive substrate analogues, 2,2,2-trifluoroethanol or 2,3,4,5,6-pentafluorobenzyl alcohol, were determined at 100 K at 1.12 or 1.14 Å resolution, providing estimates of atomic positions with overall errors of ~0.02 Å, the geometry of ligand binding, descriptions of alternative conformations of amino acid residues and waters, and evidence of a strained nicotinamide ring. The four independent subunits from the two homodimeric structures differ only slightly in the peptide backbone conformation. Alternative conformations for amino acid side chains were identified for 50 of the 748 residues in each complex, and Leu-57 and Leu-116 adopt different conformations to accommodate the different alcohols at the active site. Each fluoroalcohol occupies one position, and the fluorines of the alcohols are well-resolved. These structures closely resemble the expected Michaelis complexes with the pro-R hydrogens of the methylene carbons of the alcohols directed toward the re face of C4N of the nicotinamide rings with a C-C distance of 3.40 Å. The oxygens of the alcohols are ligated to the catalytic zinc at a distance expected for a zinc alkoxide (1.96 Å) and participate in a low-barrier hydrogen bond (2.52 Å) with the hydroxyl group of Ser-48 in a proton relay system. As determined by X-ray refinement with no restraints on bond distances and planarity, the nicotinamide rings in the two complexes are slightly puckered (quasi-boat conformation, with torsion angles of 5.9° for C4N and 4.8° for N1N relative to the plane of the other atoms) and have bond distances that are somewhat different compared to those found for NAD(P)(+). It appears that the nicotinamide ring is strained toward the transition state on the path to alcohol oxidation.
Project description:Crosslinking mass spectrometry (XL-MS) is becoming an increasingly popular technique for modeling protein monomers and complexes. The distance restraints garnered from these experiments can be used alone or as part of an integrative modeling approach, incorporating data from many sources. However, modeling practices are varied and the difference in their usefulness is not clear. Here, we develop a new scoring procedure for models based on crosslink data, the Matched and Nonaccessible Crosslink score (MNXL). We compare its performance with that of other commonly-used scoring functions (Number of Violations and Sum of Violation Distances) on a benchmark of 14 protein domains, each with 300 corresponding models (at various levels of quality) and associated, previously published, experimental crosslinks (XLdb). The distances between crosslinked lysines are calculated either as Euclidean distances or Solvent Accessible Surface Distances (SASD) using a newly-developed method (Jwalk). MNXL takes into account whether a crosslink is nonaccessible, i.e. an experimentally observed crosslink has no corresponding SASD in a model due to buried lysines. This metric alone is shown to have a significant impact on modeling performance and is a concept that is not considered at present if only Euclidean distances are used. Additionally, a comparison between modeling with SASD or Euclidean distance shows that SASD is superior, even when factoring out the effect of the nonaccessible crosslinks. Our benchmarking also shows that MNXL outperforms the other tested scoring functions in terms of precision and correlation to Cα-RMSD from the crystal structure. We finally test the MNXL at different levels of crosslink recovery (i.e. the percentage of crosslinks experimentally observed out of all theoretical ones) and set a target recovery of ~20% after which the performance plateaus.
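The two baseline scoring functions named in the abstract above can be sketched as follows. The abstract gives only their names, so the definitions here are the natural reading and should be treated as assumptions: a crosslink is violated when its distance in the model exceeds a cutoff, and the violation distance is the amount by which it exceeds that cutoff. The 30 Å cutoff in the example is hypothetical.

```python
def number_of_violations(distances, cutoff):
    """Count crosslinks whose model distance exceeds the cutoff (angstroms)."""
    return sum(1 for d in distances if d > cutoff)


def sum_of_violation_distances(distances, cutoff):
    """Sum the excess over the cutoff across all violated crosslinks."""
    return sum(d - cutoff for d in distances if d > cutoff)


# Toy model: distances between crosslinked lysines in one candidate structure.
model_distances = [10.0, 25.0, 35.0, 42.5]
violations = number_of_violations(model_distances, cutoff=30.0)
violation_sum = sum_of_violation_distances(model_distances, cutoff=30.0)
```

Lower values indicate a model more consistent with the crosslink data. Neither function has a way to represent a crosslink with no solvent accessible path at all, which is the case the MNXL score is described as handling.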
Project description:Integral membrane proteins pose a major challenge for protein-structure prediction because only approximately 100 high-resolution structures are available currently, thereby impeding the development of rules or empirical potentials to predict the packing of transmembrane alpha-helices. However, when an intermediate-resolution electron microscopy (EM) map is available, it can be used to provide restraints which, in combination with a suitable computational protocol, make structure prediction feasible. In this work we present such a protocol, which proceeds in three stages: (1) generation of an ensemble of alpha-helices by flexible fitting into each of the density rods in the low-resolution EM map, spanning a range of rotational angles around the main helical axes and translational shifts along the density rods; (2) fast optimization of side chains and scoring of the resulting conformations; and (3) refinement of the lowest-scoring conformations with internal coordinate mechanics, by optimizing the van der Waals, electrostatics, hydrogen bonding, torsional, and solvation energy contributions. In addition, our method implements a penalty term through a so-called tethering map, derived from the EM map, which restrains the positions of the alpha-helices. The protocol was validated on three test cases: GpA, KcsA, and MscL.
Project description:Symmetric homo-oligomers represent a majority of proteins, and determining their structures helps elucidate important biological processes, including ion transport, signal transduction, and transcriptional regulation. In order to account for the noise and sparsity in the distance restraints used in Nuclear Magnetic Resonance (NMR) structure determination of cyclic (C(n)) symmetric homo-oligomers, and the resulting uncertainty in the determined structures, we develop a Bayesian structural inference approach. In contrast to traditional NMR structure determination methods, which identify a small set of low-energy conformations, the inferential approach characterizes the entire posterior distribution of conformations. Unfortunately, traditional stochastic techniques for inference may under-sample the rugged landscape of the posterior, missing important contributions from high-quality individual conformations and not accounting for the possible aggregate effects on inferred quantities from numerous unsampled conformations. However, by exploiting the geometry of symmetric homo-oligomers, we develop an algorithm that provides provable guarantees for the posterior distribution and the inferred mean atomic coordinates. Using experimental restraints for three proteins, we demonstrate that our approach is able to objectively characterize the structural diversity supported by the data. By simulating spurious and missing restraints, we further demonstrate that our approach is robust, degrading smoothly with noise and sparsity.