SAFlex: A structural alphabet extension to integrate protein structural flexibility and missing data information.
ABSTRACT: In this paper, we describe SAFlex (Structural Alphabet Flexibility), an extension of an existing structural alphabet (HMM-SA), to better explore increasing protein three dimensional structure information by encoding conformations of proteins in case of missing residues or uncertainties. An SA aims to reduce three dimensional conformations of proteins as well as their analysis and comparison complexity by simplifying any conformation in a series of structural letters. Our methodology presents several novelties. Firstly, it can account for the encoding uncertainty by providing a wide range of encoding options: the maximum a posteriori, the marginal posterior distribution, and the effective number of letters at each given position. Secondly, our new algorithm deals with the missing data in the protein structure files (concerning more than 75% of the proteins from the Protein Data Bank) in a rigorous probabilistic framework. Thirdly, SAFlex is able to encode and to build a consensus encoding from different replicates of a single protein such as several homomer chains. This allows localizing structural differences between different chains and detecting structural variability, which is essential for protein flexibility identification. These improvements are illustrated on different proteins, such as the crystal structure of an eukaryotic small heat shock protein. They are promising to explore increasing protein redundancy data and obtain useful quantification of their flexibility.
Project description:BACKGROUND: Protein deformation has been extensively analysed through global methods based on RMSD, torsion angles and Principal Components Analysis calculations. Here we use a local approach, able to distinguish among the different backbone conformations within loops, ?-helices and ?-strands, to address the question of secondary structures' shape variation within proteins and deformation at interface upon complexation. RESULTS: Using a structural alphabet, we translated the 3 D structures of large sets of protein-protein complexes into sequences of structural letters. The shape of the secondary structures can be assessed by the structural letters that modeled them in the structural sequences. The distribution analysis of the structural letters in the three protein compartments (surface, core and interface) reveals that secondary structures tend to adopt preferential conformations that differ among the compartments. The local description of secondary structures highlights that curved conformations are preferred on the surface while straight ones are preferred in the core. Interfaces display a mixture of local conformations either preferred in core or surface. The analysis of the structural letters transition occurring between protein-bound and unbound conformations shows that the deformation of secondary structure is tightly linked to the compartment preference of the local conformations. CONCLUSION: The conformation of secondary structures can be further analysed and detailed thanks to a structural alphabet which allows a better description of protein surface, core and interface in terms of secondary structures' shape and deformation. Induced-fit modification tendencies described here should be valuable information to identify and characterize regions under strong structural constraints for functional reasons.
Project description:iPARTS is an improved web server for aligning two RNA 3D structures based on a structural alphabet (SA)-based approach. In particular, we first derive a Ramachandran-like diagram of RNAs by plotting nucleotides on a 2D axis using their two pseudo-torsion angles eta and . Next, we apply the affinity propagation clustering algorithm to this eta- plot to obtain an SA of 23-nt conformations. We finally use this SA to transform RNA 3D structures into 1D sequences of SA letters and continue to utilize classical sequence alignment methods to compare these 1D SA-encoded sequences and determine their structural similarities. iPARTS takes as input two RNA 3D structures in the PDB format and outputs their global alignment (for determining overall structural similarity), semiglobal alignments (for detecting structural motifs or substructures), local alignments (for finding locally similar substructures) and normalized local structural alignments (for identifying more similar local substructures without non-similar internal fragments), with graphical display that allows the user to visually view, rotate and enlarge the superposition of aligned RNA 3D structures. iPARTS is now available online at http://bioalgorithm.life.nctu.edu.tw/iPARTS/.
Project description:Rational peptide design and large-scale prediction of peptide structure from sequence remain a challenge for chemical biologists. We present PEP-FOLD, an online service, aimed at de novo modelling of 3D conformations for peptides between 9 and 25 amino acids in aqueous solution. Using a hidden Markov model-derived structural alphabet (SA) of 27 four-residue letters, PEP-FOLD first predicts the SA letter profiles from the amino acid sequence and then assembles the predicted fragments by a greedy procedure driven by a modified version of the OPEP coarse-grained force field. Starting from an amino acid sequence, PEP-FOLD performs series of 50 simulations and returns the most representative conformations identified in terms of energy and population. Using a benchmark of 25 peptides with 9-23 amino acids, and considering the reproducibility of the runs, we find that, on average, PEP-FOLD locates lowest energy conformations differing by 2.6 A Calpha root mean square deviation from the full NMR structures. PEP-FOLD can be accessed at http://bioserv.rpbs.univ-paris-diderot.fr/PEP-FOLD.
Project description:SARSA is a web tool that can be used to align two or more RNA tertiary structures. The basic idea behind SARSA is that we use the vector quantization approach to derive a structural alphabet (SA) of 23 nucleotide conformations, via which we transform RNA 3D structures into 1D sequences of SA letters and then utilize classical sequence alignment methods to compare these 1D SA-encoded sequences and determine their structural similarities. In SARSA, we provide two RNA structural alignment tools, PARTS for pairwise alignment of RNA tertiary structures and MARTS for multiple alignment of RNA tertiary structures. Particularly in PARTS, we have implemented four kinds of pairwise alignments for a variety of practical applications: (i) global alignment for comparing whole structural similarity, (ii) semiglobal alignment for detecting structural motifs, (iii) local alignment for finding locally similar substructures and (iv) normalized local alignment for eliminating the mosaic effect of local alignment. Both tools in SARSA take as input RNA 3D structures in the PDB format and in their outputs provide graphical display that allows the user to visually view, rotate and enlarge the superposition of aligned RNA molecules. SARSA is available online at http://bioalgorithm.life.nctu.edu.tw/SARSA/.
Project description:BACKGROUND: In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. RESULTS: In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described thanks to a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. These information are used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. CONCLUSION: Our study provides qualitative information about induced fit. These results could be of help for flexible docking.
Project description:BACKGROUND: The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis. RESULTS: A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the protein data bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness. CONCLUSIONS: The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics.
Project description:We analyzed the structural behavior of DNA complexed with regulatory proteins and the nucleosome core particle (NCP). The three-dimensional structures of almost 25 thousand dinucleotide steps from more than 500 sequentially non-redundant crystal structures were classified by using DNA structural alphabet CANA (Conformational Alphabet of Nucleic Acids) and associations between ten CANA letters and sixteen dinucleotide sequences were investigated. The associations showed features discriminating between specific and non-specific binding of DNA to proteins. Important is the specific role of two DNA structural forms, A-DNA, and BII-DNA, represented by the CANA letters AAA and BB2: AAA structures are avoided in non-specific NCP complexes, where the wrapping of the DNA duplex is explained by the periodic occurrence of BB2 every 10.3 steps. In both regulatory and NCP complexes, the extent of bending of the DNA local helical axis does not influence proportional representation of the CANA alphabet letters, namely the relative incidences of AAA and BB2 remain constant in bent and straight duplexes.
Project description:Since its first release in 2010, iPARTS has become a valuable tool for globally or locally aligning two RNA 3D structures. It was implemented by a structural alphabet (SA)-based approach, which uses an SA of 23 letters to reduce RNA 3D structures into 1D sequences of SA letters and applies traditional sequence alignment to these SA-encoded sequences for determining their global or local similarity. In this version, we have re-implemented iPARTS into a new web server iPARTS2 by constructing a totally new SA, which consists of 92 elements with each carrying both information of base and backbone geometry for a representative nucleotide. This SA is significantly different from the one used in iPARTS, because the latter consists of only 23 elements with each carrying only the backbone geometry information of a representative nucleotide. Our experimental results have shown that iPARTS2 outperforms its previous version iPARTS and also achieves better accuracy than other popular tools, such as SARA, SETTER and RASS, in RNA alignment quality and function prediction. iPARTS2 takes as input two RNA 3D structures in the PDB format and outputs their global or local alignments with graphical display. iPARTS2 is now available online at http://genome.cs.nthu.edu.tw/iPARTS2/.
Project description:Intrinsic Disorder Proteins (IDPs) have become a hot topic since their characterisation in the 90s. The data presented in this article are related to our research entitled "A structural entropy index to analyse local conformations in Intrinsically Disordered Proteins" published in Journal of Structural Biology . In this study, we quantified, for the first time, continuum from rigidity to flexibility and finally disorder. Non-disordered regions were also highlighted in the ensemble of disordered proteins. This work was done using the Protein Ensemble Database (PED), which is a useful database collecting series of protein structures considered as IDPs. The data set consists of a collection of cleaned protein files in classical pdb format that can be readily used as an input with most automatic analysis software. The accompanying data include the coding of all structural information in terms of a structural alphabet, namely Protein Blocks (PBs). An entropy index derived from PBs that allows apprehending the continuum between protein rigidity to flexibility to disorder is included, with information from secondary structure assignment, protein accessibility and prediction of disorder from the sequences. The data may be used for further structural bioinformatics studies of IDPs. It can also be used as a benchmark for evaluating disorder prediction methods.
Project description:Isolating the properties of proteins that allow them to convert sequence into the structure is a long-lasting biophysical problem. In particular, studies focused extensively on the effect of a reduced alphabet size on the folding properties. However, the natural alphabet is a compromise between versatility and optimisation of the available resources. Here, for the first time, we include the impact of the relative availability of the amino acids to extract from the 20 letters the core necessary for protein stability. We present a computational protein design scheme that involves the competition for resources between a protein and a potential interaction partner that, additionally, gives us the chance to investigate the effect of the reduced alphabet on protein-protein interactions. We devise a scheme that automatically identifies the optimal reduced set of letters for the design of the protein, and we observe that even alphabets reduced down to 4 letters allow for single protein folding. However, it is only with 6 letters that we achieve optimal folding, thus recovering experimental observations. Additionally, we notice that the binding between the protein and a potential interaction partner could not be avoided with the investigated reduced alphabets. Therefore, we suggest that aggregation could have been a driving force in the evolution of the large protein alphabet.