A New Secondary Structure Assignment Algorithm Using C? Backbone Fragments.
ABSTRACT: The assignment of secondary structure elements in proteins is a key step in the analysis of their structures and functions. We have developed an algorithm, SACF (secondary structure assignment based on C? fragments), for secondary structure element (SSE) assignment based on the alignment of C? backbone fragments with central poses derived by clustering known SSE fragments. The assignment algorithm consists of three steps: First, the outlier fragments on known SSEs are detected. Next, the remaining fragments are clustered to obtain the central fragments for each cluster. Finally, the central fragments are used as a template to make assignments. Following a large-scale comparison of 11 secondary structure assignment methods, SACF, KAKSI and PROSS are found to have similar agreement with DSSP, while PCASSO agrees with DSSP best. SACF and PCASSO show preference to reducing residues in N and C cap regions, whereas KAKSI, P-SEA and SEGNO tend to add residues to the terminals when DSSP assignment is taken as standard. Moreover, our algorithm is able to assign subtle helices (310-helix, ?-helix and left-handed helix) and make uniform assignments, as well as to detect rare SSEs in ?-sheets or long helices as outlier fragments from other programs. The structural uniformity should be useful for protein structure classification and prediction, while outlier fragments underlie the structure-function relationship.
Project description:The majority of residues in protein structures are involved in the formation of alpha-helices and beta-strands. These distinctive secondary structure patterns can be used to represent a protein for visual inspection and in vector-based protein structure comparison. Success of such structural comparison methods depends crucially on the accurate identification and delineation of secondary structure elements.We have developed a method PALSSE (Predictive Assignment of Linear Secondary Structure Elements) that delineates secondary structure elements (SSEs) from protein Calpha coordinates and specifically addresses the requirements of vector-based protein similarity searches. Our program identifies two types of secondary structures: helix and beta-strand, typically those that can be well approximated by vectors. In contrast to traditional secondary structure algorithms, which identify a secondary structure state for every residue in a protein chain, our program attributes residues to linear SSEs. Consecutive elements may overlap, thus allowing residues located at the overlapping region to have more than one secondary structure type.PALSSE is predictive in nature and can assign about 80% of the protein chain to SSEs as compared to 53% by DSSP and 57% by P-SEA. Such a generous assignment ensures almost every residue is part of an element and is used in structural comparisons. Our results are in agreement with human judgment and DSSP. The method is robust to coordinate errors and can be used to define SSEs even in poorly refined and low-resolution structures. The program and results are available at http://prodata.swmed.edu/palsse/.
Project description:Proteins are often characterized in terms of their primary, secondary, tertiary, and quaternary structure. Algorithms such as define secondary structure of proteins (DSSP) can automatically assign protein secondary structure based on the backbone hydrogen-bonding pattern. However, the assignment of secondary structure elements (SSEs) becomes a challenge when only the C? coordinates are available. In this work, we present protein C-alpha secondary structure output (PCASSO), a fast and accurate program for assigning protein SSEs using only the C? positions. PCASSO achieves ?95% accuracy with respect to DSSP and takes ?0.1 s using a single processor to analyze a 1000 residue system with multiple chains. Our approach was compared with current state-of-the-art C?-based methods and was found to outperform all of them in both speed and accuracy. A practical application is also presented and discussed.
Project description:<h4>Background</h4>The formation of contacts among protein secondary structure elements (SSEs) is an important step in protein folding as it determines topology of protein tertiary structure; hence, inferring inter-SSE contacts is crucial to protein structure prediction. One of the existing strategies infers inter-SSE contacts directly from the predicted possibilities of inter-residue contacts without any preprocessing, and thus suffers from the excessive noises existing in the predicted inter-residue contacts. Another strategy defines SSEs based on protein secondary structure prediction first, and then judges whether each candidate SSE pair could form contact or not. However, it is difficult to accurately determine boundary of SSEs due to the errors in secondary structure prediction. The incorrectly-deduced SSEs definitely hinder subsequent prediction of the contacts among them.<h4>Results</h4>We here report an accurate approach to infer the inter-SSE contacts (thus called as ISSEC) using the deep object detection technique. The design of ISSEC is based on the observation that, in the inter-residue contact map, the contacting SSEs usually form rectangle regions with characteristic patterns. Therefore, ISSEC infers inter-SSE contacts through detecting such rectangle regions. Unlike the existing approach directly using the predicted probabilities of inter-residue contact, ISSEC applies the deep convolution technique to extract high-level features from the inter-residue contacts. More importantly, ISSEC does not rely on the pre-defined SSEs. Instead, ISSEC enumerates multiple candidate rectangle regions in the predicted inter-residue contact map, and for each region, ISSEC calculates a confidence score to measure whether it has characteristic patterns or not. ISSEC employs greedy strategy to select non-overlapping regions with high confidence score, and finally infers inter-SSE contacts according to these regions.<h4>Conclusions</h4>Comprehensive experimental results suggested that ISSEC outperformed the state-of-the-art approaches in predicting inter-SSE contacts. We further demonstrated the successful applications of ISSEC to improve prediction of both inter-residue contacts and tertiary structure as well.
Project description:BACKGROUND: Protein-structure alignment is a fundamental tool to study protein function, evolution and model building. In the last decade several methods for structure alignment were introduced, but most of them ignore that structurally similar proteins can share the same spatial arrangement of secondary structure elements (SSE) but differ in the underlying polypeptide chain connectivity (non-sequential SSE connectivity). RESULTS: We perform protein-structure alignment using a two-level hierarchical approach implemented in the program GANGSTA. On the first level, pair contacts and relative orientations between SSEs (i.e. alpha-helices and beta-strands) are maximized with a genetic algorithm (GA). On the second level residue pair contacts from the best SSE alignments are optimized. We have tested the method on visually optimized structure alignments of protein pairs (pairwise mode) and for database scans. For a given protein structure, our method is able to detect significant structural similarity of functionally important folds with non-sequential SSE connectivity. The performance for structure alignments with strictly sequential SSE connectivity is comparable to that of other structure alignment methods. CONCLUSION: As demonstrated for several applications, GANGSTA finds meaningful protein-structure alignments independent of the SSE connectivity. GANGSTA is able to detect structural similarity of protein folds that are assigned to different superfamilies but nevertheless possess similar structures and perform related functions, even if these proteins differ in SSE connectivity.
Project description:BACKGROUND: Secondary structures are elements of great importance in structural biology, biochemistry and bioinformatics. They are broadly composed of two repetitive structures namely ?-helices and ?-sheets, apart from turns, and the rest is associated to coil. These repetitive secondary structures have specific and conserved biophysical and geometric properties. PolyProline II (PPII) helix is yet another interesting repetitive structure which is less frequent and not usually associated with stabilizing interactions. Recent studies have shown that PPII frequency is higher than expected, and they could have an important role in protein-protein interactions. METHODOLOGY/PRINCIPAL FINDINGS: A major factor that limits the study of PPII is that its assignment cannot be carried out with the most commonly used secondary structure assignment methods (SSAMs). The purpose of this work is to propose a PPII assignment methodology that can be defined in the frame of DSSP secondary structure assignment. Considering the ambiguity in PPII assignments by different methods, a consensus assignment strategy was utilized. To define the most consensual rule of PPII assignment, three SSAMs that can assign PPII, were compared and analyzed. The assignment rule was defined to have a maximum coverage of all assignments made by these SSAMs. Not many constraints were added to the assignment and only PPII helices of at least 2 residues length are defined. CONCLUSIONS/SIGNIFICANCE: The simple rules designed in this study for characterizing PPII conformation, lead to the assignment of 5% of all amino as PPII. Sequence-structure relationships associated with PPII, defined by the different SSAMs, underline few striking differences. A specific study of amino acid preferences in their N and C-cap regions was carried out as their solvent accessibility and contact patterns. Thus the assignment of PPII can be coupled with DSSP and thus opens a simple way for further analysis in this field.
Project description:The DSSP program automatically assigns the secondary structure for each residue from the three-dimensional co-ordinates of a protein structure to one of eight states. However, discrete assignments are incomplete in that they cannot capture the continuum of thermal fluctuations. Therefore, DSSPcont (http://cubic.bioc.columbia.edu/services/DSSPcont) introduces a continuous assignment of secondary structure that replaces 'static' by 'dynamic' states. Technically, the continuum results from calculating weighted averages over 10 discrete DSSP assignments with different hydrogen bond thresholds. A DSSPcont assignment for a particular residue is a percentage likelihood of eight secondary structure states, derived from a weighted average of the ten DSSP assignments. The continuous assignments have two important features: (i) they reflect the structural variations due to thermal fluctuations as detected by NMR spectroscopy; and (ii) they reproduce the structural variation between many NMR models from one single model. Therefore, functionally important variation can be extracted from a single X-ray structure using the continuous assignment procedure.
Project description:Recently recognized slow slip events (SSEs) recurring in the deeper extensions of seismogenic zones along plate boundaries are drawing attention to their potential for triggering megathrust earthquakes that rupture the entire seismogenic zone. We describe how earthquakes simulated in a single-degree-of-freedom model are synchronized to the rhythm of imposed periodic SSEs. The time lag tQ from the one most recent SSE to a seismic event varies with system parameters and may take a broad range of eligible values between 0 and TSSE (SSE recurrence period). Earthquakes were found to synchronize with SSEs in various patterns depending on the proportion of SSE-driven loading within an SSE cycle, the recurrence period of the SSEs, and the duration of the SSEs, although synchronization itself remained a prevalent feature. Asynchrony was found only for long SSE durations.
Project description:Assessing structural similarity and defining common regions through comparison of protein spatial structures is an important task in functional and evolutionary studies of proteins. There are many servers that compare structures and define sub-structures in common between proteins through superposition and closeness of either coordinates or contacts. However, a natural way to analyze a structure for experts working on structure classification is to look for specific three-dimensional (3D) motifs and patterns instead of finding common features in two proteins. Such motifs can be described by the architecture and topology of major secondary structural elements (SSEs) without consideration of subtle differences in 3D coordinates. Despite the importance of motif-based structure searches, currently there is a shortage of servers to perform this task. Widely known TOPS does not fully address this problem, as it finds only topological match but does not take into account other important spatial properties, such as interactions and chirality. Here, we implemented our approach to protein structure pattern search (ProSMoS) as a web-server. ProSMoS converts 3D structure into an interaction matrix representation including the SSE types, handednesses of connections between SSEs, coordinates of SSE starts and ends, types of interactions between SSEs and beta-sheet definitions. For a user-defined structure pattern, ProSMoS lists all structures from a database that contain this pattern. ProSMoS server will be of interest to structural biologists who would like to analyze very general and distant structural similarities. The ProSMoS web server is available at: http://prodata.swmed.edu/ProSMoS/.
Project description:It has been known that topologically different proteins of the same class sometimes share the same spatial arrangement of secondary structure elements (SSEs). However, the frequency by which topologically different structures share the same spatial arrangement of SSEs is unclear. It is important to estimate this frequency because it provides both a deeper understanding of the geometry of protein folds and a valuable suggestion for predicting protein structures with novel folds. Here we clarified the frequency with which protein folds share the same SSE packing arrangement with other folds, the types of spatial arrangement of SSEs that are frequently observed across different folds, and the diversity of protein folds that share the same spatial arrangement of SSEs with a given fold, using a protein structure alignment program MICAN, which we have been developing. By performing comprehensive structural comparison of SCOP fold representatives, we found that approximately 80% of protein folds share the same spatial arrangement of SSEs with other folds. We also observed that many protein pairs that share the same spatial arrangement of SSEs belong to the different classes, often with an opposing N- to C-terminal direction of the polypeptide chain. The most frequently observed spatial arrangement of SSEs was the 2-layer ?/? packing arrangement and it was dispersed among as many as 27% of SCOP fold representatives. These results suggest that the same spatial arrangements of SSEs are adopted by a wide variety of different folds and that the spatial arrangement of SSEs is highly robust against the N- to C-terminal direction of the polypeptide chain.
Project description:Current ceramic solid-state electrolyte (SSE) films have low ionic conductivities (10-8 to 10-5 S/cm ), attributed to the amorphous structure or volatile Li loss. Herein, we report a solution-based printing process followed by rapid (~3 s) high-temperature (~1500°C) reactive sintering for the fabrication of high-performance ceramic SSE films. The SSEs exhibit a dense, uniform structure and a superior ionic conductivity of up to 1 mS/cm. Furthermore, the fabrication time from precursor to final product is typically ~5 min, 10 to 100 times faster than conventional SSE syntheses. This printing and rapid sintering process also allows the layer-by-layer fabrication of multilayer structures without cross-contamination. As a proof of concept, we demonstrate a printed solid-state battery with conformal interfaces and excellent cycling stability. Our technique can be readily extended to other thin-film SSEs, which open previously unexplores opportunities in developing safe, high-performance solid-state batteries and other thin-film devices.