DOMIRE: a web server for identifying structural domains and their neighbors in proteins.
ABSTRACT: The DOMIRE web server implements a novel, automatic, protein structural domain assignment procedure based on 3D substructures of the query protein which are also found within structures of a non-redundant protein database. These common 3D substructures are transformed into a co-occurrence matrix that offers a global view of the protein domain organization. Three different algorithms are employed to define structural domain boundaries from this co-occurrence matrix. For each query, a list of structural neighbors and their alignments are provided. DOMIRE, by displaying the protein structural domain organization, can be a useful tool for defining protein common cores and for unravelling the evolutionary relationship between different proteins.http://firstname.lastname@example.org.
Project description:R3D-BLAST is a BLAST-like search tool that allows the user to quickly and accurately search against the PDB for RNA structures sharing similar substructures with a specified query RNA structure. The basic idea behind R3D-BLAST is that all the RNA 3D structures deposited in the PDB are first encoded as 1D structural sequences using a structural alphabet of 23 distinct nucleotide conformations, and BLAST is then applied to these 1D structural sequences to search for those RNA substructures whose 1D structural sequences are similar to that of the query RNA substructure. R3D-BLAST takes as input an RNA 3D structure in the PDB format and outputs all substructures of the hits similar to that of the query with a graphical display to show their structural superposition. In addition, each RNA substructure hit found by R3D-BLAST has an associated E-value to measure its statistical significance. R3D-BLAST is now available online at http://genome.cs.nthu.edu.tw/R3D-BLAST/ for public access.
Project description:Fast identification of protein structures that are similar to a specified query structure in the entire Protein Data Bank (PDB) is fundamental in structure and function prediction. We present FragBag: An ultrafast and accurate method for comparing protein structures. We describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments. Then, we succinctly represent the protein as a "bags-of-fragments"-a vector that counts the number of occurrences of each fragment-and measure the similarity between two structures by the similarity between their vectors. Our representation has two additional benefits: (i) it can be used to construct an inverted index, for implementing a fast structural search engine of the entire PDB, and (ii) one can specify a structure as a collection of substructures, without combining them into a single structure; this is valuable for structure prediction, when there are reliable predictions only of parts of the protein. We use receiver operating characteristic curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state of the art structural aligners. Our best FragBag library finds more accurate candidate sets than the three other filter methods: The SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on a par with the computationally expensive, yet highly trusted structural aligners STRUCTAL and CE.
Project description:To address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D-a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool-designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding.
Project description:Comparing, classifying and modelling protein structural interactions can enrich our understanding of many biomolecular processes. This contribution describes Kbdock (http://kbdock.loria.fr/), a database system that combines the Pfam domain classification with coordinate data from the PDB to analyse and model 3D domain-domain interactions (DDIs). Kbdock can be queried using Pfam domain identifiers, protein sequences or 3D protein structures. For a given query domain or pair of domains, Kbdock retrieves and displays a non-redundant list of homologous DDIs or domain-peptide interactions in a common coordinate frame. Kbdock may also be used to search for and visualize interactions involving different, but structurally similar, Pfam families. Thus, structural DDI templates may be proposed even when there is little or no sequence similarity to the query domains.
Project description:Identification of protein structural neighbors to a query is fundamental in structure and function prediction. Here we present BS-align, a systematic method to retrieve backbone string neighbors from primary sequences as templates for protein modeling. The backbone conformation of a protein is represented by the backbone string, as defined in Ramachandran space. The backbone string of a query can be accurately predicted by two innovative technologies: a knowledge-driven sequence alignment and encoding of a backbone string element profile. Then, the predicted backbone string is employed to align against a backbone string database and retrieve a set of backbone string neighbors. The backbone string neighbors were shown to be close to native structures of query proteins. BS-align was successfully employed to predict models of 10 membrane proteins with lengths ranging between 229 and 595 residues, and whose high-resolution structural determinations were difficult to elucidate both by experiment and prediction. The obtained TM-scores and root mean square deviations of the models confirmed that the models based on the backbone string neighbors retrieved by the BS-align were very close to the native membrane structures although the query and the neighbor shared a very low sequence identity. The backbone string system represents a new road for the prediction of protein structure from sequence, and suggests that the similarity of the backbone string would be more informative than describing a protein as belonging to a fold.
Project description:BACKGROUND:RNA molecules have been known to play a variety of significant roles in cells. In principle, the functions of RNAs are largely determined by their three-dimensional (3D) structures. As more and more RNA 3D structures are available in the Protein Data Bank (PDB), a bioinformatics tool, which is able to rapidly and accurately search the PDB database for similar RNA 3D structures or substructures, is helpful to understand the structural and functional relationships of RNAs. RESULTS:Since its first release in 2011, R3D-BLAST has become a useful tool for searching the PDB database for similar RNA 3D structures and substructures. It was implemented by a structural-alphabet (SA)-based method, which utilizes an SA with 23 structural letters to encode RNA 3D structures into one-dimensional (1D) structural sequences and applies BLAST to the resulting structural sequences for searching similar substructures of RNAs. In this study, we have upgraded R3D-BLAST to develop a new web server named R3D-BLAST2 based on a higher quality SA newly constructed from a representative and sufficiently non-redundant list of RNA 3D structures. In addition, we have modified the kernel program in R3D-BLAST2 so that it can accept an RNA structure in the mmCIF format as an input. The results of our experiments on a benchmark dataset have demonstrated that R3D-BLAST2 indeed performs very well in comparison to its earlier version R3D-BLAST and other similar tools RNA FRABASE, FASTR3D and RAG-3D by searching a larger number of RNA 3D substructures resembling those of the input RNA. CONCLUSIONS:R3D-BLAST2 is a valuable BLAST-like search tool that can more accurately scan the PDB database for similar RNA 3D substructures. It is publicly available at http://genome.cs.nthu.edu.tw/R3D-BLAST2/ .
Project description:Similarities in the 3D patterns of RNA base interactions or arrangements can provide insights into their functions and roles in stabilization of the RNA 3D structure. Nucleic Acids Search for Substructures and Motifs (NASSAM) is a graph theoretical program that can search for 3D patterns of base arrangements by representing the bases as pseudo-atoms. The geometric relationship of the pseudo-atoms to each other as a pattern can be represented as a labeled graph where the pseudo-atoms are the graph's nodes while the edges are the inter-pseudo-atomic distances. The input files for NASSAM are PDB formatted 3D coordinates. This web server can be used to identify matches of base arrangement patterns in a query structure to annotated patterns that have been reported in the literature or that have possible functional and structural stabilization implications. The NASSAM program is freely accessible without any login requirement at http://mfrlab.org/grafss/nassam/.
Project description:3dLOGO is a web server for the identification and analysis of conserved protein 3D substructures. Given a set of residues in a PDB (Protein Data Bank) chain, the server detects the matching substructure(s) in a set of user-provided protein structures, generates a multiple structure alignment centered on the input substructures and highlights other residues whose structural conservation becomes evident after the defined superposition. Conserved residues are proposed to the user for highlighting functional areas, deriving refined structural motifs or building sequence patterns. Residue structural conservation can be visualized through an expressly designed Java application, 3dProLogo, which is a 3D implementation of a sequence logo. The 3dLOGO server, with related documentation, is available at http://3dlogo.uniroma2.it/
Project description:MOTIVATION: Exploitation of locally similar 3D patterns of physicochemical properties on the surface of a protein for detection of binding sites that may lack sequence and global structural conservation. RESULTS: An algorithm, ProBiS is described that detects structurally similar sites on protein surfaces by local surface structure alignment. It compares the query protein to members of a database of protein 3D structures and detects with sub-residue precision, structurally similar sites as patterns of physicochemical properties on the protein surface. Using an efficient maximum clique algorithm, the program identifies proteins that share local structural similarities with the query protein and generates structure-based alignments of these proteins with the query. Structural similarity scores are calculated for the query protein's surface residues, and are expressed as different colors on the query protein surface. The algorithm has been used successfully for the detection of protein-protein, protein-small ligand and protein-DNA binding sites. AVAILABILITY: The software is available, as a web tool, free of charge for academic users at http://probis.cmm.ki.si.
Project description:The fastSCOP is a web server that rapidly identifies the structural domains and determines the evolutionary superfamilies of a query protein structure. This server uses 3D-BLAST to scan quickly a large structural classification database (SCOP1.71 with <95% identity with each other) and the top 10 hit domains, which have different superfamily classifications, are obtained from the hit lists. MAMMOTH, a detailed structural alignment tool, is adopted to align these top 10 structures to refine domain boundaries and to identify evolutionary superfamilies. Our previous works demonstrated that 3D-BLAST is as fast as BLAST, and has the characteristics of BLAST (e.g. a robust statistical basis, effective search and reliable database search capabilities) in large structural database searches based on a structural alphabet database and a structural alphabet substitution matrix. The classification accuracy of this server is approximately 98% for 586 query structures and the average execution time is approximately 5. This server was also evaluated on 8700 structures, which have no annotations in the SCOP; the server can automatically assign 7311 (84%) proteins (9420 domains) to the SCOP superfamilies in 9.6 h. These results suggest that the fastSCOP is robust and can be a useful server for recognizing the evolutionary classifications and the protein functions of novel structures. The server is accessible at http://fastSCOP.life.nctu.edu.tw.