An automatic method for CASP9 free modeling structure prediction assessment.
ABSTRACT: Manual inspection has long been applied to, and is well accepted for, assessing free modeling (FM) category predictions in the Critical Assessment of protein Structure Prediction (CASP). Such manual assessment requires expertise and a significant time investment, yet it is subjective and cannot differentiate models of similar quality. It is therefore beneficial to incorporate the ideas behind manual inspection into an automatic scoring system, which could provide objective and reproducible assessment of structure models. Inspired by our experience in CASP9 FM category assessment, we developed an automatic, superimposition-independent method named Quality Control Score (QCS) for structure prediction assessment. QCS captures both global and local structural features, with emphasis on global topology. We applied this method to all FM targets from CASP9, and overall the results showed the best agreement with Manual Inspection Scores among automatic prediction assessment methods previously applied in CASPs, such as Global Distance Test Total Score (GDT_TS) and Contact Score (CS). As one of the important components guiding our assessment of CASP9 FM category predictions, this method correlates well with other scoring methods and yet is able to reveal good-quality models that are missed by GDT_TS. The script for QCS calculation is available at http://prodata.swmed.edu/QCS. Contact: email@example.com. Supplementary data are available at Bioinformatics online.
Project description:We present an overview of the ninth round of Critical Assessment of Protein Structure Prediction (CASP9) "Template free modeling" category (FM). Prediction models were evaluated using a combination of established structural and sequence comparison measures and a novel automated method designed to mimic manual inspection by capturing both global and local structural features. These scores were compared to those assigned manually over a diverse subset of target domains. Scores were combined to compare overall performance of participating groups and to estimate rank significance. Moreover, we discuss a few examples of free modeling targets to highlight the progress and bottlenecks of current prediction methods. Notably, a server prediction model for a single target (T0581) improved significantly over the closest structure template (44% GDT increase). This accomplishment represents the "winner" of the CASP9 FM category. A number of human expert groups submitted slight variations of this model, highlighting a trend for human experts to act as "meta predictors" by correctly selecting among models produced by the top-performing automated servers. The details of evaluation are available at http://prodata.swmed.edu/CASP9/ .
Project description:We present the assessment of predictions for Template-Free Modeling in CASP10 and a report on the first ROLL experiment wherein predictions are collected year round for review at the regular CASP season. Models were first clustered so that duplicated or very similar ones were grouped together and represented by one model in the cluster. The representatives were then compared with targets using GDT_TS, QCS, and three additional superposition-independent score functions newly developed for CASP10. For each target, the top 15 representatives by each score were pooled to form the Top15Union set. All models in this set were visually inspected by four of us independently using the new plugin, EvalScore, which we developed with the UCSF Chimera group. The best models were selected for each target after extensive debate among the four examiners. Groups were ranked by the number of targets (hits) for which a group's model was selected as one of the best models. The Keasar group had most hits in both categories, with four of 19 FM and eight of 36 ROLL targets. The most successful prediction servers were QUARK from Zhang's group for FM category with three hits and Zhang-server for the ROLL category with seven hits. As observed in CASP9, many successful groups were not true "template-free" modelers but used remote templates and/or server models to obtain their winning models. The results of the first ROLL experiment were broadly similar to those of the CASP10 FM exercise.
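The pooling step described in the CASP10 assessment above (taking the top 15 representatives under each score and merging them into one set for visual inspection) can be illustrated with a minimal Python sketch. The function name `top_n_union`, the dictionary layout, and the assumption that a higher value is better for every score are illustrative choices, not details taken from the assessors' actual pipeline.

```python
from typing import Dict, Set


def top_n_union(scores: Dict[str, Dict[str, float]], n: int = 15) -> Set[str]:
    """Pool the n best models under each score into one set.

    `scores` maps a score name to {model id: value}. Higher values are
    assumed to be better for every score (an assumption of this sketch).
    """
    pooled: Set[str] = set()
    for per_model in scores.values():
        # Rank model ids by their value under this score, best first.
        ranked = sorted(per_model, key=per_model.get, reverse=True)
        pooled.update(ranked[:n])
    return pooled


# Toy data with hypothetical model ids and values, using n=1 for brevity.
example = {
    "GDT_TS": {"m1": 40.0, "m2": 35.0, "m3": 20.0},
    "QCS": {"m1": 50.0, "m2": 30.0, "m3": 60.0},
}
selected = top_n_union(example, n=1)
```

Taking the union rather than the intersection keeps any model that at least one score favours, which matches the stated aim of not missing good models that a single measure would overlook.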
Project description:BACKGROUND: The accurate prediction of ligand binding residues from amino acid sequences is important for the automated functional annotation of novel proteins. In the previous two CASP experiments, the most successful methods in the function prediction category were those which used structural superpositions of 3D models and related templates with bound ligands in order to identify putative contacting residues. However, whilst most of this prediction process can be automated, visual inspection and manual adjustments of parameters, such as the distance thresholds used for each target, have often been required to prevent overprediction. Here we describe a novel method, FunFOLD, which uses an automatic approach for cluster identification and residue selection. The software provided can easily be integrated into existing fold recognition servers, requiring only a 3D model and a list of templates as inputs. A simple web interface is also provided, allowing access for non-expert users. The method has been benchmarked against the top servers and manual prediction groups tested at both CASP8 and CASP9. RESULTS: The FunFOLD method shows a significant improvement over the best available servers and is shown to be competitive with the top manual prediction groups that were tested at CASP8. The FunFOLD method is also competitive with both the top server and manual methods tested at CASP9. When tested using common subsets of targets, the predictions from FunFOLD are shown to achieve significantly higher mean Matthews Correlation Coefficient (MCC) scores and Binding-site Distance Test (BDT) scores than all server methods that were tested at CASP8. Testing on the CASP9 set showed no statistically significant separation in performance between FunFOLD and the other top server groups tested.
CONCLUSIONS: The FunFOLD software is freely available as both a standalone package and a prediction server, providing competitive ligand binding site residue predictions for expert and non-expert users alike. The software provides a new fully automated approach for structure based function prediction using 3D models of proteins.
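The Matthews Correlation Coefficient used above to score binding site predictions can be computed per target from the sets of predicted and observed binding residues. The sketch below uses the standard MCC formula; the function name `binding_site_mcc`, the use of residue numbers as set members, and the convention of returning 0.0 when the denominator is zero are assumptions of this example rather than details of the CASP evaluation code.

```python
import math
from typing import Set


def binding_site_mcc(predicted: Set[int], observed: Set[int], n_residues: int) -> float:
    """Matthews Correlation Coefficient for one target.

    `predicted` and `observed` hold residue numbers labelled as binding;
    `n_residues` is the chain length. Every residue not in a set is
    treated as non-binding (an assumption of this sketch).
    """
    tp = len(predicted & observed)   # predicted and truly binding
    fp = len(predicted - observed)   # predicted but not binding
    fn = len(observed - predicted)   # binding but missed
    tn = n_residues - tp - fp - fn   # correctly left out
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # MCC is undefined here; 0.0 is a common convention, assumed in this sketch.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical 100-residue target: three of four predicted residues are correct.
mcc = binding_site_mcc({10, 11, 12, 13}, {11, 12, 13, 14}, n_residues=100)
```

Because binding residues are usually a small fraction of the chain, MCC is more informative here than plain accuracy, which would be high even for a method that predicts no binding residues at all.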
Project description:The Critical Assessment of protein Structure Prediction round 9 (CASP9) aimed to evaluate predictions for 129 experimentally determined protein structures. To assess tertiary structure predictions, these target structures were divided into domain-based evaluation units that were then classified into two assessment categories: template based modeling (TBM) and template free modeling (FM). CASP9 targets were split into domains of structurally compact evolutionary modules. For the targets with more than one defined domain, the decision to split structures into domains for evaluation was based on server performance. Target domains were categorized based on their evolutionary relatedness to existing templates as well as their difficulty levels indicated by server performance. Those target domains with sequence-related templates and high server prediction performance were classified as TBM, whereas those targets without identifiable templates and low server performance were classified as FM. However, using these generalizations for classification resulted in a blurred boundary between CASP9 assessment categories. Thus, the FM category included those domains without sequence detectable templates (25 target domains) as well as some domains with difficult to detect templates whose predictions were as poor as those without templates (five target domains). Several interesting examples are discussed, including targets with sequence related templates that exhibit unusual structural differences, targets with homologous or analogous structure templates that are not detectable by sequence, and targets with new folds.
Project description:The estimation of prediction quality is important because without quality measures, it is difficult to determine the usefulness of a prediction. Currently, methods for ligand binding site residue predictions are assessed in the function prediction category of the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, utilizing the Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) metrics. However, the assessment of ligand binding site predictions using such metrics requires the availability of solved structures with bound ligands. Thus, we have developed a ligand binding site quality assessment tool, FunFOLDQA, which utilizes protein feature analysis to predict ligand binding site quality prior to the experimental solution of the protein structures and their ligand interactions. The FunFOLDQA feature scores were combined using simple linear combinations, multiple linear regression and a neural network. The neural network produced significantly better results for correlations to both the MCC and BDT scores, according to Kendall's τ, Spearman's ρ and Pearson's r correlation coefficients, when tested on both the CASP8 and CASP9 datasets. The neural network also produced the largest Area Under the Curve score (AUC) when Receiver Operator Characteristic (ROC) analysis was undertaken for the CASP8 dataset. Furthermore, the FunFOLDQA algorithm incorporating the neural network is shown to add value to FunFOLD when both methods are employed in combination. This results in a statistically significant improvement over all of the best server methods, the FunFOLD method (6.43%), and one of the top manual groups (FN293) tested on the CASP8 dataset. The FunFOLDQA method was also found to be competitive with the top server methods when tested on the CASP9 dataset.
To the best of our knowledge, FunFOLDQA is the first attempt to develop a method that can be used to assess ligand binding site prediction quality, in the absence of experimental data.
Project description:Interactions between proteins and their ligands play central roles in many physiological processes. The structural details for most of these interactions, however, have not yet been characterized experimentally. Therefore, various computational tools have been developed to predict the location of binding sites and the amino acid residues interacting with ligands. In this manuscript, we assess the performance of 33 methods participating in the ligand-binding site prediction category in CASP9. The overall accuracy of ligand-binding site predictions in CASP9 appears rather high (average Matthews correlation coefficient of 0.62 for the 10 top performing groups) and, compared to previous experiments, more groups performed equally well. However, this should be seen in the context of a strong bias in the test data toward easy template-based models. Overall, the top performing methods have converged to a similar approach using ligand-binding site inference from related homologous structures, which limits their applicability for difficult de novo prediction targets. Here, we present the results of the CASP9 assessment of the ligand-binding site category, discuss examples of successful and challenging prediction targets in CASP9, and finally suggest changes in the format of the experiment to overcome the current limitations of the assessment.
Project description:BACKGROUND: We investigated fully automatic coronary artery calcium (CAC) scoring and cardiovascular disease (CVD) risk categorization from CT attenuation correction (CTAC) acquired at rest and stress during cardiac PET/CT and compared it with manual annotations in CTAC and with dedicated calcium scoring CT (CSCT). METHODS AND RESULTS: We included 133 consecutive patients undergoing myocardial perfusion 82Rb PET/CT with the acquisition of low-dose CTAC at rest and stress. Additionally, a dedicated CSCT was performed for all patients. Manual CAC annotations in CTAC and CSCT provided the reference standard. In CTAC, CAC was scored automatically using a previously developed machine learning algorithm. Patients were assigned to a CVD risk category based on their Agatston score (0, 1-10, 11-100, 101-400, >400). Agreement in CVD risk categorization between manual and automatic scoring in CTAC at rest and stress resulted in Cohen's linearly weighted κ of 0.85 and 0.89, respectively. The agreement between CSCT and CTAC at rest resulted in κ of 0.82 and 0.74, using manual and automatic scoring, respectively. For CTAC at stress, these were 0.79 and 0.70, respectively. CONCLUSION: Automatic CAC scoring from CTAC PET/CT may allow routine CVD risk assessment from the CTAC component of PET/CT without any additional radiation dose or scan time.
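The risk categorization and agreement statistic in the abstract above can be illustrated with a short Python sketch. It bins Agatston scores into the five stated categories and computes Cohen's linearly weighted kappa from two lists of category labels. The function names are hypothetical, and the handling of fractional scores at the category boundaries (for example, treating 10.5 as belonging to the 11-100 bin) is an assumption, since the abstract lists only integer ranges.

```python
from typing import Sequence


def risk_category(agatston: float) -> int:
    """Map an Agatston score to category 0..4 (0, 1-10, 11-100, 101-400, >400)."""
    for index, upper in enumerate((0, 10, 100, 400)):
        if agatston <= upper:
            return index
    return 4


def linear_weighted_kappa(a: Sequence[int], b: Sequence[int], k: int = 5) -> float:
    """Cohen's kappa with linear weights for two raters over k ordered categories."""
    n = len(a)
    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for i, j in zip(a, b):
        observed[i][j] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Linear disagreement weight |i - j|; the 1/(k-1) factor cancels in the ratio.
    dis_obs = sum(abs(i - j) * observed[i][j] for i in range(k) for j in range(k))
    dis_exp = sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k))
    if dis_exp == 0:
        # Both raters used a single category; kappa is undefined (NaN in this sketch).
        return float("nan")
    return 1.0 - dis_obs / dis_exp


# Hypothetical manual and automatic Agatston scores for five patients.
manual = [risk_category(s) for s in (0, 5, 50, 250, 900)]
automatic = [risk_category(s) for s in (0, 8, 70, 300, 380)]
kappa = linear_weighted_kappa(manual, automatic)
```

Linear weighting penalizes a disagreement in proportion to how many categories apart the two labels are, so assigning a patient to an adjacent risk category costs less than assigning them to a distant one.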
Project description:Although residue-residue contact maps dictate the topology of proteins, sequence-based ab initio contact predictions have found little use in actual structure prediction due to their low accuracy. We developed a composite set of nine SVM-based contact predictors that are used in I-TASSER simulation in combination with sparse template contact restraints. When testing the strategy on 273 nonhomologous targets, remarkable improvements of I-TASSER models were observed for both easy and hard targets, with p value by Student's t test < 0.00001 and 0.001, respectively. In several cases, the template modeling score increases by >30%, which essentially converts "nonfoldable" targets into "foldable" ones. In CASP9, I-TASSER employed ab initio contact predictions and generated models for 26 FM targets with a GDT-score 16% and 44% higher than the second and third best servers from other groups, respectively. These findings demonstrate a new avenue to improve the accuracy of protein structure prediction, especially for free-modeling targets.