NMRFAM-SDF: a protein structure determination framework.
ABSTRACT: The computationally demanding nature of automated NMR structure determination necessitates a delicate balancing of factors that include the time complexity of data collection, the computational complexity of chemical shift assignments, and selection of proper optimization steps. During the past two decades the computational and algorithmic aspects of several discrete steps of the process have been addressed. Although no single comprehensive solution has emerged, the incorporation of a validation protocol has gained recognition as a necessary step for a robust automated approach. The need for validation becomes even more pronounced in cases of proteins with higher structural complexity, where potentially larger errors generated at each step can propagate and accumulate in the process of structure calculation, thereby significantly degrading the efficacy of any software framework. This paper introduces a complete framework for protein structure determination with NMR, from data acquisition to structure determination. The aim is twofold: to simplify the structure determination process for non-NMR experts whenever feasible, while maintaining flexibility by providing a set of modules that validate each step, and to enable the assessment of error propagation. This framework, called NMRFAM-SDF (NMRFAM-Structure Determination Framework), and its various components are available for download from the NMRFAM website (http://nmrfam.wisc.edu/software.htm).
Project description:SPARKY (Goddard and Kneller, SPARKY 3) remains the most popular software program for NMR data analysis, even though its originators ceased development of the package in 2001. We have taken over the development of this package and describe NMRFAM-SPARKY, which implements new functions reflecting advances in the biomolecular NMR field. NMRFAM-SPARKY has been repackaged with current versions of Python and Tcl/Tk, which support new tools for NMR peak simulation and graphical assignment determination. These tools, along with chemical shift predictions from the PACSY database, greatly accelerate protein side chain assignments. NMRFAM-SPARKY supports automated data format interconversion for interfacing with a variety of web servers, including PECAN, PINE, TALOS-N, CS-Rosetta, SHIFTX2 and PONDEROSA-C/S. The software package, along with binary and source code if desired, can be downloaded freely from http://pine.nmrfam.wisc.edu/download_packages.html. Instruction manuals and video tutorials can be found at http://firstname.lastname@example.org or email@example.com. Supplementary data are available at Bioinformatics online.
Project description:Chemical Shift-Rosetta (CS-Rosetta) is an automated method that employs NMR chemical shifts to model protein structures de novo. In this chapter, we introduce the terminology and central concepts of CS-Rosetta. We describe the architecture and functionality of automatic NOESY assignment (AutoNOE) and structure determination protocols (Abrelax and RASREC) within the CS-Rosetta framework. We further demonstrate how CS-Rosetta can discriminate near-native structures within a large conformational search space using restraints obtained from NMR data, and/or sequence and structure homology. We highlight how CS-Rosetta can be combined with alternative automated approaches to (i) model oligomeric systems and (ii) create an NMR-based structure determination pipeline. To show its practical applicability, we emphasize the computational requirements and performance of CS-Rosetta for protein targets of varying molecular weight and complexity. Finally, we discuss the current Python interface, which enables easy execution of protocols for rapid and accurate high-resolution structure determination.
Project description:Summary: Nuclear magnetic resonance (NMR) spectroscopy, along with X-ray crystallography and cryoelectron microscopy, is one of the three major tools that enable the determination of atomic-level structural models of biological macromolecules. Of these, NMR has the unique ability to follow important processes in solution, including conformational changes, internal dynamics and protein-ligand interactions. As a means for facilitating the handling and analysis of spectra involved in these types of NMR studies, we have developed PINE-SPARKY.2, a software package that integrates and automates discrete tasks that previously required interaction with separate software packages. The graphical user interface of PINE-SPARKY.2 simplifies chemical shift assignment and verification, automated detection of secondary structural elements, predictions of flexibility and hydrophobic cores, and calculation of three-dimensional structural models. Availability and implementation: PINE-SPARKY.2 is available in the latest version of NMRFAM-SPARKY from the National Magnetic Resonance Facility at Madison (http://pine.nmrfam.wisc.edu/download_packages.html), the NMRbox Project (https://nmrbox.org) and to subscribers to the SBGrid (https://sbgrid.org). For a detailed description of the program, see http://www.nmrfam.wisc.edu/pine-sparky2.htm. Contact: firstname.lastname@example.org or email@example.com. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:The J-UNIO (JCSG protocol using the software UNIO) procedure for automated protein structure determination by NMR in solution is introduced. In the present implementation, J-UNIO makes use of APSY-NMR spectroscopy, 3D heteronuclear-resolved [(1)H,(1)H]-NOESY experiments, and the software UNIO. Applications with proteins from the JCSG target list with sizes up to 150 residues showed that the procedure is highly robust and efficient. In all instances the correct polypeptide fold was obtained in the first round of automated data analysis and structure calculation. After interactive validation of the data obtained from the automated routine, the quality of the final structures was comparable to results from interactive structure determination. Special advantages are that the NMR data have been recorded with 6-10 days of instrument time per protein, that there is only a single step of chemical shift adjustments to relate the backbone signals in the APSY-NMR spectra with the corresponding backbone signals in the NOESY spectra, and that the NOE-based amino acid side chain chemical shift assignments are automatically focused on those residues that are heavily weighted in the structure calculation. The individual working steps of J-UNIO are illustrated with the structure determination of the protein YP_926445.1 from Shewanella amazonensis, and the results obtained with 17 JCSG targets are critically evaluated.
Project description:ADAPT-NMR (Assignment-directed Data collection Algorithm utilizing a Probabilistic Toolkit in NMR) represents a groundbreaking prototype for automated protein structure determination by nuclear magnetic resonance (NMR) spectroscopy. With a [(13)C,(15)N]-labeled protein sample loaded into the NMR spectrometer, ADAPT-NMR delivers complete backbone resonance assignments and secondary structure in an optimal fashion without human intervention. ADAPT-NMR achieves this by implementing a strategy in which the goal of optimal assignment in each step determines the subsequent step by analyzing the current sum of available data. ADAPT-NMR is the first iterative and fully automated approach designed specifically for the optimal assignment of proteins, with fast data collection as a byproduct of this goal. ADAPT-NMR evaluates the current spectral information, uses a goal-directed objective function to select the optimal next data collection step(s), and then directs the NMR spectrometer to collect the selected data set. ADAPT-NMR extracts peak positions from the newly collected data and uses this information to update the analysis of resonance assignments and secondary structure. The goal-directed objective function then defines the next data collection step. The procedure continues until the collected data support comprehensive peak identification, resonance assignments at the desired level of completeness, and protein secondary structure. We present test cases in which ADAPT-NMR achieved results in two days or less that would have taken two months or more by manual approaches.
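The iterative strategy described above (score the candidate data collection steps against the current data, collect the best one, update the analysis, repeat until the goal is met) can be illustrated with a minimal greedy loop. This is only a sketch of the general idea and not ADAPT-NMR's actual algorithm or objective function: the function names (`adaptive_collection`, `demo`), the set-of-residues data model, the "number of newly assigned residues" score, and the toy yields in `TOY_YIELD` are all hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List, Set

def adaptive_collection(
    experiments: List[str],
    score: Callable[[str, Set[int]], float],
    collect: Callable[[str], Set[int]],
    is_complete: Callable[[Set[int]], bool],
) -> List[str]:
    """Greedy goal-directed loop: run the best-scoring experiment until done.

    All four arguments are hypothetical stand-ins for spectrometer control
    and assignment analysis; nothing here talks to real hardware.
    """
    data: Set[int] = set()          # residues assigned so far
    remaining = list(experiments)   # experiments not yet collected
    order: List[str] = []           # the collection order actually chosen
    while remaining and not is_complete(data):
        # Objective function: rate every remaining experiment given current data.
        best = max(remaining, key=lambda e: score(e, data))
        remaining.remove(best)
        data |= collect(best)       # "collect" the data and update the analysis
        order.append(best)
    return order

# Invented toy yields: which residues each experiment would let us assign.
TOY_YIELD: Dict[str, Set[int]] = {
    "HNCO": {1, 2, 3},
    "HNCACB": {2, 3, 4, 5, 6},
    "CBCA(CO)NH": {5, 6, 7},
    "HN(CA)CO": {1, 2},
}

def demo() -> List[str]:
    target = set(range(1, 8))  # goal: residues 1..7 all assigned
    return adaptive_collection(
        list(TOY_YIELD),
        score=lambda e, d: len(TOY_YIELD[e] - d),  # count of new residues
        collect=lambda e: TOY_YIELD[e],
        is_complete=lambda d: d >= target,
    )

if __name__ == "__main__":
    print(demo())
```

In this toy run the loop stops after three of the four candidate experiments, because the goal is reached before the least informative one is needed. That mirrors, in a very simplified way, how fast data collection falls out as a byproduct of assignment-directed selection.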
Project description:BACKGROUND: NMR chemical shift prediction plays an important role in various applications in computational biology. Among others, structure determination, structure optimization, and the scoring of docking results can profit from efficient and accurate chemical shift estimation from a three-dimensional model. A variety of NMR chemical shift prediction approaches have been presented in the past, but nearly all of these rely on laborious manual data set preparation, and the training itself is not automated, making retraining the model (e.g., if new data is made available) or testing new models a time-consuming manual chore. RESULTS: In this work, we present the framework NightShift (NMR Shift Inference by General Hybrid Model Training), which enables automated data set generation as well as model training and evaluation of protein NMR chemical shift prediction. In addition to this main result, the NightShift framework itself, we describe the resulting automatically generated data set and, as a proof of concept, a random forest model called Spinster that was built using the pipeline. CONCLUSION: By demonstrating that the performance of the automatically generated predictors is at least on par with the state of the art, we conclude that automated data set and predictor generation is well-suited for the design of NMR chemical shift estimators. The framework can be downloaded from https://bitbucket.org/akdehof/nightshift. It requires the open source Biochemical Algorithms Library (BALL), and is available under the conditions of the GNU Lesser General Public License (LGPL). We additionally offer a browser-based user interface to our NightShift instance employing the Galaxy framework via https://ballaxy.bioinf.uni-sb.de/.
Project description:NMR spectroscopy is a powerful technique for determining structural and functional features of biomolecules in physiological solution as well as for observing their intermolecular interactions in real-time. However, complex steps associated with its practice have made the approach daunting for non-specialists. We introduce an NMR platform that makes biomolecular NMR spectroscopy much more accessible by integrating tools, databases, web services, and video tutorials that can be launched by simple installation of NMRFAM software packages or using a cross-platform virtual machine that can be run on any standard laptop or desktop computer. The software package can be downloaded freely from the NMRFAM software download page (http://pine.nmrfam.wisc.edu/download_packages.html), and detailed instructions are available from the Integrative NMR Video Tutorial page (http://pine.nmrfam.wisc.edu/integrative.html).
Project description:Human rhinovirus strains differ greatly in their virulence, and this has been correlated with the differing substrate specificity of the respective 2A protease (2Apro). Rhinoviruses use their 2Apro to cleave a spectrum of cellular proteins important to virus replication and anti-host activities. These enzymes share a chymotrypsin-like fold stabilized by a tetra-coordinated zinc ion. The catalytic triad consists of conserved Cys (C105), His (H34), and Asp (D18) residues. We used a semi-automated NMR protocol developed at NMRFAM to determine the solution structure of 2Apro (C105A variant) from an isolate of the clinically important rhinovirus C species (RV-C). The backbone of C2 2Apro superimposed closely (1.41-1.81 Å rmsd) with those of orthologs from RV-A2, coxsackie B4 (CB4), and enterovirus 71 (EV71) having sequence identities between 40% and 60%. Comparison of the structures suggests that the differential functional properties of C2 2Apro stem from its unique surface charge, high proportion of surface aromatics, and sequence surrounding the di-tyrosine flap.
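For readers unfamiliar with the rmsd figure quoted above, the following minimal sketch shows how a root-mean-square deviation between two equal-length sets of atomic coordinates is computed. It assumes the two structures have already been superimposed and that atoms are paired one-to-one; it performs no alignment itself, and it is not the procedure or software used in the study. The function name `rmsd` and the `Point` alias are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in angstroms

def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """RMSD between paired coordinates that are ALREADY superimposed."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    # Sum of squared distances between corresponding atoms.
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))

if __name__ == "__main__":
    # One atom displaced by a 3-4-5 right triangle: distance is 5 angstroms.
    print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))
```

In practice the optimal rotation and translation are found first (for example with a least-squares superposition), and the rmsd is then evaluated over the chosen backbone atoms.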
Project description:The quality of protein structures determined by nuclear magnetic resonance (NMR) spectroscopy is contingent on the number and quality of experimentally derived resonance assignments and of distance and angular restraints. Two key features of protein NMR data have posed challenges for the routine and automated structure determination of small to medium sized proteins: (1) spectral resolution, especially of crowded nuclear Overhauser effect spectroscopy (NOESY) spectra, and (2) the reliance on a continuous network of weak scalar couplings as part of most common assignment protocols. In order to facilitate NMR structure determination, we developed a semi-automated strategy that utilizes non-uniform sampling (NUS) and multidimensional decomposition (MDD) for optimal data collection and processing of selected, high resolution multidimensional NMR experiments, combined it with an ABACUS protocol for sequential and side chain resonance assignments, and streamlined this procedure to execute structure and refinement calculations in CYANA and CNS, respectively. Two graphical user interfaces (GUIs) were developed to facilitate efficient analysis and compilation of the data and to guide automated structure determination. This integrated method was implemented and refined on over 30 high quality structures of proteins ranging from 5.5 to 16.5 kDa in size.
Project description:Three-dimensional protein structure determination is a costly process due in part to the low success rate within groups of potential targets. Conventional validation methods eliminate the vast majority of proteins from further consideration through a time-consuming succession of screens for expression, solubility, purification, and folding. False negatives at each stage incur unwarranted reductions in the overall success rate. We developed a semi-automated protocol for isotopically-labeled protein production using the Maxwell-16, a commercially available bench top robot, that allows for single-step target screening by 2D NMR. In the span of a week, one person can express, purify, and screen 48 different (15)N-labeled proteins, accelerating the validation process by more than 10-fold. The yield from a single channel of the Maxwell-16 is sufficient for acquisition of a high-quality 2D (1)H-(15)N-HSQC spectrum using a 3-mm sample cell and 5-mm cryogenic NMR probe. Maxwell-16 screening of a control group of proteins reproduced previous validation results from conventional small-scale expression screening and large-scale production approaches currently employed by our structural genomics pipeline. Analysis of 18 new protein constructs identified two potential structure targets that included the second PDZ domain of human Par-3. To further demonstrate the broad utility of this production strategy, we solved the PDZ2 NMR structure using [U-(15)N,(13)C] protein prepared using the Maxwell-16. This novel semi-automated protein production protocol reduces the time and cost associated with NMR structure determination by eliminating unnecessary screening and scale-up steps.