Farseer-NMR: automatic treatment, analysis and plotting of large, multi-variable NMR data.
ABSTRACT: We present Farseer-NMR (https://git.io/vAueU), a software package to treat, evaluate and combine NMR spectroscopic data from sets of protein-derived peaklists covering a range of experimental conditions. The combined advances in NMR and molecular biology enable the study of complex biomolecular systems such as flexible proteins or large multibody complexes, which display a strong and functionally relevant response to their environmental conditions, e.g. the presence of ligands, site-directed mutations, post-translational modifications, molecular crowders or the chemical composition of the solution. These advances have created a growing need to analyse those systems' responses to multiple variables. The combined analysis of NMR peaklists from large and multivariable datasets has become a new bottleneck in the NMR analysis pipeline, because information-rich NMR-derived parameters must be generated manually, a process that can be tedious, repetitive and prone to human error, or even infeasible for very large datasets. There is a persistent gap in the development and distribution of software focused on peaklist treatment, analysis and representation, and specifically able to handle large multivariable datasets, which are becoming more commonplace. In this regard, Farseer-NMR aims to close this longstanding gap in the automated NMR user pipeline and thereby reduce the time burden of analysing large sets of peaklists from days/weeks to seconds/minutes. We have implemented some of the most common, as well as new, routines for calculation of NMR parameters and several publication-quality plotting templates to improve NMR data representation. Farseer-NMR has been written entirely in Python and its modular code base enables facile extension.
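One of the most common parameters derived from a series of peaklists is the combined chemical shift perturbation (CSP). The minimal sketch below is not Farseer-NMR's own code or API; it assumes each peaklist is a plain dictionary mapping a residue label to its (1H, 15N) shifts in ppm, that conditions are keyed by tuples of experimental variables, and it uses the widely used weighted formula CSP = sqrt(0.5 (ΔδH² + (α ΔδN)²)) with α = 0.14. All function names and the example data are hypothetical.

```python
import math
from typing import Dict, Hashable, Tuple

Peaklist = Dict[str, Tuple[float, float]]  # residue label -> (1H ppm, 15N ppm)


def combined_csp(ref: Peaklist, other: Peaklist, alpha: float = 0.14) -> Dict[str, float]:
    """Weighted 1H/15N chemical shift perturbation per residue (ppm).

    Residues missing from `other` (e.g. unassigned or broadened beyond
    detection) are skipped rather than guessed.
    """
    result = {}
    for res, (h_ref, n_ref) in ref.items():
        if res not in other:
            continue
        h, n = other[res]
        result[res] = math.sqrt(0.5 * ((h - h_ref) ** 2 + (alpha * (n - n_ref)) ** 2))
    return result


def csp_series(peaklists: Dict[Hashable, Peaklist], reference: Hashable,
               alpha: float = 0.14) -> Dict[Hashable, Dict[str, float]]:
    """CSPs of every condition in a (possibly multi-variable) series against one reference."""
    ref = peaklists[reference]
    return {cond: combined_csp(ref, pl, alpha) for cond, pl in peaklists.items()}


# Hypothetical two-variable series: keys are (temperature in K, ligand:protein ratio).
series = {
    (298, 0.0): {"A10": (8.00, 120.0), "G11": (8.50, 110.0)},
    (298, 1.0): {"A10": (8.10, 121.0), "G11": (8.50, 110.0)},
    (310, 1.0): {"A10": (8.05, 120.5)},
}
csps = csp_series(series, reference=(298, 0.0))
```

Keying conditions by tuples keeps every experimental variable explicit, so the same result can later be sliced along any one variable for plotting.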
Project description: Here, we show that modern solution nuclear magnetic resonance (NMR) structures of RNA exhibit more steric clashes and conformational ambiguities than their crystallographic X-ray counterparts. To tackle these issues, we developed RNA-ff1, a new force field for structure calculation with Xplor-NIH. Across seven published NMR datasets, RNA-ff1 improves covalent geometry and MolProbity validation criteria for clashes and backbone conformation in most cases, relative to both the previous Xplor-NIH force field and the original structures associated with the experimental data. In addition, with smaller base-pair step rises in helical stems, RNA-ff1 structures enjoy more favorable base stacking. Finally, structural accuracy improves in the majority of cases, as supported by complete residual dipolar coupling cross-validation. Thus, the reported advances show great promise in bridging the quality gap that separates NMR and X-ray structures of RNA.
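Residual dipolar coupling (RDC) cross-validation is commonly scored with the quality factor Q = rms(D_obs - D_calc) / rms(D_obs), evaluated on couplings left out of the fit. The minimal sketch below applies this idea only to the alignment-tensor fit on a fixed structure; complete cross-validation as described above also repeats the structure calculation for each left-out subset, which is beyond a short example. It assumes unit bond vectors, absorbs the dipolar prefactor into the five fitted tensor parameters, and uses hypothetical function names.

```python
import numpy as np


def design_matrix(vectors):
    """Rows of the 5-parameter traceless alignment model for unit bond vectors (N x 3).

    D = Syy*(y^2 - x^2) + Szz*(z^2 - x^2) + 2*Sxy*x*y + 2*Sxz*x*z + 2*Syz*y*z,
    using Sxx = -Syy - Szz and absorbing the dipolar prefactor into S.
    """
    x, y, z = vectors.T
    return np.column_stack([y**2 - x**2, z**2 - x**2, 2 * x * y, 2 * x * z, 2 * y * z])


def q_factor(d_obs, d_calc):
    """Q = rms(D_obs - D_calc) / rms(D_obs)."""
    return float(np.sqrt(np.mean((d_obs - d_calc) ** 2) / np.mean(d_obs ** 2)))


def cross_validated_q(vectors, d_obs, n_folds=5, seed=0):
    """Q-free: each fold is predicted from a tensor fitted only to the other folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(d_obs))
    d_pred = np.empty(len(d_obs))
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)
        params, *_ = np.linalg.lstsq(design_matrix(vectors[train]), d_obs[train], rcond=None)
        d_pred[fold] = design_matrix(vectors[fold]) @ params
    return q_factor(d_obs, d_pred)
```

A structure consistent with the couplings gives a Q-free near zero for noise-free data, whereas unrelated bond orientations give values around one.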
Project description: NMR-I-TASSER, an adaptation of the I-TASSER algorithm that incorporates NMR data for protein structure determination, recently joined the second round of the CASD-NMR experiment. Unlike many molecular dynamics-based methods, NMR-I-TASSER takes a molecular replacement-like approach to the problem: it first threads the target through the PDB to identify structural templates, which are then used for iterative NOE assignment and fragment-assembly structure refinement. The use of multiple templates allows NMR-I-TASSER to sample different topologies without requiring convergence to a single structure. Retroactive and blind tests on the CASD-NMR targets from Rounds 1 and 2 demonstrate that, even without NOE peak lists, I-TASSER can generate correct structure topologies, with 15 of 20 targets having a TM-score above 0.5. With the addition of NOE-based distance restraints, NMR-I-TASSER significantly improved the I-TASSER models, bringing all of them above a TM-score of 0.5. The average RMSD was reduced from 5.29 to 2.14 Å in Round 1 and from 3.18 to 1.71 Å in Round 2. There is no obvious difference between the models obtained from raw and refined peak lists, indicating that the pipeline is robust to NOE assignment errors. Overall, despite its low-resolution modeling, the current NMR-I-TASSER pipeline provides a coarse-grained structure folding approach complementary to traditional molecular dynamics simulations, and can rapidly produce near-native frameworks for atomic-level structural refinement.
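The two accuracy measures quoted above can be illustrated on Cα coordinates. The sketch below superposes a model onto a reference with the Kabsch algorithm, then computes RMSD and the TM-score, TM = (1/L) Σ 1 / (1 + (d_i/d0)²) with d0 = 1.24 (L - 15)^(1/3) - 1.8. It is a simplified illustration under stated assumptions: residues are already paired one-to-one, the TM-score is evaluated for the given superposition only (the real TM-score program searches for the superposition that maximizes it), and d0 is clamped at 0.5 Å here to stay positive for short chains. Function names are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Optimally rotate and translate `mobile` (N x 3) onto `target` (N x 3)."""
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ tgt_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against improper rotations (reflections)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rot.T + target.mean(axis=0)


def rmsd(a, b):
    """Root-mean-square deviation between paired coordinates (same units as input)."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def tm_score(model, native):
    """TM-score for the given, already superposed coordinates (no superposition search)."""
    n = len(native)
    d0 = max(1.24 * np.cbrt(n - 15) - 1.8, 0.5)
    dist = np.linalg.norm(model - native, axis=1)
    return float(np.mean(1.0 / (1.0 + (dist / d0) ** 2)))
```

Unlike RMSD, each residue's contribution to the TM-score is bounded, so a few badly placed loops cannot dominate the score; this is why a TM-score above 0.5 is often used as a threshold for a correct fold.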
Project description: As part of efforts to develop improved methods for NMR protein sample preparation and structure determination, the Northeast Structural Genomics Consortium (NESG) has implemented an NMR screening pipeline for protein target selection, construct optimization, and buffer optimization, incorporating efficient microscale NMR screening of proteins using a micro-cryoprobe. The process is feasible because the newest-generation probe requires only small amounts of protein, typically 30-200 µg in 8-35 µL volumes. Extensive automation has been made possible by the combination of database tools, mechanization of key process steps, and the use of a micro-cryoprobe that gives excellent data while requiring little optimization and manual setup. In this perspective, we describe the overall process used by the NESG for screening NMR samples as part of a sample optimization process, assessing optimal construct design and solution conditions, and for determining protein rotational correlation times to assess protein oligomerization states. Database infrastructure has been developed to allow for flexible implementation of new screening protocols and harvesting of the resulting output. The NESG micro NMR screening pipeline has also been used for detergent screening of membrane proteins. Descriptions of the individual steps in the NESG NMR sample design, production, and screening pipeline are presented in the format of a standard operating procedure.
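Rotational correlation times are often estimated from backbone 15N relaxation using the widely used approximation τc ≈ sqrt(6 T1/T2 - 7) / (4π νN), where νN is the 15N Larmor frequency in Hz. The sketch below assumes a rigid, isotropically tumbling protein and is not necessarily the protocol used by the NESG; the oligomer heuristic simply relies on τc scaling roughly with hydrodynamic volume (Stokes-Einstein-Debye), and therefore roughly with mass. Function names and example values are hypothetical.

```python
import math


def tau_c_from_t1_t2(t1_s: float, t2_s: float, nu_n_hz: float) -> float:
    """Approximate isotropic rotational correlation time (s) from 15N T1 and T2 (s).

    Assumes rigid, isotropic tumbling; flexible termini and chemical
    exchange both bias T2 and therefore this estimate.
    """
    x = 6.0 * t1_s / t2_s - 7.0
    if x <= 0.0:
        raise ValueError("6*T1/T2 must exceed 7 for this approximation")
    return math.sqrt(x) / (4.0 * math.pi * nu_n_hz)


def apparent_oligomer_state(tau_measured: float, tau_monomer: float) -> int:
    """Rough heuristic: tau_c scales with hydrodynamic volume, i.e. roughly with mass."""
    return max(1, round(tau_measured / tau_monomer))


# Hypothetical example: 15N frequency of about 60.8 MHz on a 600 MHz (1H) spectrometer.
tau = tau_c_from_t1_t2(0.8, 0.08, 60.8e6)   # roughly 9.5 ns
```

The expected monomer value passed to `apparent_oligomer_state` has to come from elsewhere, for example from proteins of known size measured under the same conditions.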
Project description: Industry 4.0 is all about interconnectivity, sensor-enhanced process control, and data-driven systems. Process analytical technology (PAT) such as online nuclear magnetic resonance (NMR) spectroscopy is gaining in importance, as it increasingly contributes to automation and digitalization in production. In many cases, however, a classical evaluation of process data and their transformation into knowledge has so far been impossible or uneconomical because the available datasets are too small. When developing an automated method for process control, sometimes only the basic data from a limited number of batch tests in typical product and process development campaigns are available, and these datasets are not large enough for training machine-learning procedures. In this work, a new procedure was developed to overcome this limitation; it allows physically motivated multiplication of the available reference data in order to obtain a sufficiently large dataset for training machine learning algorithms. The underlying example chemical synthesis was measured and analyzed with application-relevant low-field NMR spectroscopy, using high-field NMR spectroscopy as the reference method. Artificial neural networks (ANNs) have the potential to infer valuable process information from relatively limited input data. However, predicting concentrations under complex conditions (many reactants and wide concentration ranges) requires larger ANNs and, therefore, a larger training dataset. We demonstrate that a moderately complex problem with four reactants can be addressed using ANNs in combination with the presented PAT method (low-field NMR) and the proposed approach for generating meaningful training data.
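The abstract does not detail how the reference data are multiplied, so the following is only a hypothetical sketch of one physically motivated scheme. Because NMR signal area is proportional to concentration, synthetic mixture spectra can be built as concentration-weighted sums of pure-component spectra, with random concentrations drawn from the ranges of interest, optional small peak drifts (modeled as integer-point shifts) and Gaussian noise. The resulting (spectrum, concentration) pairs could then serve as inputs and targets for an ANN regressor. All names, parameters and the Lorentzian line model are assumptions.

```python
import numpy as np


def lorentzian(ppm, center, width):
    """Unit-height Lorentzian line; `width` is the half width at half maximum (ppm)."""
    return 1.0 / (1.0 + ((ppm - center) / width) ** 2)


def augment(pure_spectra, conc_ranges, n_samples, max_shift=0, noise_sd=0.0, seed=0):
    """Hypothetical augmentation: synthetic mixtures from pure-component spectra.

    pure_spectra: (n_components, n_points), each at unit concentration.
    conc_ranges: one (low, high) pair per component.
    Returns (spectra, concentrations).
    """
    rng = np.random.default_rng(seed)
    n_comp, n_points = pure_spectra.shape
    lows, highs = np.asarray(conc_ranges, dtype=float).T
    conc = rng.uniform(lows, highs, size=(n_samples, n_comp))
    spectra = np.empty((n_samples, n_points))
    for i in range(n_samples):
        # Small per-component peak drifts in points; np.roll wraps at the edges,
        # which is acceptable only when peaks lie away from the spectral borders.
        shifts = rng.integers(-max_shift, max_shift + 1, size=n_comp)
        drifted = np.array([np.roll(pure_spectra[k], shifts[k]) for k in range(n_comp)])
        spectra[i] = conc[i] @ drifted  # signal is linear in concentration
    spectra += rng.normal(0.0, noise_sd, size=spectra.shape)
    return spectra, conc


ppm = np.linspace(0.0, 10.0, 1000)
# Four hypothetical reactants, each represented by a single broad low-field line.
pure = np.array([lorentzian(ppm, c, 0.1) for c in (1.5, 3.5, 5.0, 7.5)])
X, C = augment(pure, [(0.0, 2.0)] * 4, n_samples=500, max_shift=3, noise_sd=0.01)
```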
Project description: Projection-reconstruction NMR (PR-NMR) has attracted growing attention as a method for collecting multidimensional NMR data rapidly. The PR-NMR procedure involves measuring lower-dimensional projections of a higher-dimensional spectrum, which are then used for the mathematical reconstruction of the full spectrum. We describe here the program PR-CALC, for the reconstruction of NMR spectra from projection data. This program implements a number of reconstruction algorithms, highly optimized to achieve maximal performance, and manages the reconstruction process automatically, producing either full spectra or subsets, such as regions or slices, as requested. The ability to obtain subsets allows large spectra to be analyzed by reconstructing and examining only those subsets containing peaks, offering considerable savings in processing time and storage space. PR-CALC is straightforward to use and integrates directly into the conventional pipeline for data processing and analysis. It was written in standard C++ and should run on any platform. The organization is flexible, and permits easy extension of capabilities, as well as reuse in new software. PR-CALC should facilitate the widespread utilization of PR-NMR in biomedical research.
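One of the simplest PR-NMR reconstruction algorithms is the lowest-value method: each point of the reconstructed spectrum takes the smallest value among all projections passing through it, which suppresses the ghost peaks that a single pair of orthogonal projections would produce. The sketch below is a simplified discrete version for a 2D spectrum and is not PR-CALC's implementation; it assumes unit grid spacing on both axes, positive peaks, and only three projections (along y, along x, and along lines of constant x + y). Function names are hypothetical.

```python
import numpy as np


def project(spectrum):
    """Three projections of a 2D spectrum: along y, along x, and along lines of constant x + y."""
    nx, ny = spectrum.shape
    p_x = spectrum.sum(axis=1)       # one value per x
    p_y = spectrum.sum(axis=0)       # one value per y
    p_diag = np.zeros(nx + ny - 1)   # one value per x + y
    xs, ys = np.indices(spectrum.shape)
    np.add.at(p_diag, xs + ys, spectrum)
    return p_x, p_y, p_diag


def lowest_value(p_x, p_y, p_diag=None):
    """Each point keeps the smallest projection value among the rays through it."""
    recon = np.minimum.outer(p_x, p_y)
    if p_diag is not None:
        xs, ys = np.indices(recon.shape)
        recon = np.minimum(recon, p_diag[xs + ys])
    return recon


# Two positive peaks on a 6 x 6 grid.
spec = np.zeros((6, 6))
spec[1, 1] = spec[3, 3] = 1.0
p_x, p_y, p_diag = project(spec)
ghosts = lowest_value(p_x, p_y)            # orthogonal projections only: ghosts at (1, 3) and (3, 1)
clean = lowest_value(p_x, p_y, p_diag)     # the diagonal projection removes them
```

Because every reconstructed point is computed independently from the projections, a region or slice can be reconstructed on its own, which is the property that makes subset reconstruction cheap.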
Project description: Biomolecular NMR structures are now routinely used in biology, chemistry, and bioinformatics. Methods and metrics for assessing the accuracy and precision of protein NMR structures are beginning to be standardized across the biological NMR community. These include both knowledge-based assessment metrics, parameterized from the database of protein structures, and model-versus-data assessment metrics. Online servers are available that provide comprehensive protein structure quality assessment reports, and efforts are underway at the Worldwide Protein Data Bank (wwPDB) to develop a biomolecular NMR structure quality assessment pipeline as part of the structure deposition process. These quality assessment metrics and standards will help NMR spectroscopists determine more accurate structures, and increase the value and utility of these structures for the broad scientific community.
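As a concrete example of a model-versus-data metric, the sketch below checks a structure's coordinates against upper-bound distance restraints (for instance, NOE-derived ones) and reports violations above a tolerance together with the RMS violation over all restraints. The restraint format, the 0.5 Å default tolerance and the function name are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def restraint_violations(coords: Dict[str, Coord],
                         restraints: Sequence[Tuple[str, str, float]],
                         tolerance: float = 0.5) -> Tuple[List[Tuple[str, str, float]], float]:
    """Return violations larger than `tolerance` (Å) and the RMS violation over all restraints.

    Each restraint is (atom_a, atom_b, upper_bound_in_Å); distances below the
    bound count as zero violation.
    """
    violations = []
    sq_sum = 0.0
    for a, b, upper in restraints:
        excess = max(0.0, math.dist(coords[a], coords[b]) - upper)
        sq_sum += excess ** 2
        if excess > tolerance:
            violations.append((a, b, excess))
    rms = math.sqrt(sq_sum / len(restraints)) if restraints else 0.0
    return violations, rms


coords = {"H1": (0.0, 0.0, 0.0), "H2": (3.0, 0.0, 0.0), "H3": (0.0, 7.0, 0.0)}
restraints = [("H1", "H2", 5.0), ("H1", "H3", 5.0)]
violations, rms = restraint_violations(coords, restraints)
```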
Project description: Nuclear magnetic resonance (NMR) spectroscopy has proven invaluable in the diverse field of chemometrics due to its ability to deliver information-rich spectral datasets of complex mixtures for analysis by techniques such as principal component analysis (PCA). However, NMR datasets present a unique challenge during preprocessing: phase offsets differ between individual spectra, which complicates the correction of any random dilution factors that may also be present. We show that simultaneously correcting phase and dilution errors in NMR datasets representative of metabolomics data yields improved cluster quality in PCA scores space, even with significant initial phase errors in the data.
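The abstract does not describe the correction algorithm itself, so the following swaps in a simple, hypothetical approach to show why phase and dilution are best estimated together. Each complex spectrum is compared with a reference spectrum (for example, the dataset median); for every trial zero-order phase, the best positive scale factor follows in closed form from linear least squares, so only the phase needs a grid search. Only zero-order phase is handled, and the function name and grid size are assumptions.

```python
import numpy as np


def correct_phase_and_dilution(spectrum, reference, n_grid=3601):
    """Jointly estimate a zero-order phase (rad) and a positive scale factor.

    spectrum: complex 1D array; reference: real 1D array of the same length.
    Returns (corrected real spectrum, phase, scale).
    """
    best = (np.inf, 0.0, 1.0)
    for phi in np.linspace(-np.pi, np.pi, n_grid):
        x = np.real(np.exp(1j * phi) * spectrum)
        denom = np.dot(x, x)
        if denom == 0.0:
            continue
        scale = np.dot(x, reference) / denom  # least-squares scale for this phase
        if scale <= 0.0:  # a negative scale would correspond to a 180-degree phase flip
            continue
        resid = np.sum((scale * x - reference) ** 2)
        if resid < best[0]:
            best = (resid, phi, scale)
    _, phi, scale = best
    corrected = scale * np.real(np.exp(1j * phi) * spectrum)
    return corrected, phi, scale


# Hypothetical test signal: one Lorentzian, dephased by 0.3 rad and diluted twofold.
freq = np.linspace(-20.0, 20.0, 2001)
absorption = 1.0 / (1.0 + freq ** 2)
dispersion = freq / (1.0 + freq ** 2)
measured = (absorption + 1j * dispersion) * np.exp(-1j * 0.3) / 2.0
corrected, phi, scale = correct_phase_and_dilution(measured, absorption)
```

Estimating the scale separately after phasing would propagate any residual phase error into the dilution factor; fitting both against the same reference avoids that coupling.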
Project description: In this chapter, we concentrate on the production of high-quality protein samples for nuclear magnetic resonance (NMR) studies. In particular, we provide an in-depth description of recent advances in the production of NMR samples and their synergistic use with recent advancements in NMR hardware. We describe the protein production platform of the Northeast Structural Genomics Consortium and outline our high-throughput strategies for producing high-quality protein samples for NMR studies. Our strategy is based on the cloning, expression, and purification of 6×-His-tagged proteins using T7-based Escherichia coli systems and isotope enrichment in minimal media. We describe 96-well ligation-independent cloning and analytical expression systems, parallel preparative-scale fermentation, and high-throughput purification protocols. The 6×-His affinity tag allows for a similar two-step purification procedure implemented in a parallel high-throughput fashion that routinely results in purity levels sufficient for NMR studies (>97% homogeneity). Using this platform, the protein open reading frames of over 17,500 different targeted proteins (or domains) have been cloned as over 28,000 constructs. Nearly 5000 of these proteins have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html), resulting in more than 950 new protein structures, including more than 400 NMR structures, deposited in the Protein Data Bank. The Northeast Structural Genomics Consortium pipeline has been effective in producing protein samples of both prokaryotic and eukaryotic origin. Although this chapter describes our entire pipeline for producing isotope-enriched protein samples, it focuses on the major updates introduced during the last 5 years (Phase 2 of the National Institute of General Medical Sciences Protein Structure Initiative).
Our advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are suitable for implementation in a large individual laboratory or by a small group of collaborating investigators for structural biology, functional proteomics, ligand screening, and structural genomics research.
Project description: The Focal Adhesion Targeting (FAT) domain of Focal Adhesion Kinase (FAK) is a promising drug target because FAK is overexpressed in many malignancies and promotes cancer cell metastasis. The FAT domain serves as a scaffolding protein, and its interaction with the protein paxillin localizes FAK to focal adhesions. Various studies have highlighted the importance of FAT-paxillin binding in tumor growth, cell invasion, and metastasis. Targeting this interaction through high-throughput screening (HTS) is challenging because of the large and complex binding interface. In this report, we describe a novel approach to targeting FAT through fragment-based drug discovery (FBDD). We developed two fragment-based screening assays: a primary surface plasmon resonance (SPR) assay and a secondary heteronuclear single quantum coherence nuclear magnetic resonance (HSQC-NMR) assay. For SPR, we designed an AviTag construct, optimized SPR buffer conditions, and created mutant controls. For NMR, backbone resonance assignments of the human FAT domain were obtained for the HSQC assay. A 189-compound fragment library from Enamine was screened in our primary SPR assay to demonstrate the feasibility of a FAT-FBDD pipeline, yielding 19 initial hit compounds. A final total of 11 validated hits were identified after secondary NMR screening. This is the first reported FBDD screen of the FAT domain, and it represents a valid approach for further drug discovery efforts on this difficult target.
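Secondary HSQC screens are often scored by chemical shift perturbations (CSPs) of assigned backbone amides. The hypothetical sketch below flags a fragment as a hit when at least a minimum number of residues shift by more than a noise-derived cutoff, taken as the mean plus a multiple of the standard deviation of CSPs between the reference spectrum and a ligand-free control (such as a DMSO-only sample). The cutoff rule, the α = 0.14 15N weighting, the parameters and the function names are assumptions, not the criteria used in this study.

```python
import numpy as np


def csp(delta_h, delta_n, alpha=0.14):
    """Weighted combined 1H/15N chemical shift perturbation (ppm)."""
    return np.sqrt(0.5 * (delta_h ** 2 + (alpha * delta_n) ** 2))


def call_nmr_hits(reference, control, treated, n_sd=3.0, min_residues=2):
    """Flag fragments that perturb at least `min_residues` residues beyond a noise cutoff.

    reference, control: (n_residues, 2) arrays of [1H, 15N] shifts in ppm, same residue order.
    treated: dict mapping a fragment ID to an array of the same shape.
    Returns {fragment ID: indices of perturbed residues}.
    """
    noise = csp(*(control - reference).T)
    cutoff = noise.mean() + n_sd * noise.std()
    hits = {}
    for frag_id, shifts in treated.items():
        perturbed = np.flatnonzero(csp(*(shifts - reference).T) > cutoff)
        if perturbed.size >= min_residues:
            hits[frag_id] = perturbed.tolist()
    return hits


# Hypothetical 10-residue example: frag_A shifts residues 3 and 4, frag_B shifts nothing.
reference = np.column_stack([np.linspace(7.5, 9.0, 10), np.linspace(110.0, 128.0, 10)])
control = reference + np.column_stack([
    [0.001, -0.002, 0.0015, 0.0, -0.001, 0.002, 0.0005, -0.0015, 0.001, 0.0],
    np.zeros(10),
])
frag_a = reference.copy()
frag_a[[3, 4], 0] += 0.1
hits = call_nmr_hits(reference, control, {"frag_A": frag_a, "frag_B": reference.copy()})
```

Returning the perturbed residue indices, not just a yes/no flag, allows a follow-up check of whether a fragment binds near the paxillin-binding surface.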
Project description: As methods for analysis of biomolecular structure and dynamics using nuclear magnetic resonance spectroscopy (NMR) continue to advance, the resulting 3D structures, chemical shifts, and other NMR data are broadly impacting biology, chemistry, and medicine. Structure model assessment is a critical area of NMR methods development, and is an essential component of the process of making these structures accessible and useful to the wider scientific community. For these reasons, the Worldwide Protein Data Bank (wwPDB) has convened an NMR Validation Task Force (NMR-VTF) to work with wwPDB partners in developing metrics and policies for biomolecular NMR data harvesting, structure representation, and structure quality assessment. This paper summarizes the recommendations of the NMR-VTF, and lays the groundwork for future work in developing standards and metrics for biomolecular NMR structure quality assessment.