Project description:A standardized protocol enabling rapid NMR data collection for high-quality protein structure determination is presented that allows one to capitalize on high spectrometer sensitivity: a set of five G-matrix Fourier transform NMR experiments for resonance assignment based on highly resolved 4D and 5D spectral information is acquired in conjunction with a single simultaneous 3D 15N,13C(aliphatic),13C(aromatic)-resolved [1H,1H]-NOESY spectrum providing 1H-1H upper distance limit constraints. The protocol was integrated with methodology for semiautomated data analysis and used to solve eight NMR protein structures of the Northeast Structural Genomics Consortium pipeline. The molecular masses of the hypothetical target proteins ranged from 9 to 20 kDa with an average of approximately 14 kDa. Between 1 and 9 days of instrument time were invested per structure, which is less than approximately 10-25% of the measurement time routinely required to date with conventional approaches. The protocol presented here effectively removes data collection as a bottleneck for high-throughput solution structure determination of proteins up to at least approximately 20 kDa, while concurrently providing spectra that are highly amenable to fast and robust analysis.
Project description:There is a general need to develop more powerful and more robust methods for structural characterization of homodimers, homo-oligomers, and multiprotein complexes using solution-state NMR methods. In recent years, there has been increasing emphasis on integrating distinct and complementary methodologies for structure determination of multiprotein complexes. One approach not yet widely used is to obtain intermediate and long-range distance constraints from paramagnetic relaxation enhancements (PRE) and electron paramagnetic resonance (EPR)-based techniques such as double electron-electron resonance (DEER), which, when used together, can provide supplemental distance constraints spanning 10-70 Å. In this Communication, we describe integration of PRE and DEER data with conventional solution-state nuclear magnetic resonance (NMR) methods for structure determination of Dsy0195, a homodimer (62 amino acids per monomer) from Desulfitobacterium hafniense. Our results indicate that combination of conventional NMR restraints with only one or a few DEER distance constraints and a small number of PRE constraints is sufficient for the automatic NMR-based structure determination program CYANA to build a network of interchain nuclear Overhauser effect constraints that can be used to accurately define both the homodimer interface and the global homodimer structure. The use of DEER distances as a source of supplemental constraints as described here has virtually no upper molecular weight limit, and utilization of the PRE constraints is limited only by the ability to make accurate assignments of the protein amide proton and nitrogen chemical shifts.
Project description:While information from homologous structures plays a central role in X-ray structure determination by molecular replacement, such information is rarely used in NMR structure determination because it can be incorrect, both locally and globally, when evolutionary relationships are inferred incorrectly or there has been considerable evolutionary structural divergence. Here we describe a method that allows robust modeling of protein structures of up to 225 residues by combining (1)H(N), (13)C, and (15)N backbone and (13)Cβ chemical shift data, distance restraints derived from homologous structures, and a physically realistic all-atom energy function. Accurate models are distinguished from inaccurate models generated using incorrect sequence alignments by requiring that (i) the all-atom energies of models generated using the restraints are lower than models generated in unrestrained calculations and (ii) the low-energy structures converge to within 2.0 Å backbone rmsd over 75% of the protein. Benchmark calculations on known structures and blind targets show that the method can accurately model protein structures, even with very remote homology information, to a backbone rmsd of 1.2-1.9 Å relative to the conventionally determined NMR ensembles and of 0.9-1.6 Å relative to X-ray structures for well-defined regions of the protein structures. This approach facilitates the accurate modeling of protein structures using backbone chemical shift data without need for side-chain resonance assignments and extensive analysis of NOESY cross-peak assignments.
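The two acceptance criteria described above can be illustrated with a minimal Python sketch. The function name, the argument layout, and the choice to compare the lowest energy of each set are assumptions made for illustration; the abstract does not say how the energies are compared or how the converged region is identified.

```python
def accept_models(restrained_energies, unrestrained_energies,
                  converged_rmsd, converged_fraction,
                  rmsd_cutoff=2.0, fraction_cutoff=0.75):
    """Return True if both acceptance criteria hold (hypothetical helper).

    (i)  Restrained models reach a lower all-atom energy than unrestrained
         models. Comparing the minimum of each set is an assumption.
    (ii) The low-energy models converge to within rmsd_cutoff (angstroms)
         backbone rmsd over at least fraction_cutoff of the protein.
    """
    lower_energy = min(restrained_energies) < min(unrestrained_energies)
    converged = (converged_rmsd <= rmsd_cutoff
                 and converged_fraction >= fraction_cutoff)
    return lower_energy and converged
```

With made-up numbers, `accept_models([-250.0, -245.0], [-230.0], 1.6, 0.80)` passes both checks, while a restrained set whose best energy is higher than the unrestrained one would be rejected regardless of convergence.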
Project description:X-ray free-electron lasers (XFELs) have inspired the development of serial femtosecond crystallography (SFX) as a method to solve the structure of proteins. SFX datasets are collected from a sequence of protein microcrystals injected across ultrashort X-ray pulses. The idea behind SFX is that diffraction from the intense, ultrashort X-ray pulses leaves the crystal before the crystal is obliterated by the effects of the X-ray pulse. The success of SFX at XFELs has catalyzed interest in analogous experiments at synchrotron-radiation (SR) sources, where data are collected from many small crystals and the ultrashort pulses are replaced by exposure times that are kept short enough to avoid significant crystal damage. The diffraction signal from each short exposure is so 'sparse' in recorded photons that the process of recording the crystal intensity is itself a reconstruction problem. Using the EMC algorithm, a successful reconstruction is demonstrated here in a sparsity regime where there are no Bragg peaks that conventionally would serve to determine the orientation of the crystal in each exposure. In this proof-of-principle experiment, a hen egg-white lysozyme (HEWL) crystal rotating about a single axis was illuminated by an X-ray beam from an X-ray generator to simulate the diffraction patterns of microcrystals from synchrotron radiation. Millions of these sparse frames, typically containing only ∼200 photons per frame, were recorded using a fast-framing detector. It is shown that reconstruction of three-dimensional diffraction intensity is possible using the EMC algorithm, even with these extremely sparse frames and without knowledge of the rotation angle. Further, the reconstructed intensity can be phased and refined to solve the protein structure using traditional crystallographic software. This suggests that synchrotron-based serial crystallography of micrometre-sized crystals can be practical with the aid of the EMC algorithm even in cases where the data are sparse.
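The expand-maximize-compress idea can be sketched on a toy problem in Python. This is not the reconstruction used in the work above: the model here is a one-dimensional ring of pixels, the unknown orientation of each frame is a cyclic shift, and all function names are hypothetical. A real reconstruction operates on three-dimensional intensities with sampled rotations.

```python
import math
import random

def expand(model):
    """Expand: one view of the model per orientation (here, a cyclic shift)."""
    m = len(model)
    return [[model[(i + r) % m] for i in range(m)] for r in range(m)]

def maximize(views, frames):
    """Maximize: each view becomes the probability-weighted mean of frames."""
    m = len(views)
    sums = [[0.0] * m for _ in range(m)]
    weights = [0.0] * m
    for frame in frames:
        # Poisson log-likelihood of the frame under each orientation
        # (the log K! term does not depend on orientation and is dropped).
        logp = [sum(k * math.log(w) - w for k, w in zip(frame, view))
                for view in views]
        top = max(logp)
        p = [math.exp(x - top) for x in logp]
        norm = sum(p)
        for r in range(m):
            pr = p[r] / norm
            weights[r] += pr
            for i, k in enumerate(frame):
                if k:
                    sums[r][i] += pr * k
    new_views = []
    for r in range(m):
        w = weights[r]
        new_views.append([x / w if w > 0 else 0.0 for x in sums[r]])
    return new_views

def compress(views):
    """Compress: undo each shift and average the views into one model."""
    m = len(views)
    return [sum(views[r][(j - r) % m] for r in range(m)) / m
            for j in range(m)]

def emc(frames, n_pixels, n_iter=20, seed=0):
    """Iterate expand, maximize, compress from a random positive start."""
    rng = random.Random(seed)
    model = [rng.uniform(0.5, 1.5) for _ in range(n_pixels)]
    for _ in range(n_iter):
        updated = compress(maximize(expand(model), frames))
        # Keep intensities strictly positive so the logarithm stays defined.
        model = [max(x, 1e-9) for x in updated]
    return model
```

Each frame is a list of integer photon counts, one per pixel, for example `emc([[2, 1, 0, 0, 0, 0], [0, 2, 1, 0, 0, 0]], 6)`. The orientation of a frame is never assigned outright; every frame contributes to every orientation in proportion to its likelihood, which is what lets the iteration proceed when no single frame contains enough photons to be oriented on its own.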
Project description:Accurate protein structure determination by solution-state NMR is challenging for proteins greater than about 20 kDa, for which extensive perdeuteration is generally required, providing experimental data that are incomplete (sparse) and ambiguous. However, the massive increase in evolutionary sequence information coupled with advances in methods for sequence covariance analysis can provide reliable residue-residue contact information for a protein from sequence data alone. These "evolutionary couplings (ECs)" can be combined with sparse NMR data to determine accurate 3D protein structures. This hybrid "EC-NMR" method has been developed using NMR data for several soluble proteins and validated by comparison with corresponding reference structures determined by X-ray crystallography and/or conventional NMR methods. For small proteins, only backbone resonance assignments are utilized, while for larger proteins both backbone and some sidechain methyl resonance assignments are generally required. ECs can be combined with sparse NMR data obtained on deuterated, selectively protonated protein samples to provide structures that are more accurate and complete than those obtained using such sparse NMR data alone. EC-NMR also has significant potential for analysis of protein structures from solid-state NMR data and for studies of integral membrane proteins. The requirement that ECs are consistent with NMR data recorded on a specific member of a protein family, under specific conditions, also allows identification of ECs that reflect alternative allosteric or excited states of the protein structure.
Project description:CASP13 has investigated the impact of sparse NMR data on the accuracy of protein structure prediction. NOESY and 15N-1H residual dipolar coupling data, typical of that obtained for 15N,13C-enriched, perdeuterated proteins up to about 40 kDa, were simulated for 11 CASP13 targets ranging in size from 80 to 326 residues. For several targets, two prediction groups generated models that are more accurate than those produced using baseline methods. Real NMR data collected for a de novo designed protein were also provided to predictors, including one data set in which only backbone resonance assignments were available. Some NMR-assisted prediction groups also did very well with these data. CASP13 also assessed whether incorporation of sparse NMR data improves the accuracy of protein structure prediction relative to nonassisted regular methods. In most cases, incorporation of sparse, noisy NMR data results in models with higher accuracy. The best NMR-assisted models were also compared with the best regular predictions of any CASP13 group for the same target. For six of 13 targets, the most accurate model provided by any NMR-assisted prediction group was more accurate than the most accurate model provided by any regular prediction group; however, for the remaining seven targets, one or more regular prediction methods provided a more accurate model than even the best NMR-assisted model. These results suggest a novel approach for protein structure determination, in which advanced prediction methods are first used to generate structural models, and sparse NMR data are then used to validate and/or refine these models.
Project description:Conventional NMR structure determination requires nearly complete assignment of the cross peaks of a refined NOESY peak list. Depending on the size of the protein and quality of the spectral data, this can be a time-consuming manual process requiring several rounds of peak list refinement and structure determination. Programs such as Aria, CYANA, and AutoStructure can generate models using unassigned NOESY data but are very sensitive to the quality of the input peak lists and can converge to inaccurate structures if the signal-to-noise of the peak lists is low. Here, we show that models with high accuracy and reliability can be produced by combining the strengths of the high-resolution structure prediction program Rosetta with global measures of the agreement between structure models and experimental data. A first round of models generated using CS-Rosetta (Rosetta supplemented with backbone chemical shift information) are filtered on the basis of their goodness-of-fit with unassigned NOESY peak lists using the DP-score, and the best fitting models are subjected to high resolution refinement with the Rosetta rebuild-and-refine protocol. This hybrid approach uses both local backbone chemical shift and the unassigned NOESY data to direct Rosetta trajectories toward the native structure and produces more accurate models than AutoStructure/CYANA or CS-Rosetta alone, particularly when using raw unedited NOESY peak lists. We also show that when accurate manually refined NOESY peak lists are available, Rosetta refinement can consistently increase the accuracy of models generated using CYANA and AutoStructure.
Project description:Protein NMR chemical shifts are highly sensitive to local structure. A robust protocol is described that exploits this relation for de novo protein structure generation, using as input experimental parameters the (13)C(alpha), (13)C(beta), (13)C', (15)N, (1)H(alpha) and (1)H(N) NMR chemical shifts. These shifts are generally available at the early stage of the traditional NMR structure determination process, before the collection and analysis of structural restraints. The chemical shift based structure determination protocol uses an empirically optimized procedure to select protein fragments from the Protein Data Bank, in conjunction with the standard ROSETTA Monte Carlo assembly and relaxation methods. Evaluation of 16 proteins, varying in size from 56 to 129 residues, yielded full-atom models that have 0.7-1.8 Å root mean square deviations for the backbone atoms relative to the experimentally determined X-ray or NMR structures. The strategy also has been successfully applied in a blind manner to nine protein targets with molecular masses up to 15.4 kDa, whose conventional NMR structure determination was conducted in parallel by the Northeast Structural Genomics Consortium. This protocol potentially provides a new direction for high-throughput NMR structure determination.
Project description:Accurate determination of protein structure by NMR spectroscopy is challenging for larger proteins, for which experimental data are often incomplete and ambiguous. Evolutionary sequence information together with advances in maximum entropy statistical methods provide a rich complementary source of structural constraints. We have developed a hybrid approach (evolutionary coupling-NMR spectroscopy; EC-NMR) combining sparse NMR data with evolutionary residue-residue couplings and demonstrate accurate structure determination for several proteins 6-41 kDa in size.
Project description:X-ray diffraction patterns may be obtained from individual submicron protein nanocrystals using a femtosecond pulse from a free-electron X-ray laser. Many "single-shot" patterns are read out every second from a stream of nanocrystals lying in random orientations. The short pulse terminates before significant atomic (or electronic) motion commences, minimizing radiation damage. Simulated patterns for Photosystem I nanocrystals are used to develop a method for recovering structure factors from tens of thousands of snapshot patterns from nanocrystals varying in size, shape and orientation. We determine the number of shots needed for a required accuracy in structure factor measurement and resolution, and investigate the convergence of our Monte-Carlo integration method.
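The Monte Carlo integration described above, in which many partial measurements of each reflection are averaged so that variations in crystal size, shape and orientation average out, can be sketched in Python. The data layout (one dictionary per snapshot, keyed by Miller index) and the function name are hypothetical and chosen only for illustration; indexing of the patterns is assumed to have been done already.

```python
from collections import defaultdict

def merge_snapshots(snapshots):
    """Average partial intensities per Miller index over all snapshots.

    snapshots: iterable of dicts {(h, k, l): intensity}. This layout is
    an assumption made for the sketch.
    Returns {(h, k, l): (mean_intensity, n_observations)}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for shot in snapshots:
        for hkl, intensity in shot.items():
            totals[hkl] += intensity
            counts[hkl] += 1
    return {hkl: (totals[hkl] / counts[hkl], counts[hkl]) for hkl in totals}
```

For three made-up snapshots `[{(1, 0, 0): 10.0, (0, 1, 0): 4.0}, {(1, 0, 0): 14.0}, {(0, 1, 0): 6.0, (1, 0, 0): 12.0}]`, the merged result is `(12.0, 3)` for reflection (1, 0, 0) and `(5.0, 2)` for (0, 1, 0). For independent observations the standard error of such a mean falls as the inverse square root of the number of observations, which is why the number of shots needed depends on the accuracy required.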