Phylogenomic analysis of the GIY-YIG nuclease superfamily.
ABSTRACT: The GIY-YIG domain was initially identified in homing endonucleases and later in other selfish mobile genetic elements (including restriction enzymes and non-LTR retrotransposons) and in enzymes involved in DNA repair and recombination. However, to date no systematic search for novel members of the GIY-YIG superfamily or comparative analysis of these enzymes has been reported. We carried out database searches to identify all members of known GIY-YIG nuclease families. Multiple sequence alignments together with predicted secondary structures of identified families were represented as Hidden Markov Models (HMM) and compared by the HHsearch method to the uncharacterized protein families gathered in the COG, KOG, and PFAM databases. This analysis allowed for extending the GIY-YIG superfamily to include members of COG3680 and a number of proteins not classified in COGs and to predict that these proteins may function as nucleases, potentially involved in DNA recombination and/or repair. Finally, all old and new members of the GIY-YIG superfamily were compared and analyzed to infer the phylogenetic tree. An evolutionary classification of the GIY-YIG superfamily is presented for the very first time, along with the structural annotation of all (sub)families. It provides a comprehensive picture of sequence-structure-function relationships in this superfamily of nucleases, which will help to design experiments to study the mechanism of action of known members (especially the uncharacterized ones) and will facilitate the prediction of function for the newly discovered ones.
Project description:The GIY-YIG nuclease domain is present in all kingdoms of life and has diverse functions. It is found in the eukaryotic flap endonuclease and Holliday junction resolvase Slx1-Slx4, the prokaryotic nucleotide excision repair proteins UvrC and Cho, and in proteins of 'selfish' genetic elements. Here we present the structures of the ternary pre- and post-cleavage complexes of the type II GIY-YIG restriction endonuclease Hpy188I with DNA and a surrogate or catalytic metal ion, respectively. Our structures suggest that GIY-YIG nucleases catalyze DNA hydrolysis by a single substitution reaction. They are consistent with a previous proposal that a tyrosine residue (which we expect to occur in its phenolate form) acts as a general base for the attacking water molecule. In contrast to the earlier proposal, our data identify the general base with the GIY and not the YIG tyrosine. A conserved glutamate residue (Glu149 provided in trans in Hpy188I) anchors a single metal cation in the active site. This metal ion contacts the phosphate proS oxygen atom and the leaving group 3'-oxygen atom, presumably to facilitate its departure. Taken together, our data reveal striking analogy in the absence of homology between GIY-YIG and ββα-Me nucleases.
Project description:The GIY-YIG endonuclease family comprises hundreds of diverse proteins and a multitude of functions; none have been visualized bound to DNA. The structure of the GIY-YIG restriction endonuclease R.Eco29kI has been solved both alone and bound to its target site. The protein displays a domain-swapped homodimeric structure with several extended surface loops encircling the DNA. Only three side chains from each protein subunit contact DNA bases, two directly and one via a bridging solvent molecule. Both tyrosine residues within the GIY-YIG motif are positioned in the catalytic center near a putative nucleophilic water; the remainder of the active site resembles the HNH endonuclease family. The structure illustrates how the GIY-YIG scaffold has been adapted for the highly specific recognition of a DNA restriction site, in contrast to nonspecific DNA cleavage by GIY-YIG domains in homing endonucleases or structure-specific cleavage by DNA repair enzymes such as UvrC.
Project description:BACKGROUND: Catalytic domains of Type II restriction endonucleases (REases) belong to a few unrelated three-dimensional folds. While the PD-(D/E)XK fold is most common among these enzymes, crystal structures have been also determined for single representatives of two other folds: PLD (R.BfiI) and half-pipe (R.PabI). Bioinformatics analyses supported by mutagenesis experiments suggested that some REases belong to the HNH fold (e.g. R.KpnI), and that a small group represented by R.Eco29kI belongs to the GIY-YIG fold. However, for a large fraction of REases with known sequences, the three-dimensional fold and the architecture of the active site remain unknown, mostly due to extreme sequence divergence that hampers detection of homology to enzymes with known folds. RESULTS: R.Hpy188I is a Type II REase with unknown structure. PSI-BLAST searches of the non-redundant protein sequence database reveal only 1 homolog (R.HpyF17I, with nearly identical amino acid sequence and the same DNA sequence specificity). Standard application of state-of-the-art protein fold-recognition methods failed to predict the relationship of R.Hpy188I to proteins with known structure or to other protein families. In order to increase the amount of evolutionary information in the multiple sequence alignment, we have expanded our sequence database searches to include sequences from metagenomics projects. This search resulted in identification of 23 further members of R.Hpy188I family, both from metagenomics and the non-redundant database. Moreover, fold-recognition analysis of the extended R.Hpy188I family revealed its relationship to the GIY-YIG domain and allowed for computational modeling of the R.Hpy188I structure. Analysis of the R.Hpy188I model in the light of sequence conservation among its homologs revealed an unusual variant of the active site, in which the typical Tyr residue of the YIG half-motif had been substituted by a Lys residue. 
Moreover, some of its homologs have the otherwise invariant Arg residue in a non-homologous position in sequence that nonetheless allows for spatial conservation of the guanidino group potentially involved in phosphate binding. CONCLUSION: The present study eliminates a significant "white spot" on the structural map of REases. It also provides important insight into sequence-structure-function relationships in the GIY-YIG nuclease superfamily. Our results reveal that in the case of proteins with no or few detectable homologs in the standard "non-redundant" database, it is useful to expand this database by adding the metagenomic sequences, which may provide evolutionary linkage to detect more remote homologs.
Project description:The GIY-YIG nuclease domain is found within protein scaffolds that participate in diverse cellular pathways and contains a single active site that hydrolyzes DNA by a one-metal ion mechanism. GIY-YIG homing endonucleases (GIY-HEs) are two-domain proteins with N-terminal GIY-YIG nuclease domains connected to C-terminal DNA-binding domains, and they are thought to function as monomers. Using I-BmoI as a model GIY-HE, we test mechanisms by which the single active site is used to generate a double-strand break. We show that I-BmoI is partially disordered in the absence of substrate, and that the GIY-YIG domain alone has weak affinity for DNA. Significantly, we show that I-BmoI functions as a monomer at all steps of the reaction pathway and does not transiently dimerize or use sequential transesterification reactions to cleave substrate. Our results are consistent with the I-BmoI DNA-binding domain acting as a molecular anchor to tether the GIY-YIG domain to substrate, permitting rotation of the GIY-YIG domain to sequentially nick each DNA strand. These data highlight the mechanistic differences between monomeric GIY-HEs and dimeric or tetrameric GIY-YIG restriction enzymes, and they have implications for the use of the GIY-YIG domain in genome-editing applications.
Project description:The PD-(D/E)XK nuclease superfamily, initially identified in type II restriction endonucleases and later in many enzymes involved in DNA recombination and repair, is one of the most challenging targets for protein sequence analysis and structure prediction. Typically, the sequence similarity between these proteins is so low that most of the relationships between known members of the PD-(D/E)XK superfamily were identified only after the corresponding structures were determined experimentally. Thus, it is tempting to speculate that among the uncharacterized protein families, there are potential nucleases that remain to be discovered, but their identification requires more sensitive tools than traditional PSI-BLAST searches. The low degree of amino acid conservation hampers the possibility of identification of new members of the PD-(D/E)XK superfamily based solely on sequence comparisons to known members. Therefore, we used a recently developed method HHsearch for sensitive detection of remote similarities between protein families represented as profile Hidden Markov Models enhanced by secondary structure. We carried out a comparison of known families of PD-(D/E)XK nucleases to the database comprising the COG and PFAM profiles corresponding to both functionally characterized as well as uncharacterized protein families to detect significant similarities. 
The initial candidates for new nucleases were subsequently verified by sequence-structure threading, comparative modeling, and identification of potential active site residues. In this article, we report identification of the PD-(D/E)XK nuclease domain in numerous proteins implicated in interactions with DNA but with unknown structure and mechanism of action (such as putative recombinase RmuC, DNA competence factor CoiA, a DNA-binding protein SfsA, a large human protein predicted to be a DNA repair enzyme, predicted archaeal transcription regulators, and the head completion protein of phage T4) and in proteins for which no function was assigned to date (such as YhcG, various phage proteins, novel candidates for restriction enzymes). Our results contribute to the reduction of "white spaces" on the sequence-structure-function map of the protein universe and will help to jump-start the experimental characterization of new nucleases, of which many may be of importance for the complete understanding of mechanisms that govern the evolution and stability of the genome.
Project description:Homing endonucleases are site-specific DNA endonucleases that function as mobile genetic elements by introducing double-strand breaks or nicks at defined locations. Of the major families of homing endonucleases, the modular GIY-YIG endonucleases are least understood in terms of mechanism. The GIY-YIG homing endonuclease I-BmoI generates a double-strand break by sequential nicking reactions during which the single active site of the GIY-YIG nuclease domain must undergo a substantial reorganization. Here, we show that divalent metal ion plays a significant role in regulating the two independent nicking reactions by I-BmoI. Rate constant determination for each nicking reaction revealed that limiting divalent metal ion has a greater impact on the second strand than the first strand nicking reaction. We also show that substrate mutations within the I-BmoI cleavage site can modulate the first strand nicking reaction over a 314-fold range. Additionally, in-gel DNA footprinting with mutant substrates and modeling of an I-BmoI-substrate complex suggest that amino acid contacts to a critical GC-2 base pair are required to induce a bottom-strand distortion that likely directs conformational changes for reaction progress. Collectively, our data imply mechanistic roles for divalent metal ion and substrate bases, suggesting that divalent metal ion facilitates the re-positioning of the GIY-YIG nuclease domain between sequential nicking reactions.
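Illustrative note: the sequential nicking described in the preceding abstract can be pictured with a generic two-step, irreversible first-order scheme (intact, then nicked, then linear DNA). The abstract does not state which kinetic model or fitting procedure the authors used, so the sketch below is only an assumption-laden illustration; the function name and the rate constants k1 and k2 are hypothetical and carry no measured values.

```python
import math


def sequential_nicking(t, k1, k2):
    """Fractions of intact, nicked, and linear DNA at time t.

    Assumed scheme (not taken from the abstract):
        intact --k1--> nicked --k2--> linear
    Both steps are treated as irreversible and first order.
    k1 and k2 are hypothetical rate constants in reciprocal time units.
    """
    intact = math.exp(-k1 * t)
    if math.isclose(k1, k2):
        # Limiting form of the intermediate when both rate constants coincide.
        nicked = k1 * t * math.exp(-k1 * t)
    else:
        nicked = k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    linear = 1.0 - intact - nicked
    return intact, nicked, linear


# Arbitrary example values: slowing only the second step leaves more of the
# nicked intermediate at the same time point.
fast_second = sequential_nicking(5.0, k1=0.5, k2=1.0)
slow_second = sequential_nicking(5.0, k1=0.5, k2=0.01)
```

In this toy scheme, lowering k2 while holding k1 fixed makes the nicked intermediate accumulate, which is the qualitative pattern one would expect if a limiting cofactor mainly slowed the second-strand reaction.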
Project description:Homing endonucleases typically contain one of four conserved catalytic motifs, and other elements that confer tight DNA binding. I-CreII, which catalyzes homing of the Cr.psbA4 intron, is unusual in containing two potential catalytic motifs, H-N-H and GIY-YIG. Previously, we showed that cleavage by I-CreII leaves ends (2-nt 3' overhangs) that are characteristic of GIY-YIG endonucleases, yet it has a relaxed metal requirement like H-N-H enzymes. Here we show that I-CreII can bind DNA without an added metal ion, and that it binds as a monomer, akin to GIY-YIG enzymes. Moreover, cleavage of supercoiled DNA, and estimates of strand-specific cleavage rates, suggest that I-CreII uses a sequential cleavage mechanism. Alanine substitution of a number of residues in the GIY-YIG motif, however, did not block cleavage activity, although DNA binding was substantially reduced in several variants. Substitution of conserved histidines in the H-N-H motif resulted in variants that did not promote DNA cleavage but retained high-affinity DNA binding, thus identifying it as the catalytic motif. Unlike the non-specific H-N-H colicins, however, substitution of the conserved asparagine substantially reduced DNA binding (though not the ability to promote cleavage). These results indicate that, in I-CreII, two catalytic motifs have evolved to play important roles in specific DNA binding. The data also indicate that only the H-N-H motif has retained catalytic ability.
Project description:The LEM domain (for lamina-associated polypeptide, emerin, MAN1 domain) defines a group of nuclear proteins that bind chromatin through interaction of the LEM motif with the conserved DNA crosslinking protein, barrier-to-autointegration factor (BAF). Here, we describe a LEM protein annotated in databases as 'Ankyrin repeat and LEM domain-containing protein 1' (Ankle1). We show that Ankle1 is conserved in metazoans and contains a unique C-terminal GIY-YIG motif that confers endonuclease activity in vitro and in vivo. In mammals, Ankle1 is predominantly expressed in hematopoietic tissues. Although most characterized LEM proteins are components of the inner nuclear membrane, ectopic Ankle1 shuttles between cytoplasm and nucleus. Ankle1 enriched in the nucleoplasm induces DNA cleavage and DNA damage response. This activity requires both the catalytic C-terminal GIY-YIG domain and the LEM motif, which binds chromatin via BAF. Hence, Ankle1 is an unusual LEM protein with a GIY-YIG-type endonuclease activity in higher eukaryotes.
Project description:Phage T4 endonuclease II (EndoII), a GIY-YIG endonuclease lacking a carboxy-terminal DNA-binding domain, was subjected to site-directed mutagenesis to investigate roles of individual amino acids in substrate recognition, binding, and catalysis. The structure of EndoII was modeled on that of UvrC. We found catalytic roles for residues in the putative catalytic surface (G49, R57, E118, and N130) similar to those described for I-TevI and UvrC; in addition, these residues were found to be important for substrate recognition and binding. The conserved glycine (G49) and arginine (R57) were essential for normal sequence recognition. Our results are in agreement with a role for these residues in forming the DNA-binding surface and exposing the substrate scissile bond at the active site. The conserved asparagine (N130) and an adjacent proline (P127) likely contribute to positioning the catalytic domain correctly. Enzymes in the EndoII subfamily of GIY-YIG endonucleases share a strongly conserved middle region (MR, residues 72 to 93, likely helical and possibly substituting for heterologous helices in I-TevI and UvrC) and a less strongly conserved N-terminal region (residues 12 to 24). Most of the conserved residues in these two regions appeared to contribute to binding strength without affecting the mode of substrate binding at the catalytic surface. EndoII K76, part of a conserved NUMOD3 DNA-binding motif of homing endonucleases found to overlap the MR, affected both sequence recognition and catalysis, suggesting a more direct involvement in positioning the substrate. Our data thus suggest roles for the MR and residues conserved in GIY-YIG enzymes in recognizing and binding the substrate.
Project description:GIY-YIG homing endonucleases are modular proteins, with conserved N-terminal catalytic domains connected by linkers to C-terminal DNA-binding domains. I-TevI, the T4 phage GIY-YIG intron endonuclease, functions both in promoting td intron homing, and in acting as a transcriptional autorepressor. Repression is achieved by binding to an operator, which is cleaved at 100-fold reduced efficiency relative to the intronless homing site. The linker includes a zinc finger, which functions in distance determination, to constrain the catalytic domain to cleave the homing site at a fixed position. Here we show that I-BmoI, a related GIY-YIG endonuclease lacking a zinc finger, also possesses some cleavage distance discrimination. Furthermore, hybrid endonucleases constructed by swapping the domains of I-BmoI and I-TevI are active, precise and demonstrate that features other than the zinc finger facilitate distance determination. Most importantly, I-TevI zinc finger mutants cleave the operator more efficiently than the homing site, the converse of wild-type protein. These results are consistent with the zinc finger acting as a measuring device, directing efficient cleavage of the homing site to promote intron mobility, while reducing cleavage at the operator to ensure transcriptional autorepression and phage viability.