Structural and evolutionary classification of Type II restriction enzymes based on theoretical and experimental analyses.
ABSTRACT: For a very long time, Type II restriction enzymes (REases) have been a paradigm of ORFans: proteins with no detectable similarity to each other and to any other protein in the database, despite common cellular and biochemical function. Crystallographic analyses published until January 2008 provided high-resolution structures for only 28 of 1637 Type II REase sequences available in the Restriction Enzyme database (REBASE). Among these structures, all but two possess catalytic domains with the common PD-(D/E)XK nuclease fold. Two structures are unrelated to the others: R.BfiI exhibits the phospholipase D (PLD) fold, while R.PabI has a new fold termed 'half-pipe'. Thus far, bioinformatic studies supported by site-directed mutagenesis have extended the number of tentatively assigned REase folds to five (now including also GIY-YIG and HNH folds identified earlier in homing endonucleases) and provided structural predictions for dozens of REase sequences without experimentally solved structures. Here, we present a comprehensive study of all Type II REase sequences available in REBASE together with their homologs detectable in the nonredundant and environmental samples databases at the NCBI. We present the summary and critical evaluation of structural assignments and predictions reported earlier, new classification of all REase sequences into families, domain architecture analysis and new predictions of three-dimensional folds. Among 289 experimentally characterized (not putative) Type II REases, whose apparently full-length sequences are available in REBASE, we assign 199 (69%) to contain the PD-(D/E)XK domain. The HNH domain is the second most common, with 24 (8%) members. When putative REases are taken into account, the fraction of PD-(D/E)XK and HNH folds changes to 48% and 30%, respectively. Fifty-six characterized (and 521 predicted) REases remain unassigned to any of the five REase folds identified so far, and may exhibit new architectures. 
These enzymes are proposed as the most interesting targets for structure determination by high-resolution experimental methods. Our analysis provides the first comprehensive map of sequence-structure relationships among Type II REases and will help to focus the efforts of structural and functional genomics of this large and biotechnologically important class of enzymes.
Project description:BACKGROUND: Catalytic domains of Type II restriction endonucleases (REases) belong to a few unrelated three-dimensional folds. While the PD-(D/E)XK fold is most common among these enzymes, crystal structures have been also determined for single representatives of two other folds: PLD (R.BfiI) and half-pipe (R.PabI). Bioinformatics analyses supported by mutagenesis experiments suggested that some REases belong to the HNH fold (e.g. R.KpnI), and that a small group represented by R.Eco29kI belongs to the GIY-YIG fold. However, for a large fraction of REases with known sequences, the three-dimensional fold and the architecture of the active site remain unknown, mostly due to extreme sequence divergence that hampers detection of homology to enzymes with known folds. RESULTS: R.Hpy188I is a Type II REase with unknown structure. PSI-BLAST searches of the non-redundant protein sequence database reveal only 1 homolog (R.HpyF17I, with nearly identical amino acid sequence and the same DNA sequence specificity). Standard application of state-of-the-art protein fold-recognition methods failed to predict the relationship of R.Hpy188I to proteins with known structure or to other protein families. In order to increase the amount of evolutionary information in the multiple sequence alignment, we have expanded our sequence database searches to include sequences from metagenomics projects. This search resulted in identification of 23 further members of R.Hpy188I family, both from metagenomics and the non-redundant database. Moreover, fold-recognition analysis of the extended R.Hpy188I family revealed its relationship to the GIY-YIG domain and allowed for computational modeling of the R.Hpy188I structure. Analysis of the R.Hpy188I model in the light of sequence conservation among its homologs revealed an unusual variant of the active site, in which the typical Tyr residue of the YIG half-motif had been substituted by a Lys residue. 
Moreover, some of its homologs have the otherwise invariant Arg residue in a non-homologous position in sequence that nonetheless allows for spatial conservation of the guanidino group potentially involved in phosphate binding. CONCLUSION: The present study eliminates a significant "white spot" on the structural map of REases. It also provides important insight into sequence-structure-function relationships in the GIY-YIG nuclease superfamily. Our results reveal that in the case of proteins with no or few detectable homologs in the standard "non-redundant" database, it is useful to expand this database by adding the metagenomic sequences, which may provide evolutionary linkage to detect more remote homologs.
Project description:Intensive horizontal gene transfer may generate diversity and heterogeneity within the genus Gardnerella. Restriction-modification (R-M) systems and CRISPR-Cas are the principal defense tools against foreign DNA in bacteria. Nearly half of the tested Gardnerella spp. isolates harbored the CRISPR-Cas system. Several putative R-M systems of Gardnerella spp. strains were identified in the REBASE database. However, there was no experimental evidence for restriction endonuclease (REase) activity in the isolates. We showed that G. vaginalis strain ATCC 14018 contains the REase R.Gva14018I, which recognizes GGCC and most probably generates blunt ends on cleavage. Bioinformatics evidence and the activity of recombinant methyltransferase M.Gva14018I in vivo indicate that ATCC 14018 possesses a HaeIII-like R-M system. The truncated R.Gva14018I-4 lacking the C-terminal region was expressed in Escherichia coli and displayed wild-type REase specificity. Polyclonal antibodies against R.Gva14018I-4 detected the wild-type REase in the cell lysate of ATCC 14018. The cofactor requirements for activity and bioinformatics analysis indicated that R.Gva14018I belongs to the PD-(D/E)XK family of REases. The REase-like activity was observed in 5 of 31 tested Gardnerella spp. strains, although none of these matched the DNA digestion pattern of R.Gva14018I.
Project description:Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered specificity-engineering efforts. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.
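The idea of a sequence motif with an optional secondary-structure constraint can be illustrated with a minimal sketch. This is not the Scan2S algorithm, whose internals are not described here; the function name `scan_with_ss`, the toy sequence, the toy secondary-structure string (C = coil, E = strand, H = helix) and both patterns are hypothetical and chosen only for illustration.

```python
import re

def scan_with_ss(seq, ss, seq_pattern, ss_pattern=None):
    """Return (start, end) spans where seq_pattern matches seq and, if an
    ss_pattern is given, it fully matches the secondary-structure string
    over the same span. Hypothetical helper, not the Scan2S implementation."""
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary-structure strings must align")
    hits = []
    # A lookahead wrapper reports at most one match per start position,
    # so overlapping matches are not lost.
    for m in re.finditer(f"(?=({seq_pattern}))", seq):
        start, end = m.start(1), m.end(1)
        if ss_pattern is None or re.fullmatch(ss_pattern, ss[start:end]):
            hits.append((start, end))
    return hits

# Toy data (assumed, not taken from any real REase).
toy_seq = "MKTAPDGSLAVIEVKSGN"
toy_ss = "CCCCEEEEEECCHHHHCC"
toy_motif = "PD.{3,10}[DE].K"  # loose PD-(D/E)XK-like pattern

print(scan_with_ss(toy_seq, toy_ss, toy_motif))            # sequence only
print(scan_with_ss(toy_seq, toy_ss, toy_motif, "E+C*H+"))  # strand, then helix
print(scan_with_ss(toy_seq, toy_ss, toy_motif, "H+"))      # helix only: rejected
```

Checking the structure string after the sequence match keeps the two constraints independent, so the same sequence motif can be reused with or without structural information.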
Project description:BACKGROUND: Restriction-modification systems are a diverse class of enzymes. They are classified into four major types: I, II, III and IV. We have previously proposed the existence of a Thermus sp. enzyme family, which belongs to Type II restriction endonucleases (REases) but also features some characteristics of Types I and III. Members include related thermophilic endonucleases: TspGWI, TaqII, TspDTI, and Tth111II. RESULTS: Here we describe cloning, mutagenesis and analysis of the prototype TspGWI enzyme that recognises the 5'-ACGGA-3' site and cleaves 11/9 nt downstream. We cloned, expressed, and mutagenised the tspgwi gene and investigated the properties of its product, the bifunctional TspGWI restriction/modification enzyme. Since TspGWI does not cleave DNA completely, a cloning method was devised, based on amino acid sequencing of internal proteolytic fragments. The deduced amino acid sequence of the enzyme shares significant sequence similarity with another representative of the Thermus sp. family, TaqII. Interestingly, these enzymes recognise similar, yet different sequences in the DNA. Both enzymes cleave DNA at the same distance, but differ in their ability to cleave single sites and in the requirement of S-adenosylmethionine as an allosteric activator for cleavage. Both the restriction endonuclease (REase) and methyltransferase (MTase) activities of wild type (wt) TspGWI (either recombinant or isolated from Thermus sp.) are dependent on the presence of divalent cations. CONCLUSION: TspGWI is a bifunctional protein comprising a tandem arrangement of Type I-like domains; particularly noticeable is the central HsdM-like module comprising a helical domain and a highly conserved S-adenosylmethionine-binding/catalytic MTase domain, containing DPAVGTG and NPPY motifs. 
TspGWI also possesses an N-terminal PD-(D/E)XK nuclease domain related to the corresponding domains in HsdR subunits, but lacks the ATP-dependent translocase module of the HsdR subunit and the additional domains that are involved in subunit-subunit interactions in Type I systems. The MTase and REase activities of TspGWI are autonomous and can be uncoupled. Structurally and functionally, the TspGWI protomer appears to be a streamlined 'half' of a Type I enzyme.
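The "11/9 nt downstream" cleavage geometry can be made concrete with a small sketch. It assumes the usual reading of this notation, namely that the top strand is cut 11 nt and the bottom strand 9 nt past the 3' end of the recognition site, both counted along the top strand. The function name `downstream_cuts` is hypothetical, and for brevity only top-strand occurrences of the site are searched.

```python
def downstream_cuts(dna, site="ACGGA", top_offset=11, bottom_offset=9):
    """List (site_start, top_cut, bottom_cut) for each top-strand site.
    Cut positions are 0-based boundaries in top-strand coordinates.
    Assumption: offsets are counted from the 3' end of the site."""
    dna = dna.upper()
    cuts = []
    start = dna.find(site)
    while start != -1:
        site_end = start + len(site)
        top_cut = site_end + top_offset
        bottom_cut = site_end + bottom_offset
        # Skip sites too close to the end for the enzyme to reach its cut position.
        if top_cut <= len(dna):
            cuts.append((start, top_cut, bottom_cut))
        start = dna.find(site, start + 1)
    return cuts

# Toy substrate (assumed): one site followed by 15 nt of filler.
print(downstream_cuts("TTACGGA" + "T" * 15))
```

Under this reading the two cuts are staggered by 2 nt, which would leave a 2-nt 3' overhang on each product.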
Project description:Type II restriction endonucleases (REases) cleave double-stranded DNA at specific sites within or close to their recognition sequences. Shortly after their discovery in 1970, REases became one of the primary tools in molecular biology. However, the list of available specificities of type II REases is relatively short despite the extensive search for them in natural sources and multiple attempts to artificially change their specificity. In this study, we examined the possibility of generating cleavage specificities of REases by swapping putative target recognition domains (TRDs) between the type IIB enzymes AloI, PpiI, and TstI. Our results demonstrate that individual TRDs recognize distinct parts of the bipartite DNA targets of these enzymes and are interchangeable. Based on these properties, we engineered a functional type IIB REase having previously undescribed DNA specificity. Our study suggests that the TRD-swapping approach may be used as a general technique for the generation of type II enzymes with predetermined specificities.
Project description:Thus far, identification of functionally important residues in Type II restriction endonucleases (REases) has been difficult using conventional methods. Even though known REase structures share a fold and marginally recognizable active site, the overall sequence similarities are statistically insignificant, unless compared among proteins that recognize identical or very similar sequences. Bsp6I is a Type II REase, which recognizes the palindromic DNA sequence 5'GCNGC and cleaves between the cytosine and the unspecified nucleotide in both strands, generating a double-strand break with 5'-protruding single nucleotides. There are no solved structures of REases that recognize similar DNA targets or generate cleavage products with similar characteristics. In straightforward comparisons, the Bsp6I sequence shows no significant similarity to REases with known structures. However, using a fold-recognition approach, we have identified a remote relationship between Bsp6I and the structure of PvuII. Starting from the sequence-structure alignment between Bsp6I and PvuII, we constructed a homology model of Bsp6I and used it to predict functionally significant regions in Bsp6I. The homology model was supported by site-directed mutagenesis of residues predicted to be important for dimerization, DNA binding and catalysis. Completing the picture of sequence-structure-function relationships in protein superfamilies becomes an essential task in the age of structural genomics and our study may serve as a paradigm for future analyses of superfamilies comprising strongly diverged members with little or no sequence similarity.
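The cleavage pattern described for Bsp6I (site 5'-GCNGC, cut between the cytosine and the unspecified nucleotide) can be sketched as a simple in-silico digest. The function name `bsp6i_digest` is hypothetical, and only top-strand fragments are returned; because the site is palindromic, the bottom strand is cut one position further along, which is what leaves the single-nucleotide 5' overhang.

```python
import re

def bsp6i_digest(dna):
    """Split the top strand at every GC^NGC site (cut after the first GC).
    Hypothetical helper for illustration; ambiguity codes are not handled."""
    dna = dna.upper()
    # Lookahead so that overlapping sites (e.g. GCAGCAGC) are all found.
    cut_points = [m.start() + 2 for m in re.finditer(r"(?=GC[ACGT]GC)", dna)]
    fragments, prev = [], 0
    for cut in cut_points:
        fragments.append(dna[prev:cut])
        prev = cut
    fragments.append(dna[prev:])
    return fragments

print(bsp6i_digest("AAGCTGCTT"))  # one site
print(bsp6i_digest("GCAGCAGC"))   # two overlapping sites
```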
Project description:Type II restriction endonucleases (REases) are deoxyribonucleases that cleave DNA sequences with remarkable specificity. Type II REases are highly divergent in sequence as well as in topology, i.e. the connectivity of secondary structure elements. A widely held assumption is that a structural core of five β-strands flanked by two α-helices is common to these enzymes. We introduce a systematic procedure to enumerate secondary structure elements in an unambiguous and reproducible way, and use it to analyze the currently available X-ray structures of Type II REases. Based on this analysis, we propose an alternative definition of the core, which we term the αβα-core. The αβα-core includes the most frequently observed secondary structure elements and is not a sandwich, as it consists of a five-strand β-sheet and two α-helices on the same face of the β-sheet. We use the αβα-core connectivity as a basis for grouping the Type II REases into distinct structural classes. In these new structural classes, the connectivity correlates with the angles between the secondary structure elements and with the cleavage patterns of the REases. We show that there exists a substructure of the αβα-core, namely a common conserved core, ccc, defined here as one α-helix and four β-strands common to all Type II REases of known structure.
Project description:Restriction endonucleases (REases) recognize and cleave short palindromic DNA sequences, protecting bacterial cells against bacteriophage infection by attacking foreign DNA. We are interested in the potential of folded RNA to mimic DNA, a concept that might be applied to inhibition of DNA-binding proteins. As a model system, we sought RNA aptamers against the REases BamHI, PacI and KpnI using systematic evolution of ligands by exponential enrichment (SELEX). After 20 rounds of selection under different stringent conditions, we identified the 10 most enriched RNA aptamers for each REase. Aptamers were screened for binding and specificity, and assayed for REase inhibition. We obtained eight high-affinity (Kd ~12-30 nM) selective competitive inhibitors (IC50 ~20-150 nM) for KpnI. Predicted RNA secondary structures were confirmed by in-line attack assay and a 38-nt derivative of the best anti-KpnI aptamer was sufficient for inhibition. These competitive inhibitors presumably act as KpnI binding site analogs, but lack the primary consensus KpnI cleavage sequence and are not cleaved by KpnI, making their potential mode of DNA mimicry fascinating. Anti-REase RNA aptamers could have value in studies of REase mechanism and may give clues to a code for designing RNAs that competitively inhibit DNA binding proteins including transcription factors.
Project description:The restriction endonuclease (REase) R.KpnI is an orthodox Type IIP enzyme, which binds to DNA in the absence of metal ions and, in the presence of Mg2+, cleaves the DNA sequence 5'-GGTAC^C-3' at the position marked ^, generating four-base 3' overhangs. Bioinformatics analysis reveals that R.KpnI contains a ββα-Me-finger fold, which is characteristic of many HNH-superfamily endonucleases, including homing endonuclease I-HmuI, structure-specific T4 endonuclease VII, colicin E9, sequence non-specific Serratia nuclease and sequence-specific homing endonuclease I-PpoI. According to our homology model of R.KpnI, D148, H149 and Q175 correspond to the critical D, H and N or H residues of the HNH nucleases. Substitutions of these three conserved residues lead to the loss of the DNA cleavage activity by R.KpnI, confirming their importance. The mutant Q175E fails to bind DNA under standard conditions, although the DNA binding and cleavage can be rescued at pH 6.0, indicating a role for Q175 in DNA binding and cleavage. Our study provides the first experimental evidence for a Type IIP REase that does not belong to the PD...D/EXK superfamily of nucleases but is instead a member of the HNH superfamily.
Project description:The homing endonuclease I-Ssp6803I causes the insertion of a group I intron into a bacterial tRNA gene, the only example of an invasive mobile intron within a bacterial genome. Using a computational fold prediction, mutagenic screen and crystal structure determination, we demonstrate that this protein is a tetrameric PD-(D/E)-XK endonuclease, a fold normally used to protect a bacterial genome from invading DNA through the action of restriction endonucleases. I-Ssp6803I uses its tetrameric assembly to promote recognition of a single long target site, whereas restriction endonuclease tetramers facilitate cooperative binding and cleavage of two short sites. The limited use of the PD-(D/E)-XK nucleases by mobile introns stands in contrast to their frequent use of LAGLIDADG and HNH endonucleases, which in turn are rarely incorporated into restriction/modification systems.