Identification of GATC- and CCGG-recognizing Type II REases and their putative specificity-determining positions using Scan2S--a novel motif scan algorithm with optional secondary structure constraints.
ABSTRACT: Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered efforts at specificity engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.
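The idea of a regular expression scan with optional secondary structure constraints can be illustrated with a minimal sketch. This is not the Scan2S algorithm itself, whose internals the abstract does not describe; it is one simple way to get the same kind of behavior, assuming a per-residue secondary structure string (h = helix, e = strand, c = coil) is available from a predictor. The function names, the motif element format, and the toy sequence are all hypothetical.

```python
import re

def interleave(seq, ss):
    """Pair each residue (upper case) with its secondary structure state (lower case)."""
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary structure must have equal length")
    return "".join(res + state for res, state in zip(seq.upper(), ss.lower()))

def compile_motif(elements):
    """Build a regex from (residues, ss_states, min_repeat, max_repeat) tuples.

    residues:  allowed amino acids, e.g. "DE", or "x" for any residue
    ss_states: allowed states, e.g. "e", or None for no constraint
    """
    parts = []
    for residues, ss_states, lo, hi in elements:
        res_cls = "[A-Z]" if residues == "x" else f"[{residues}]"
        ss_cls = "[a-z]" if ss_states is None else f"[{ss_states}]"
        parts.append(f"(?:{res_cls}{ss_cls}){{{lo},{hi}}}")
    return re.compile("".join(parts))

def scan(seq, ss, motif):
    """Return (start, matched_subsequence) for each non-overlapping hit."""
    text = interleave(seq, ss)
    # Every residue occupies two characters in the interleaved text,
    # so offsets are halved to map back onto the sequence.
    return [(m.start() // 2, seq[m.start() // 2 : m.end() // 2])
            for m in motif.finditer(text)]

# Toy motif (an assumption, loosely shaped like a D...(D/E)XK signature):
# D in a strand, 2-6 arbitrary residues, D/E in a strand, any residue, K in a strand.
TOY_MOTIF = compile_motif([
    ("D", "e", 1, 1),
    ("x", None, 2, 6),
    ("DE", "e", 1, 1),
    ("x", None, 1, 1),
    ("K", "e", 1, 1),
])

if __name__ == "__main__":
    seq = "MSPDGTRLEIKAW"                       # made-up sequence
    print(scan(seq, "cceeccceeeecc", TOY_MOTIF))  # K predicted in a strand
    print(scan(seq, "cceeccceeehcc", TOY_MOTIF))  # K predicted in a helix
```

Because residues are upper case and structure states lower case, a match can only begin on a residue, so the sequence and structure tracks never fall out of register. Passing `None` for every structure constraint reduces the scan to an ordinary sequence motif search.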
Project description:Restriction endonucleases (REases) protect bacteria from invading foreign DNA and are endowed with exquisite sequence specificity. REases originated from ancestral proteins and evolved new sequence specificities by genetic recombination, gene duplication, replication slippage, and transpositional events. They are also speculated to have evolved from nonspecific endonucleases, attaining a high degree of sequence specificity through point mutations. We describe here an example of the generation of an exquisitely site-specific REase from a highly promiscuous one by a single point mutation.
Project description:Restriction endonucleases (REases) recognize and cleave short palindromic DNA sequences, protecting bacterial cells against bacteriophage infection by attacking foreign DNA. We are interested in the potential of folded RNA to mimic DNA, a concept that might be applied to inhibition of DNA-binding proteins. As a model system, we sought RNA aptamers against the REases BamHI, PacI and KpnI using systematic evolution of ligands by exponential enrichment (SELEX). After 20 rounds of selection under different stringent conditions, we identified the 10 most enriched RNA aptamers for each REase. Aptamers were screened for binding and specificity, and assayed for REase inhibition. We obtained eight high-affinity (Kd ~12-30 nM) selective competitive inhibitors (IC50 ~20-150 nM) for KpnI. Predicted RNA secondary structures were confirmed by in-line attack assay and a 38-nt derivative of the best anti-KpnI aptamer was sufficient for inhibition. These competitive inhibitors presumably act as KpnI binding site analogs, but lack the primary consensus KpnI cleavage sequence and are not cleaved by KpnI, making their potential mode of DNA mimicry fascinating. Anti-REase RNA aptamers could have value in studies of REase mechanism and may give clues to a code for designing RNAs that competitively inhibit DNA binding proteins including transcription factors.
Project description:For a very long time, Type II restriction enzymes (REases) have been a paradigm of ORFans: proteins with no detectable similarity to each other and to any other protein in the database, despite common cellular and biochemical function. Crystallographic analyses published until January 2008 provided high-resolution structures for only 28 of 1637 Type II REase sequences available in the Restriction Enzyme database (REBASE). Among these structures, all but two possess catalytic domains with the common PD-(D/E)XK nuclease fold. Two structures are unrelated to the others: R.BfiI exhibits the phospholipase D (PLD) fold, while R.PabI has a new fold termed 'half-pipe'. Thus far, bioinformatic studies supported by site-directed mutagenesis have extended the number of tentatively assigned REase folds to five (now including also GIY-YIG and HNH folds identified earlier in homing endonucleases) and provided structural predictions for dozens of REase sequences without experimentally solved structures. Here, we present a comprehensive study of all Type II REase sequences available in REBASE together with their homologs detectable in the nonredundant and environmental samples databases at the NCBI. We present the summary and critical evaluation of structural assignments and predictions reported earlier, new classification of all REase sequences into families, domain architecture analysis and new predictions of three-dimensional folds. Among 289 experimentally characterized (not putative) Type II REases, whose apparently full-length sequences are available in REBASE, we assign 199 (69%) to contain the PD-(D/E)XK domain. The HNH domain is the second most common, with 24 (8%) members. When putative REases are taken into account, the fraction of PD-(D/E)XK and HNH folds changes to 48% and 30%, respectively. Fifty-six characterized (and 521 predicted) REases remain unassigned to any of the five REase folds identified so far, and may exhibit new architectures. 
These enzymes are proposed as the most interesting targets for structure determination by high-resolution experimental methods. Our analysis provides the first comprehensive map of sequence-structure relationships among Type II REases and will help to focus the efforts of structural and functional genomics of this large and biotechnologically important class of enzymes.
Project description:Escherichia coli DNA adenine methyltransferase (EcoDam) methylates the N-6 position of the adenine in the sequence 5'-GATC-3' and plays vital roles in gene regulation, mismatch repair, and DNA replication. It remains unclear how the small number of critical GATC sites involved in the regulation of replication and gene expression are differentially methylated, whereas the approximately 20,000 GATCs important for mismatch repair and dispersed throughout the genome are extensively methylated. Our prior work, limited to the pap regulon, showed that methylation efficiency is controlled by sequences immediately flanking the GATC sites. We extend these studies to include GATC sites involved in diverse gene regulatory and DNA replication pathways as well as sites previously shown to undergo differential in vivo methylation but whose function remains to be assigned. EcoDam shows no change in affinity with variations in flanking sequences derived from these sources, but methylation kinetics varied 12-fold. A-tracts immediately adjacent to the GATC site contribute significantly to these differences in methylation kinetics. Interestingly, only when the poly(A) is located 5' of the GATC are the changes in methylation kinetics revealed. Preferential methylation is obscured when two GATC sites are positioned on the same DNA molecule, unless both sites are surrounded by large amounts of nonspecific DNA. Thus, facilitated diffusion and sequences immediately flanking target sites contribute to higher order specificity for EcoDam; we suggest that the diverse biological roles of the enzyme are in part regulated by these two factors, which may be important for other enzymes that sequence-specifically modify DNA.
Project description:BACKGROUND: We previously defined a family of restriction endonucleases (REases) from Thermus sp., which share common biochemical and biophysical features, such as the fusion of both the nuclease and methyltransferase (MTase) activities in a single polypeptide, cleavage at a distance from the recognition site, large molecular size, modulation of activity by S-adenosylmethionine (SAM), and incomplete cleavage of the substrate DNA. Members include related thermophilic REases with five distinct specificities: TspGWI, TaqII, Tth111II/TthHB27I, TspDTI and TsoI. RESULTS: TspDTI, TsoI and isoschizomers Tth111II/TthHB27I recognize different, but related sequences: 5'-ATGAA-3', 5'-TARCCA-3' and 5'-CAARCA-3' respectively. Their amino acid sequences are similar, which is unusual among REases of different specificity. To gain insight into this group of REases, TspDTI, the prototype member of the Thermus sp. enzyme family, was cloned and characterized using a recently developed method for partially cleaving REases. CONCLUSIONS: TspDTI, TsoI and isoschizomers Tth111II/TthHB27I are closely related bifunctional enzymes. They comprise a tandem arrangement of Type I-like domains, like other Type IIC enzymes (those with a fusion of a REase and MTase domains), e.g. TspGWI, TaqII and MmeI, but their sequences are only remotely similar to these previously characterized enzymes. The characterization of TspDTI, a prototype member of this group, extends our understanding of sequence-function relationships among multifunctional restriction-modification enzymes.
Project description:Type II restriction endonucleases (REases) cleave double-stranded DNA at specific sites within or close to their recognition sequences. Shortly after their discovery in 1970, REases became one of the primary tools in molecular biology. However, the list of available specificities of type II REases is relatively short despite the extensive search for them in natural sources and multiple attempts to artificially change their specificity. In this study, we examined the possibility of generating new cleavage specificities of REases by swapping putative target recognition domains (TRDs) between the type IIB enzymes AloI, PpiI, and TstI. Our results demonstrate that individual TRDs recognize distinct parts of the bipartite DNA targets of these enzymes and are interchangeable. Based on these properties, we engineered a functional type IIB REase having previously undescribed DNA specificity. Our study suggests that the TRD-swapping approach may be used as a general technique for the generation of type II enzymes with predetermined specificities.
Project description:Thus far, identification of functionally important residues in Type II restriction endonucleases (REases) has been difficult using conventional methods. Even though known REase structures share a fold and marginally recognizable active site, the overall sequence similarities are statistically insignificant, unless compared among proteins that recognize identical or very similar sequences. Bsp6I is a Type II REase, which recognizes the palindromic DNA sequence 5'GCNGC and cleaves between the cytosine and the unspecified nucleotide in both strands, generating a double-strand break with 5'-protruding single nucleotides. There are no solved structures of REases that recognize similar DNA targets or generate cleavage products with similar characteristics. In straightforward comparisons, the Bsp6I sequence shows no significant similarity to REases with known structures. However, using a fold-recognition approach, we have identified a remote relationship between Bsp6I and the structure of PvuII. Starting from the sequence-structure alignment between Bsp6I and PvuII, we constructed a homology model of Bsp6I and used it to predict functionally significant regions in Bsp6I. The homology model was supported by site-directed mutagenesis of residues predicted to be important for dimerization, DNA binding and catalysis. Completing the picture of sequence-structure-function relationships in protein superfamilies becomes an essential task in the age of structural genomics and our study may serve as a paradigm for future analyses of superfamilies comprising strongly diverged members with little or no sequence similarity.
Project description:Type II restriction endonucleases (REases) are deoxyribonucleases that cleave DNA sequences with remarkable specificity. Type II REases are highly divergent in sequence as well as in topology, i.e. the connectivity of secondary structure elements. A widely held assumption is that a structural core of five beta-strands flanked by two alpha-helices is common to these enzymes. We introduce a systematic procedure to enumerate secondary structure elements in an unambiguous and reproducible way, and use it to analyze the currently available X-ray structures of Type II REases. Based on this analysis, we propose an alternative definition of the core, which we term the alphabetaalpha-core. The alphabetaalpha-core includes the most frequently observed secondary structure elements and is not a sandwich, as it consists of a five-strand beta-sheet and two alpha-helices on the same face of the beta-sheet. We use the alphabetaalpha-core connectivity as a basis for grouping the Type II REases into distinct structural classes. In these new structural classes, the connectivity correlates with the angles between the secondary structure elements and with the cleavage patterns of the REases. We show that there exists a substructure of the alphabetaalpha-core, namely a common conserved core, ccc, defined here as one alpha-helix and four beta-strands common to all Type II REases of known structure.
Project description:BACKGROUND: Restriction-modification (RM) systems appear to play key roles in modulating gene flow among bacteria and archaea. Because the restriction endonuclease (REase) is potentially lethal to unmethylated new host cells, regulation to ensure pre-expression of the protective DNA methyltransferase (MTase) is essential to the spread of RM genes. This is particularly true for Type IIP RM systems, in which the REase and MTase are separate, independently-active proteins. A substantial subset of Type IIP RM systems are controlled by an activator-repressor called C protein. In these systems, C controls the promoter for its own gene, and for the downstream REase gene that lacks its own promoter. Thus MTase is expressed immediately after the RM genes enter a new cell, while expression of REase is delayed until sufficient C protein accumulates. To study the variation in and evolution of this regulatory mechanism, we searched for RM systems closely related to the well-studied C protein-dependent PvuII RM system. Unexpectedly, among those found were several in which the C protein and REase genes were fused. RESULTS: The gene for CR.NsoJS138I fusion protein (nsoJS138ICR, from the bacterium Niabella soli) was cloned, and the fusion protein produced and partially purified. Western blots provided no evidence that, under the conditions tested, anything other than full-length fusion protein is produced. This protein had REase activity in vitro and, as expected from the sequence similarity, its specificity was indistinguishable from that for PvuII REase, though the optimal reaction conditions were different. Furthermore, the fusion was active as a C protein, as revealed by in vivo activation of a lacZ reporter fusion to the promoter region for the nsoJS138ICR gene. CONCLUSIONS: Fusions between C proteins and REases have not previously been characterized, though other fusions have (such as between REases and MTases). 
These results reinforce the evidence for impressive modularity among RM system proteins, and raise important questions about the implications of the C-REase fusions on expression kinetics of these RM systems.
Project description:DNA adenine methyltransferase (Dam) is widespread and conserved among the γ-proteobacteria. Methylation of the Ade in GATC sequences regulates diverse bacterial cell functions, including gene expression, mismatch repair and chromosome replication. Dam also controls virulence in many pathogenic Gram-negative bacteria. An unexplained and perplexing observation about Escherichia coli Dam (EcoDam) is that there is no obvious relationship between the genes that are transcriptionally responsive to Dam and the promoter-proximal presence of GATC sequences. Here, we demonstrate that EcoDam interacts with a 5-base pair non-cognate sequence distinct from GATC. The crystal structure of a non-cognate complex allowed us to identify a DNA binding element, GTYTA/TARAC (where Y = C/T and R = A/G). This element immediately flanks GATC sites in some Dam-regulated promoters, including the Pap operon which specifies pyelonephritis-associated pili. In addition, Dam interacts with near-cognate GATC sequences (i.e. 3/4-site ATC and GAT). Taken together, these results imply that Dam, in addition to being responsible for GATC methylation, could also function as a methylation-independent transcriptional repressor.
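The degenerate element GTYTA/TARAC is written in IUPAC nucleotide codes, and locating it next to GATC sites amounts to expanding those codes into character classes. The following is a minimal sketch of such a scan, not the procedure used in the study; the function names and the example DNA string are made up for illustration. Because TARAC is the reverse complement of GTYTA, searching one strand for both forms covers both orientations.

```python
import re

# IUPAC codes used here; only the symbols needed for these motifs are listed.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}

def iupac_to_regex(motif):
    """Expand an IUPAC motif such as 'GTYTA' into a regex fragment."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_sites(dna, motifs):
    """Return (position, matched_text) for every occurrence, overlaps included."""
    alternatives = "|".join(iupac_to_regex(m) for m in motifs)
    # A zero-width lookahead lets finditer report overlapping occurrences.
    pattern = re.compile(f"(?=({alternatives}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna.upper())]

if __name__ == "__main__":
    dna = "CCGTCTAGATCTTTAGACGG"  # hypothetical sequence, not from the study
    print(find_sites(dna, ["GATC"]))
    print(find_sites(dna, ["GTYTA", "TARAC"]))
```

Comparing the positions returned by the two calls shows whether a binding element sits immediately beside a GATC site.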