Reductive evolution of proteomes and protein structures.
ABSTRACT: The lengths of orthologous protein families in Eukarya are almost double the lengths found in Bacteria and Archaea. Here we examine protein structures in 745 genomes and show that protein length differences between superkingdoms arise from much shorter nondomain linker sequences in prokaryotes. Eukaryotic, bacterial, and archaeal linkers are 250, 86, and 73 aa residues in length, respectively, whereas folded domain sequences are 281, 280, and 256 residues, respectively. Cryptic domains match linkers (P < 0.0001) with probabilities ranging from 0.022 to 0.042; accordingly, they do not significantly affect length estimates. Linker sequences support intermolecular binding within proteomes, and they are probably also enriched in intrinsically disordered regions. In growth-rate-maximized cells, reductively evolved linker sequence lengths should be proportional to proteome diversity. Using the total in-frame coding capacity of a genome [i.e., coding sequence (CDS)] as a reliable measure of proteome diversity, we find that prokaryotic linker lengths clearly evolve in proportion to CDS values, whereas eukaryotic linker lengths are larger than expected and vary more randomly. Domain lengths scarcely change over the entire range of CDS values. Thus, the protein linkers of prokaryotes evolve reductively, whereas those of eukaryotes do not.
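The length figures above can be checked with simple arithmetic. The sketch below assumes, for illustration only, that a superkingdom's mean protein length is the sum of its mean linker and mean domain lengths (the abstract does not state how its totals were computed). It then splits the eukaryote-prokaryote gap into linker and domain contributions.

```python
# Mean lengths (aa residues) reported in the abstract above.
LINKER = {"Eukarya": 250, "Bacteria": 86, "Archaea": 73}
DOMAIN = {"Eukarya": 281, "Bacteria": 280, "Archaea": 256}

def total_length(kingdom):
    """Mean protein length, ASSUMED here to be linker + folded-domain residues."""
    return LINKER[kingdom] + DOMAIN[kingdom]

def linker_fraction(kingdom):
    """Share of the assumed mean protein length contributed by nondomain linkers."""
    return LINKER[kingdom] / total_length(kingdom)

def decompose_difference(a, b):
    """Split the length gap between superkingdoms a and b into linker and domain parts."""
    d_linker = LINKER[a] - LINKER[b]
    d_domain = DOMAIN[a] - DOMAIN[b]
    return d_linker, d_domain, d_linker + d_domain

for k in LINKER:
    print(f"{k}: total {total_length(k)} aa, linker share {linker_fraction(k):.2f}")
for other in ("Bacteria", "Archaea"):
    dl, dd, gap = decompose_difference("Eukarya", other)
    print(f"Eukarya - {other}: {gap} aa = {dl} linker + {dd} domain")
```

Under that assumption, the linker term accounts for 164 of the 165-residue gap to Bacteria and 177 of the 202-residue gap to Archaea, consistent with the claim that linkers, not domains, drive the difference.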
Project description:Cellulase enzymes deconstruct cellulose to glucose and often consist of glycosylated linkers connecting glycoside hydrolases (GHs) to carbohydrate-binding modules (CBMs). Although linker modifications can alter cellulase activity, the functional role of linkers beyond domain connectivity remains unknown. Here we investigate cellulase linkers connecting GH Family 6 or 7 catalytic domains to Family 1 or 2 CBMs, from both bacterial and eukaryotic cellulases, to identify conserved characteristics potentially related to function. Sequence analysis suggests that the linker lengths between structured domains are optimized based on the GH domain and CBM type, such that linker length may be important for activity. Longer linkers are observed in eukaryotic GH Family 6 cellulases than in GH Family 7 cellulases. Bacterial GH Family 6 cellulases are found with structured domains in either N-to-C-terminal order, and similar linker lengths suggest that domain order has no effect on length. O-glycosylation is uniformly distributed across linkers, suggesting that glycans are required along entire linker lengths for proteolysis protection and, as suggested by simulation, for extension. Sequence comparisons show that the proline content of bacterial linkers is more than double that of eukaryotic linkers, but with fewer putative O-glycan sites, suggesting alternative mechanisms for extension. Conversely, near linker termini, where linkers connect to structured domains, O-glycosylation sites are observed less frequently, whereas glycines are more prevalent, suggesting the need for flexibility to achieve proper domain orientations. Putative N-glycosylation sites are quite rare in cellulase linkers, while an N-P motif, which strongly disfavors the attachment of N-glycans, is commonly observed. These results suggest that linkers exhibit features that are likely tailored for optimal function, despite possessing low sequence identity.
This study suggests that cellulase linkers may play a functional role in enzyme action and highlights the need for additional studies to elucidate their functions.
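The sequence features discussed above (putative N-glycosylation sites, N-P motifs, proline content, and glycine enrichment near the termini) can be tallied with a short script. This sketch is not the authors' pipeline. It uses the standard N-X-S/T sequon (X not Pro) as a proxy for putative N-glycosylation sites, and the function names, the 5-residue terminal window, and the example sequence are hypothetical. Predicting O-glycosylation sites requires dedicated tools and is not attempted here.

```python
import re

def n_glyc_sequons(seq):
    """0-based starts of N-X-S/T sequons with X != Pro (overlapping matches allowed)."""
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", seq)]

def np_motifs(seq):
    """0-based starts of Asn-Pro dipeptides, which disfavor N-glycan attachment."""
    return [m.start() for m in re.finditer(r"(?=NP)", seq)]

def residue_fraction(seq, residue):
    """Fraction of positions in seq occupied by a single-letter residue code."""
    return seq.count(residue) / len(seq) if seq else 0.0

def terminal_vs_core(seq, residue, window=5):
    """Compare residue content within `window` residues of either end vs the rest."""
    if len(seq) <= 2 * window:
        raise ValueError("linker too short for the chosen terminal window")
    ends = seq[:window] + seq[-window:]
    core = seq[window:-window]
    return residue_fraction(ends, residue), residue_fraction(core, residue)

# Hypothetical linker sequence, for illustration only.
linker = "GGSPTTPSNPTSSPTPTTSNASTPGG"
print("sequons:", n_glyc_sequons(linker))
print("N-P motifs:", np_motifs(linker))
print("proline fraction:", round(residue_fraction(linker, "P"), 2))
print("Gly at termini vs core:", terminal_vs_core(linker, "G"))
```

Applied to aligned linker sets from each group, the same counts would give the kind of bacterial vs eukaryotic proline comparison described above.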
Project description:Flexible polypeptide linkers composed of glycine and serine are important components of engineered multidomain proteins. We have previously shown that the conformational properties of Gly-Gly-Ser repeat linkers can be quantitatively understood by comparing experimentally determined Förster resonance energy transfer (FRET) efficiencies of ECFP-linker-EYFP proteins to theoretical FRET efficiencies calculated using wormlike chain (WLC) and Gaussian chain models. Here we extend this analysis to include linkers with different glycine contents. We determined the FRET efficiencies of ECFP-linker-EYFP proteins with linkers ranging in length from 25 to 73 amino acids and with glycine contents of 33.3% (GSSGSS), 16.7% (GSSSSS), and 0% (SSSSSS). The FRET efficiency decreased with increasing linker length and was lower overall for linkers with less glycine. Modeling the linkers with the WLC model showed that the experimentally observed FRET efficiencies were consistent with persistence lengths of 4.5, 4.8, and 6.2 Å for the GSSGSS, GSSSSS, and SSSSSS linkers, respectively. The observed increase in linker stiffness with reduced glycine content is much less pronounced than that predicted by a classical model developed by Flory and co-workers. We discuss possible reasons for this discrepancy as well as implications for using the stiffer linkers to control the effective concentrations of connected domains in engineered multidomain proteins.
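The WLC-based comparison described above can be approximated in a few lines. This is a simplified sketch, not the authors' model. It places the WLC mean-square end-to-end distance, <r^2> = 2PL - 2P^2(1 - e^(-L/P)), into a Gaussian radial distribution instead of using the full WLC distance distribution. It also ignores the size of the ECFP and EYFP barrels and assumes 3.8 Å of contour per residue and a Förster radius R0 = 49 Å for the ECFP/EYFP pair. The averaged efficiency is <E> = ∫P(r)E(r)dr with E(r) = 1/(1 + (r/R0)^6).

```python
import math

def wlc_mean_square(n_residues, persistence, rise=3.8):
    """WLC mean-square end-to-end distance (Å^2); rise per residue is an assumption."""
    L = n_residues * rise          # contour length, Å
    P = persistence
    return 2 * P * L - 2 * P**2 * (1 - math.exp(-L / P))

def mean_fret(n_residues, persistence, r0=49.0, steps=4000):
    """<E> over a Gaussian radial distribution that has the WLC <r^2> (approximation)."""
    r2 = wlc_mean_square(n_residues, persistence)
    a = 3.0 / (2.0 * r2)
    r_max = 6.0 * math.sqrt(r2)    # density beyond this is negligible
    h = r_max / steps
    num = den = 0.0
    for i in range(steps + 1):
        r = i * h
        w = 0.5 if i in (0, steps) else 1.0   # trapezoid-rule weights
        p = r * r * math.exp(-a * r * r)      # radial density up to a constant
        num += w * p / (1.0 + (r / r0) ** 6)
        den += w * p                          # normalization, so constants cancel
    return num / den

for p in (4.5, 4.8, 6.2):  # persistence lengths (Å) reported above
    values = [round(mean_fret(n, p), 3) for n in (25, 49, 73)]
    print(f"P = {p} Å: <E> at 25/49/73 aa = {values}")
```

Because a larger persistence length widens the end-to-end distribution, the sketch reproduces the qualitative trend reported above (lower efficiency for longer and for glycine-poor linkers). It is not expected to match the measured efficiencies quantitatively.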
Project description:This paper reports dissociation constants and "effective molarities" (M(eff)) for the intramolecular binding of a ligand covalently attached to the surface of a protein by oligo(ethylene glycol) (EG(n)) linkers of different lengths (n = 0, 2, 5, 10, and 20) and compares these experimental values with theoretical estimates from polymer theory. As expected, the value of M(eff) is lowest when the linker is too short (n = 0) to allow the ligand to bind noncovalently at the active site of the protein without strain, is highest when the linker is the optimal length (n = 2) to allow such binding to occur, and decreases monotonically as the length increases past this optimal value (but only by a factor of approximately 8 from n = 2 to n = 20). These experimental results are not compatible with a model in which the single bonds of the linker are completely restricted when the ligand has bound noncovalently to the active site of the protein, but they are quantitatively compatible with a model that treats the linker as a random-coil polymer. Calorimetry revealed that enthalpic interactions between the linker and the protein are not important in determining the thermodynamics of the system. Taken together, these results suggest that the linker contributes to the thermodynamics of binding exclusively through entropy. The values of M(eff) are, theoretically, intrinsic properties of the EG(n) linkers and can be used to predict the avidities of multivalent ligands with these linkers for multivalent proteins. The weak dependence of M(eff) on linker length suggests that multivalent ligands containing flexible linkers that are longer than the spacing between the binding sites of a multivalent protein will bind effectively, and that the use of flexible linkers with lengths somewhat greater than the optimal distance between binding sites is a justifiable strategy for the design of multivalent ligands.
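The random-coil picture that fits these data can be illustrated with the textbook Gaussian-chain estimate of effective molarity. This estimate is the probability density of finding the tethered ligand at the binding site, a distance d from the attachment point, converted to mol/L. This is a generic sketch, not the authors' exact model. The distance d and the mean-square end-to-end distance <r^2> are hypothetical inputs, not values for the EG(n) linkers.

```python
import math

AVOGADRO = 6.02214076e23   # mol^-1
A3_PER_LITRE = 1e27        # 1 L = 1e27 cubic ångströms

def effective_molarity(d, mean_sq):
    """Gaussian-chain M_eff (mol/L): end-point density at distance d (Å) for <r^2> (Å^2)."""
    density = (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5 \
        * math.exp(-3.0 * d * d / (2.0 * mean_sq))   # probability per Å^3
    return density * A3_PER_LITRE / AVOGADRO

d = 10.0  # hypothetical attachment-point-to-binding-site distance, Å
for ratio in (0.25, 0.5, 1.0, 2.0, 4.0, 16.0):
    print(f"<r^2>/d^2 = {ratio:>5}: M_eff = {effective_molarity(d, ratio * d * d):.3f} M")
```

For a fixed d, M_eff peaks when <r^2> = d^2 and then falls only as <r^2>^(-3/2), which illustrates why linkers longer than the optimum lose avidity gradually rather than abruptly.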
Project description:The linker histone (LH), an auxiliary protein that can bind to chromatin and interact with the linker DNA to form stem motifs, is a key element of chromatin compaction. By affecting the chromatin condensation level, it also plays an active role in gene expression. However, the presence and variable concentration of LH in chromatin fibers with different DNA linker lengths indicate that its folding and condensation are highly adaptable and dependent on the immediate nucleosome environment. Recent experimental studies revealed that the behavior of LH in mononucleosomes markedly differs from that in small nucleosome arrays, but the associated mechanism is unknown. Here we report a structural analysis of the behavior of LH in mononucleosomes and oligonucleosomes (2-6 nucleosomes) using mesoscale chromatin simulations. We show that the adopted stem configuration heavily depends on the strength of electrostatic interactions between LH and its parental DNA linkers, and that those interactions tend to be asymmetric in small oligonucleosome systems. Namely, LH in oligonucleosomes predominantly interacts with only one DNA linker, whereas in mononucleosomes LH interacts similarly with both linkers and forms a highly stable nucleosome stem. Although we show that LH condensation depends sensitively on the electrostatic interactions with entering and exiting DNA linkers, other interactions, especially those with nonparental cores and nonparental linkers, modulate the structural condensation by softening LH and thus making oligonucleosomes more flexible in comparison to mono- and dinucleosomes. We also find that the overall LH/chromatin interactions depend sensitively on the linker length because the linker length determines the maximal nucleosome stem length.
For mononucleosomes with DNA linkers shorter than LH, LH condenses fully, while for DNA linkers comparable to or longer than LH, the LH extension in mononucleosomes closely follows the length of the DNA linkers, unhampered by neighboring linker histones. Thus, LH is more condensed in mononucleosomes with short linkers than in oligonucleosomes, and its orientation is variable and highly environment-dependent. More generally, the work underscores the agility of LH, whose folding dynamics critically control genomic packaging and gene expression.
Project description:Inspired by natural multienzyme complexes, many types of artificial multienzyme complexes have recently been constructed. We previously constructed a self-assembled complex of a bacterial cytochrome P450 and its ferredoxin and ferredoxin reductase partners using heterotrimerization of proliferating cell nuclear antigen (PCNA) from Sulfolobus solfataricus. In this study, we inserted different peptide linkers between ferredoxin and the PCNA subunit and examined the effect on the activity of the self-assembled multienzyme complex. Although the activity was affected by the lengths of both the rigid poly-L-proline-rich linkers and the flexible Gly4-Ser repeating linkers, the poly-L-proline-rich linkers provided the greatest activity enhancement. The optimized poly-L-proline-rich linker enhanced the activity 1.9-fold compared with the GGGGSLVPRGSGGGGS linker used in the previously reported complex, while none of the Gly4-Ser repeating linkers, (G4S)n (n = 1-6), exceeded the maximum activity achieved with the optimized poly-L-proline-rich linker. Both the rigidity/flexibility and the length of the peptide linker were found to be important for enhancing the overall activity of the multienzyme complex.
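One way to see why rigidity matters is to compare how far a linker of a given residue count reaches as a straight rod versus as a flexible chain. The sketch below builds the (G4S)n series used above and contrasts two idealized limits. The first is a polyproline II (PPII) rod with about 3.1 Å of rise per residue. The second is a wormlike chain with an assumed 3.8 Å of contour per residue and a persistence length of 4.5 Å, borrowed as an assumption from the GSSGSS measurement in the FRET study above. Real poly-L-proline-rich linkers are neither perfectly straight nor pure PPII, so these numbers are only bounds for intuition.

```python
import math

RISE_PPII = 3.1   # Å per residue along an idealized polyproline II helix
RISE_COIL = 3.8   # Å of contour per residue for a flexible chain (assumption)

def g4s(n):
    """(G4S)n linker sequence."""
    return "GGGGS" * n

def rigid_span(n_res):
    """End-to-end distance if a linker of n_res residues were a straight PPII rod."""
    return n_res * RISE_PPII

def flexible_rms(n_res, persistence=4.5):
    """RMS end-to-end distance of a wormlike chain; persistence length is assumed."""
    L = n_res * RISE_COIL
    P = persistence
    return math.sqrt(2 * P * L - 2 * P * P * (1 - math.exp(-L / P)))

for n in range(1, 7):
    seq = g4s(n)
    print(f"(G4S){n}: {len(seq)} aa, flexible RMS ~ {flexible_rms(len(seq)):.1f} Å, "
          f"rigid rod of equal length ~ {rigid_span(len(seq)):.1f} Å")
```

In this idealization a 30-residue (G4S)6 linker spans roughly 31 Å RMS, whereas a rigid rod of the same residue count would span about 93 Å.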
Project description:It is now well accepted that at least some serotonin receptors exist in dimeric and oligomeric forms. Linking receptor ligands has shown potential for developing selective agonists and antagonists for traditionally refractory receptors. Here we report the development of a dimeric version of the known 5-HT(2A)R antagonist M-100907. Derivatives of M-100907 were synthesized to determine an appropriate site for the linker connection. Then, homodimers with polyether linkers of different lengths were functionally tested in a bioassay to determine the optimal linker length. Attachment at the catechol of M-100907 with linkers between 12 and 18 atoms in length proved to be optimal.
Project description:Supramolecular protein assemblies have garnered considerable interest because of their potential in diverse fields, offering unrivaled functionality and structural accuracy. Despite significant advances in protein assembly strategies, inserting long linkers of varied length and rigidity between assembling protein building blocks remains extremely difficult. Here we report a series of green fluorescent protein (GFP) oligomers in which protein building blocks were linked via two independent peptide strands. Assembling protein units for this two-peptide assembly were designed by flopped fusion of three self-assembling GFP fragments with two peptide linkers. Diverse flexible and rigid peptide linkers were successfully inserted into high-valent GFP oligomers. In addition, oligomers with one flexible linker and one rigid linker could also be fabricated, allowing more versatile control of linker rigidity. Linker length could be varied from 10 amino acids (aa) up to 76 aa, the longest among reported protein-assembling peptide linkers. Discrete GFP oligomers containing diverse linkers, with valencies ranging from monomers to decamers, were purified as monodisperse species by gel elution. Furthermore, various functional proteins could be multivalently fused to the present GFP oligomers. Binding assays, size exclusion chromatography, dynamic light scattering, circular dichroism, differential scanning calorimetry, and transmission electron microscopy suggested circular geometries of the GFP oligomers and revealed distinct characteristics of GFP oligomers with linkers of varied length and rigidity. Lastly, a surface binding study indicated that more widely spaced oligomeric binding modules offered more effective multivalent interactions than more closely spaced modules.
Project description:Multidomain proteins represent a broad spectrum of the protein landscape and are involved in various interactions. They can be considered modular building blocks assembled in distinct fashions and connected by linkers of varying lengths and sequences. Owing to their intrinsic flexibility, these linkers provide proteins with a subtle way to modulate interactions and explore a wide range of conformational space. In the present study, we seek to understand the effect of the flexibility and dynamics of the linker in the STAM2 UIM-SH3 dual-domain protein on molecular recognition. We engineered several UIM-SH3 constructs with linkers of different lengths or with domain deletions. Using SAXS and NMR experiments, we showed that modifying the linker alters the flexibility and dynamics of UIM-SH3. Indeed, the UIM and SH3 domains tumble differently but not independently of each other, while the length of the linker affects the ps-ns timescale dynamics of each domain. Finally, modifying the flexibility and dynamics of the linker has a drastic effect on the interaction of UIM-SH3 with Lys63-linked diubiquitin, weakening binding roughly eightfold (a roughly eightfold higher dissociation constant).
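A roughly eightfold change in dissociation constant can be expressed as a binding free-energy penalty via ΔΔG = RT ln(Kd,modified / Kd,reference). The sketch assumes 25 °C, since the abstract does not give the measurement temperature.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

def ddg_from_kd_ratio(kd_ratio, temperature=298.15):
    """Binding free-energy penalty (kcal/mol) for a Kd that is kd_ratio times larger."""
    return R_KCAL * temperature * math.log(kd_ratio)

print(f"{ddg_from_kd_ratio(8.0):.2f} kcal/mol")  # prints 1.23 kcal/mol
```

At this assumed temperature, an eightfold weaker Kd corresponds to a binding penalty of roughly 1.2 kcal/mol, a modest energetic cost for a large change in affinity.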
Project description:Flexible polymer linkers play an important role in various imaging and probing techniques that require surface immobilization, including atomic force microscopy (AFM). In AFM force spectroscopy, polymer linkers are necessary for the covalent attachment of molecules of interest to the AFM tip and the surface. The polymer linkers tether the molecules and ensure their proper orientation in probing experiments. Additionally, the linkers separate specific interactions from nonspecific short-range adhesion and serve as a reference point for the quantitative analysis of single-molecule probing events. In this report, we present the synthesis and testing of a novel polymer linker and identify a number of potential applications for its use in AFM force spectroscopy experiments. The synthesis of the linker is based on well-developed phosphoramidate (PA) chemistry, which allows the routine synthesis of linkers with predetermined lengths and PA composition. These linkers are homogeneous in length and can be terminated with various functional groups. PA linkers with different functional groups were synthesized and tested in experimental systems utilizing different immobilization chemistries. We probed interactions between complementary DNA oligonucleotides; DNA-protein complexes formed by the site-specific binding protein SfiI; and interactions between amyloid β peptides (Aβ42). The results of the AFM force spectroscopy experiments validated the feasibility of the proposed approach to linker design and synthesis. Furthermore, the properties of the tether (length, functional groups) can be adjusted to meet the specific requirements of different force spectroscopy experiments and systems, suggesting that it could be used in a wide range of applications.
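Force-extension curves of flexible tethers in AFM force spectroscopy are often fitted with the wormlike chain (WLC) model. One common form is the Marko-Siggia interpolation formula, F(x) = (kT/P)[1/(4(1 - x/L)^2) - 1/4 + x/L], where L is the contour length and P the persistence length. The abstract does not say how the PA-linker data were analyzed, so this sketch is illustrative only. The 20 nm contour length and 0.4 nm persistence length in the usage lines are hypothetical values, not measurements of the PA linker.

```python
KT_PN_NM = 4.11  # thermal energy near 298 K, in pN*nm

def marko_siggia_force(x, contour, persistence, kT=KT_PN_NM):
    """WLC force in pN at end-to-end extension x (nm), Marko-Siggia interpolation."""
    if not 0 <= x < contour:
        raise ValueError("extension must satisfy 0 <= x < contour length")
    z = x / contour
    return (kT / persistence) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)

# Hypothetical tether: 20 nm contour length, 0.4 nm persistence length.
for x in (5.0, 10.0, 15.0, 18.0, 19.0):
    print(f"x = {x:4.1f} nm -> F = {marko_siggia_force(x, 20.0, 0.4):7.1f} pN")
```

Because a homogeneous PA linker has a known contour length, L could be fixed in such a fit rather than left free. This is one way a well-defined tether can help separate specific rupture events from nonspecific adhesion.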
Project description:Precise manipulations of complex genomes by zinc-finger nucleases (ZFNs) depend on site-specific DNA cleavage, which requires two ZFN subunits to bind to two target half-sites separated by a spacer of 6 base pairs (bp). ZFN subunits consist of a specific DNA-binding domain and a nonspecific cleavage domain, connected by a short inter-domain linker. In this study, we conducted a systematic analysis of 11 candidate-based linkers using episomal and chromosomal targets in two human cell lines. We achieved gene targeting in up to 20% of transfected cells and identified linker variants that enforce DNA cleavage at narrowly defined spacer lengths and linkers that expand the repertoire of potential target sites. For instance, a nine amino acid (aa) linker induced efficient gene conversion at chromosomal sites with 7- or 16-bp spacers, whereas 4-aa linkers had activity optima at 5- and 6-bp spacers. Notably, single aa substitutions in the 4-aa linker affected the ZFN activity significantly, and both gene conversion and ZFN-associated toxicity depended on the linker/spacer combination and the cell type. In summary, both sequence and length of the inter-domain linker determine ZFN activity and target-site specificity, and are therefore important parameters to account for when designing ZFNs for genome editing.
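The linker-dependent spacer preferences above can guide a simple target-site scan. The sketch below finds positions where two ZFN half-sites, written as they read on the top strand, are separated by an allowed spacer length. The half-site and test sequences are hypothetical. The spacer sets come from the activity data summarized above: 5- and 6-bp spacers for 4-aa linkers, and 7- and 16-bp spacers for the 9-aa linker. Real designs must also account for the inverted orientation of the bound half-sites, off-target sites, and chromatin context.

```python
# Spacer lengths (bp) reported as active for each linker class in the study above.
SPACERS_BY_LINKER = {"4-aa": (5, 6), "9-aa": (7, 16)}

def find_zfn_targets(dna, left_site, right_site, spacers):
    """Return (start, spacer_length) for every left_site + spacer + right_site match.

    Half-sites are given as they read on the top strand; start is 0-based.
    """
    dna = dna.upper()
    left_site, right_site = left_site.upper(), right_site.upper()
    hits = []
    start = dna.find(left_site)
    while start != -1:
        for s in spacers:
            right_start = start + len(left_site) + s
            if dna[right_start:right_start + len(right_site)] == right_site:
                hits.append((start, s))
        start = dna.find(left_site, start + 1)
    return hits

# Hypothetical 9-bp half-sites and test sequence (6-bp spacer), for illustration only.
LEFT, RIGHT = "GCTGAAGTC", "TGGAGGCTG"
dna = "AA" + LEFT + "ACGTAC" + RIGHT + "TT"
for linker, spacers in SPACERS_BY_LINKER.items():
    print(linker, find_zfn_targets(dna, LEFT, RIGHT, spacers))
```

Here only the 4-aa spacer set recovers the site, which illustrates how the choice of linker widens or narrows the set of usable target sites.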