Precise manipulation of chromosomes in vivo enables genome-wide codon replacement.
ABSTRACT: We present genome engineering technologies that are capable of fundamentally reengineering genomes from the nucleotide to the megabase scale. We used multiplex automated genome engineering (MAGE) to site-specifically replace all 314 TAG stop codons with synonymous TAA codons in parallel across 32 Escherichia coli strains. This approach allowed us to measure individual recombination frequencies, confirm viability for each modification, and identify associated phenotypes. We developed hierarchical conjugative assembly genome engineering (CAGE) to merge these sets of codon modifications into genomes with 80 precise changes, which demonstrate that these synonymous codon substitutions can be combined into higher-order strains without synthetic lethal effects. Our methods treat the chromosome as both an editable and an evolvable template, permitting the exploration of vast genetic landscapes.
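The TAG-to-TAA substitution described above can be illustrated in silico with a minimal Python sketch. This is not the MAGE protocol itself, which introduces the change in vivo; it only shows the sequence-level edit on a coding-strand CDS. The function name `recode_amber_stop` and the toy sequence are hypothetical.

```python
def recode_amber_stop(cds: str) -> str:
    """Return the CDS with a terminal TAG stop codon replaced by TAA.

    Assumes `cds` is an in-frame DNA coding sequence (length divisible by 3),
    written 5'->3' on the coding strand.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    if cds[-3:] == "TAG":
        return cds[:-3] + "TAA"
    return cds  # TAA and TGA stops are left untouched


toy_cds = "ATGGCTAAATAG"  # hypothetical four-codon gene ending in TAG
print(recode_amber_stop(toy_cds))
```

A real genome-wide design would also have to handle genes on the reverse strand and overlapping genes, which this sketch ignores.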
Project description:Nature uses 64 codons to encode the synthesis of proteins from the genome, and chooses 1 sense codon (out of up to 6 synonyms) to encode each amino acid. Synonymous codon choice has diverse and important roles, and many synonymous substitutions are detrimental. Here we demonstrate that the number of codons used to encode the canonical amino acids can be reduced, through the genome-wide substitution of target codons by defined synonyms. We create a variant of Escherichia coli with a four-megabase synthetic genome through a high-fidelity convergent total synthesis. Our synthetic genome implements a defined recoding and refactoring scheme (with simple corrections at just seven positions) to replace every known occurrence of two sense codons and a stop codon in the genome. Thus, we recode 18,214 codons to create an organism with a 61-codon genome; this organism uses 59 codons to encode the 20 amino acids, and enables the deletion of a previously essential transfer RNA.
Project description:Variation in synonymous codon usage is abundant across multiple levels of organization: between codons of an amino acid, between genes in a genome, and between genomes of different species. It is now well understood that variation in synonymous codon usage is influenced by mutational bias coupled with both natural selection for translational efficiency and genetic drift, but how these processes shape patterns of codon usage bias across entire lineages remains unexplored. To address this question, we used a rich genomic data set of 327 species that covers nearly one third of the known biodiversity of the budding yeast subphylum Saccharomycotina. We found that, while genome-wide relative synonymous codon usage (RSCU) for all codons was highly correlated with the GC content of the third codon position (GC3), the usage of codons for the amino acids proline, arginine, and glycine was inconsistent with the neutral expectation where mutational bias coupled with genetic drift drives codon usage. Examination of the relationship between genes' effective numbers of codons and their GC3 contents in individual genomes revealed that nearly a quarter of genes (381,174/1,683,203; 23%), as well as most genomes (308/327; 94%), significantly deviate from the neutral expectation. Finally, by evaluating the imprint of translational selection on codon usage, measured as the degree to which genes' adaptiveness to the tRNA pool was correlated with selective pressure, we show that translational selection is widespread in budding yeast genomes (264/327; 81%). These results suggest that the contribution of translational selection and drift to patterns of synonymous codon usage across budding yeasts varies across codons, genes, and genomes; whereas drift is the primary driver of global codon usage across the subphylum, the codon bias of large numbers of genes in the majority of genomes is influenced by translational selection.
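For readers unfamiliar with the two quantities compared above, the following Python sketch computes RSCU and GC3 for a single coding sequence. It uses the textbook definitions (RSCU is the observed codon count divided by the mean count across that amino acid's synonymous codons; GC3 is the fraction of third codon positions that are G or C) and the standard genetic code. It is an illustrative assumption, not the pipeline used in the study, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(cds):
    """Split an in-frame DNA sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def rscu(cds):
    """Relative synonymous codon usage for every codon of each amino acid present."""
    counts = Counter(c for c in split_codons(cds) if c in CODON_TABLE)
    values = {}
    for aa in set(AMINO_ACIDS) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)  # count expected under uniform usage
        for c in family:
            values[c] = counts[c] / expected
    return values


def gc3(cds):
    """Fraction of codons whose third position is G or C (all codons included)."""
    thirds = [c[2] for c in split_codons(cds) if c in CODON_TABLE]
    if not thirds:
        raise ValueError("no valid codons in sequence")
    return sum(b in "GC" for b in thirds) / len(thirds)


toy = "GAAGAAGAG"  # hypothetical sequence: Glu encoded twice by GAA, once by GAG
print(rscu(toy)["GAA"], rscu(toy)["GAG"], gc3(toy))
```

Published analyses often restrict GC3 to synonymous positions (excluding Met, Trp, and stop codons); the simpler all-codon version is used here for brevity.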
Project description:Background: Codon bias is the non-uniform usage of codons, whereas codon context generally refers to a sequential pair of codons in a gene. Although genome sequencing of multiple species of dipteran and hymenopteran insects has been completed, only a few of these species have been analyzed for codon usage bias. Methods and principal findings: Here, we use bioinformatics approaches to analyze codon usage bias and codon context patterns in a genome-wide manner among 15 dipteran and 7 hymenopteran insect species. Results show that GAA is the most frequent codon in the dipteran species whereas GAG is the most frequent codon in the hymenopteran species. The data reveal that codons ending with C or G are frequently used in the dipteran genomes whereas codons ending with A or T are frequently used in the hymenopteran genomes. Synonymous codon usage orders (SCUO) vary within genomes in a pattern that seems to be distinct for each species. Based on comparison of 30 one-to-one orthologous genes among 17 species, the fruit fly Drosophila willistoni shows the least codon usage bias whereas the honey bee (Apis mellifera) shows the highest bias. Analysis of codon context patterns of these insects shows that specific codons are frequently used as the 3'- and 5'-context of start and stop codons, respectively. Conclusions: The codon bias pattern is distinct between dipteran and hymenopteran insects. While codon bias is favored by the high GC content of dipteran genomes, the high AT content of genes favors biased usage of synonymous codons in the hymenopteran insects. Also, codon context patterns vary among these species largely according to their phylogeny.
Project description:The genetic code in mRNA is redundant, with 61 sense codons translated into 20 different amino acids. Individual amino acids are encoded by up to six different codons but within codon families some are used more frequently than others. This phenomenon is referred to as synonymous codon usage bias. The genomes of free-living unicellular organisms such as bacteria have an extreme codon usage bias and the degree of bias differs between genes within the same genome. The strong positive correlation between codon usage bias and gene expression levels in many microorganisms is attributed to selection for translational efficiency. However, this putative selective advantage has never been measured in bacteria and theoretical estimates vary widely. By systematically exchanging optimal codons for synonymous codons in the tuf genes we quantified the selective advantage of biased codon usage in highly expressed genes to be in the range 0.2-4.2 × 10⁻⁴ per codon per generation. These data quantify for the first time the potential for selection on synonymous codon choice to drive genome-wide sequence evolution in bacteria, and in particular to optimize the sequences of highly expressed genes. This quantification may have predictive applications in the design of synthetic genes and for heterologous gene expression in biotechnology.
Project description:Statistics measuring codon selection seek to compare genes by their sensitivity to selection for translational efficiency, but existing statistics lack a model for testing the significance of differences between genes. Here, we introduce a new statistic for measuring codon selection, the Adaptive Codon Enrichment (ACE). This statistic represents codon usage bias in terms of a probabilistic distribution, quantifying the extent to which preferred codons are over-represented in the gene of interest relative to the mean and variance that would result from stochastic sampling of codons. Expected codon frequencies are derived from the observed codon usage frequencies of a broad set of genes, such that they are likely to reflect nonselective, genome-wide influences on codon usage (e.g. mutational biases). The relative adaptiveness of synonymous codons is deduced from the frequency of codon usage in a pre-selected set of genes relative to the expected frequency. The ACE can predict both transcript abundance during rapid growth and the rate of synonymous substitutions, with accuracy comparable to or greater than existing metrics. We further examine how the composition of reference gene sets affects the accuracy of the statistic, and suggest methods for selecting appropriate reference sets for any genome, including bacteriophages. Finally, we demonstrate that the ACE may naturally be extended to quantify the genome-wide influence of codon selection in a manner that is sensitive to a large fraction of codons in the genome. This reveals substantial variation among genomes, correlated with the tRNA gene number, even among groups of bacteria where previously proposed whole-genome measures show little variation. The statistical framework of the ACE allows rigorous comparison of the level of codon selection acting on genes, both within a genome and between genomes.
Project description:Torque teno sus virus 1 (TTSuV1) is a novel virus that has been found widely distributed in the swine population in recent years. Analysis of codon usage can reveal much about the molecular evolution of TTSuV1. In this study, synonymous codon usage patterns and the key determinants in the coding region of 29 available complete TTSuV1 genome sequences were examined. By calculating the nucleotide content and relative synonymous codon usage (RSCU) of TTSuV1 coding sequences, we found that the preferentially used codons were mostly those ending with A or C nucleotides; less-used codons were mostly those ending with U or G nucleotides, and these were mainly affected by composition constraints. Although there was variation in codon usage bias among different TTSuV1 genomes, the codon usage bias and GC content in the TTSuV1 coding region were low, which was mainly determined by the base composition in the third codon position and the effective number of codons (ENC) value. Moreover, the results of correspondence analysis (COA) indicated that the codon usage patterns of TTSuV1 isolated from different countries varied greatly and had significant differences. In addition, Spearman's rank correlation analysis and an ENC plot revealed that apart from mutation pressure, which was critical in determining the codon usage pattern, other factors were involved in shaping the evolution of codon usage bias in TTSuV1, such as natural selection. Those results suggested that synonymous codon usage patterns of TTSuV1 genomes were the result of interaction between mutation pressure and natural selection. The information from this study not only provides important insights into the synonymous codon usage pattern of TTSuV1, but also helps to identify the main factors affecting codon usage by this virus.
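An ENC plot of the kind mentioned above compares each gene's observed ENC with the value expected if base composition at synonymous third positions (GC3s) were the only influence. The sketch below assumes the widely used approximation from Wright (1990), ENC = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s; the abstract does not state which formula the authors used, and the function name is hypothetical.

```python
def expected_enc(gc3s: float) -> float:
    """Expected effective number of codons under compositional constraint alone.

    Assumes Wright's (1990) approximation; `gc3s` is the G+C fraction at
    synonymous third codon positions, between 0 and 1.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must lie in [0, 1]")
    s = gc3s
    return 2.0 + s + 29.0 / (s * s + (1.0 - s) ** 2)


# Genes plotted well below this curve are candidates for selection on codon usage.
for s in (0.2, 0.5, 0.8):
    print(f"GC3s={s:.1f}  expected ENC={expected_enc(s):.2f}")
```

Points lying on or near the curve are usually read as consistent with mutation pressure, which is the interpretation the abstract applies.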
Project description:The evolution of bias in synonymous codon usage in chosen monkeypox viral genomes and the factors influencing its diversification have not been reported so far. In this study, various trends associated with synonymous codon usage in chosen monkeypox viral genomes were investigated, and the results are reported. Identification of factors that influence codon usage in chosen monkeypox viral genomes was done using various codon usage indices, such as the relative synonymous codon usage, the effective number of codons, and the codon adaptation index. The Spearman rank correlation analysis and a correspondence analysis were used for correlating various factors with codon usage. The results revealed that mutational pressure due to compositional constraints, gene expression level, and selection at the codon level for utilization of putative optimal codons are major factors influencing synonymous codon usage bias in monkeypox viral genomes. A cluster analysis of relative synonymous codon usage values revealed a grouping of more virulent strains as one major cluster (Central African strains) and a grouping of less virulent strains (West African strains) as another major cluster, indicating a relationship between virulence and synonymous codon usage bias. This study concluded that a balance between the mutational pressure acting at the base composition level and the selection pressure acting at the amino acid level frames synonymous codon usage bias in the chosen monkeypox viruses. The natural selection from the host does not seem to have influenced the synonymous codon usage bias in the analyzed monkeypox viral genomes.
Project description:Codon usage bias is an important evolutionary feature in a genome and has been widely documented in many genomes. Analysis of codon usage bias has significance for mRNA translation, design of transgenes, new gene discovery, and studies of molecular biology and evolution. However, the synonymous codon usage pattern of the T. saginata genome remains unclear. T. saginata is a food-borne zoonotic cestode which infects approximately 50 million humans worldwide, and causes significant health problems to the host and considerable socio-economic losses as a consequence. In this study, synonymous codon usage in T. saginata was examined. Total RNA was isolated from T. saginata cysticerci and 91,487 unigenes were generated using Illumina sequencing technology. After filtering, the final sequence collection containing 11,399 CDSs was used for our analysis. Neutrality analysis showed that T. saginata had a wide GC3 distribution and a significant correlation was observed between GC12 and GC3. The NC-plot showed most genes on or close to the expected curve, with only a few low-ENC points below it, suggesting that mutational bias plays a major role in shaping codon usage. The Parity Rule 2 (PR2) plot analysis showed that GC and AT were not used proportionally. We also identified twenty-three optimal codons in the T. saginata genome, all of which ended with a G or C residue. These results suggest that mutational and selection forces are probably driving factors of codon usage bias in the T. saginata genome. Meanwhile, other factors such as protein length, gene expression, GC content of genes, and the hydropathicity of each protein also influence codon usage. Here, we systematically analyzed the codon usage pattern and identified factors shaping codon usage bias in T. saginata. Currently, no complete nuclear genome is available for codon usage analysis at the genome level in T. saginata. This is the first report to investigate codon biology in T. saginata. Such information not only brings about a new perspective for understanding the mechanisms of biased usage of synonymous codons but also provides useful clues for molecular genetic engineering and evolutionary studies.
Project description:Chikungunya virus (CHIKV) is an arthropod-borne virus of the family Togaviridae that is transmitted to humans by Aedes spp. mosquitoes. Its genome comprises a 12 kb single-strand positive-sense RNA. In the present study, we report the patterns of synonymous codon usage in 141 CHIKV genomes by calculating several codon usage indices and applying multivariate statistical methods. Relative synonymous codon usage (RSCU) analysis showed that the preferred synonymous codons were G/C- and A-ended. A comparative analysis of RSCU between CHIKV and its hosts showed that codon usage patterns of CHIKV are a mixture of coincidence and antagonism. Similarity index analysis showed that the overall codon usage patterns of CHIKV have been strongly influenced by Pan troglodytes and Aedes albopictus during evolution. The overall codon usage bias was low in CHIKV genomes, as inferred from the analysis of effective number of codons (ENC) and codon adaptation index (CAI). Our data suggested that although mutation pressure dominates codon usage in CHIKV, patterns of codon usage in CHIKV are also under the influence of natural selection from its hosts and geography. To the best of our knowledge, this is the first report describing codon usage analysis in CHIKV genomes. The findings from this study are expected to increase our understanding of factors involved in viral evolution, and fitness towards hosts and the environment.
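The codon adaptation index (CAI) cited above can be sketched as follows, assuming the standard definition of Sharp and Li (1987): each codon's relative adaptiveness w is its count in a reference gene set divided by the count of the most used synonymous codon, and CAI is the geometric mean of w over the codons of the query gene. The reference set, the 0.5 pseudocount for unobserved codons, and all function names are assumptions for illustration; the abstract does not describe the authors' settings.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(cds):
    """Split an in-frame DNA sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_cds_list, pseudocount=0.5):
    """Weight w for each codon of the degenerate amino acids (Met, Trp, stops excluded)."""
    counts = Counter(c for cds in reference_cds_list for c in split_codons(cds))
    weights = {}
    for aa in set(AMINO_ACIDS) - {"*", "M", "W"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        # Assumed convention: unobserved codons get a small pseudocount so log(w) is defined.
        adjusted = {c: counts[c] if counts[c] > 0 else pseudocount for c in family}
        best = max(adjusted.values())
        for c in family:
            weights[c] = adjusted[c] / best
    return weights


def cai(cds, weights):
    """Geometric mean of w over the scored codons of `cds`."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


reference = ["GAAGAAGAAGAG"]  # hypothetical reference set: GAA x3, GAG x1
w = relative_adaptiveness(reference)
print(cai("GAAGAG", w))
```

For a virus, the reference set would typically be drawn from highly expressed host genes, so CAI values depend strongly on that choice.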
Project description:The degeneracy of the genetic code allows nucleic acids to encode amino acid identity as well as noncoding information for gene regulation and genome maintenance. The rare arginine codons AGA and AGG (AGR) present a case study in codon choice, with AGRs encoding important transcriptional and translational properties distinct from the other synonymous alternatives (CGN). We created a strain of Escherichia coli with all 123 instances of AGR codons removed from all essential genes. We readily replaced 110 AGR codons with the synonymous CGU codons, but the remaining 13 "recalcitrant" AGRs required diversification to identify viable alternatives. Successful replacement codons tended to conserve local ribosomal binding site-like motifs and local mRNA secondary structure, sometimes at the expense of amino acid identity. Based on these observations, we empirically defined metrics for a multidimensional "safe replacement zone" (SRZ) within which alternative codons are more likely to be viable. To evaluate synonymous and nonsynonymous alternatives to essential AGRs further, we implemented a CRISPR/Cas9-based method to deplete a diversified population of a wild-type allele, allowing us to evaluate exhaustively the fitness impact of all 64 codon alternatives. Using this method, we confirmed the relevance of the SRZ by tracking codon fitness over time in 14 different genes, finding that codons that fall outside the SRZ are rapidly depleted from a growing population. Our unbiased and systematic strategy for identifying unpredicted design flaws in synthetic genomes and for elucidating rules governing codon choice will be crucial for designing genomes exhibiting radically altered genetic codes.
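As a rough illustration of the first design step described above, the following Python sketch locates in-frame AGA and AGG codons in a coding sequence and applies the naive synonymous swap to CGT (the DNA equivalent of CGU). The function names and toy sequence are hypothetical. Per the abstract, this simple swap was not viable at 13 positions, so the output should be treated as a list of candidates rather than a validated design.

```python
AGR_CODONS = ("AGA", "AGG")


def find_agr_codons(cds):
    """Return zero-based codon indices of in-frame AGA/AGG codons."""
    cds = cds.upper()
    return [i // 3 for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] in AGR_CODONS]


def replace_agr_with_cgt(cds):
    """Naively swap every in-frame AGA/AGG codon for the synonymous CGT."""
    cds = cds.upper()
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        recoded.append("CGT" if codon in AGR_CODONS else codon)
    return "".join(recoded)


toy_cds = "ATGAGACGTAGGTAA"  # hypothetical gene: ATG AGA CGT AGG TAA
print(find_agr_codons(toy_cds))
print(replace_agr_with_cgt(toy_cds))
```

A fuller design tool would additionally score each candidate for changes to local mRNA secondary structure and ribosomal binding site-like motifs, the factors the abstract identifies as defining the safe replacement zone.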