Selection for translation efficiency on synonymous polymorphisms in recent human evolution.
ABSTRACT: Synonymous mutations are considered to be "silent" as they do not affect protein sequence. However, different silent codons have different translation efficiency (TE), which raises the question to what extent such mutations are really neutral. We perform the first genome-wide study of natural selection operating on TE in recent human evolution, surveying 13,798 synonymous single nucleotide polymorphisms (SNPs) in 1,198 unrelated individuals from 11 populations. We find evidence for both negative and positive selection on TE, as measured based on differentiation in allele frequencies between populations. Notably, the likelihood of an SNP to be targeted by positive or negative selection is correlated with the magnitude of its effect on the TE of the corresponding protein. Furthermore, negative selection acting against changes in TE is more marked in highly expressed genes, highly interacting proteins, complex members, and regulatory genes. It is also more common in functional regions and in the initial segments of highly expressed genes. Positive selection targeting sites with a large effect on TE is stronger in lowly interacting proteins and in regulatory genes. Similarly, essential genes are enriched for negative TE selection while underrepresented for positive TE selection. Taken together, these results point to the significant role of TE as a selective force operating in humans and hence underscore the importance of considering silent SNPs in interpreting associations with complex human diseases. Testifying to this potential, we describe two synonymous SNPs that may have clinical implications in phenylketonuria and in Best's macular dystrophy due to TE differences between alleles.
Project description:Detection of adaptive amino acid changes in proteins under recent short-term selection is of great interest for researchers studying microevolutionary processes in microbial pathogens or any other biological species. However, independent occurrence of such point mutations within genetically diverse haplotypes makes it difficult to detect the selection footprint by using traditional molecular evolutionary analyses. The recently developed Zonal Phylogeny (ZP) has been shown to be a useful analytic tool for identifying the footprints of short-term positive selection. ZP separates protein-encoding genes into evolutionarily long-term (with silent diversity) and short-term (without silent diversity) categories, or zones, followed by statistical analysis to detect signs of positive selection in the short-term zone. However, successful broad application of ZP for analysis of large haplotype datasets requires automation of the relatively labor-intensive computational process. Here we present Zonal Phylogeny Software (ZPS), an application that describes the distribution of single nucleotide polymorphisms (SNPs) of synonymous (silent) and non-synonymous (replacement) nature along branches of the DNA tree for any given protein-coding gene locus. Based on this information, ZPS separates the protein variant haplotypes with silent variability (Primary zone) from those that have recently evolved from the Primary zone variants by amino acid changes (External zone). Further comparative analysis of mutational hot-spot frequencies and haplotype diversity between the two zones allows determination of whether the External zone haplotypes emerged under positive selection. As a visualization tool, ZPS depicts the protein tree in a DNA tree, indicating the most parsimonious numbers of synonymous and non-synonymous changes along the branches of a maximum-likelihood based DNA tree, along with information on homoplasy, reversion and structural mutation hot-spots.
Through zonal differentiation, ZPS allows detection of recent adaptive evolution via selection of advantageous structural mutations, even when the advantage conferred by such mutations is relatively short-term (as in the case of "source-sink" evolutionary dynamics, which may represent a major mode of virulence evolution in microbes).
Project description:Recent results from Drosophila suggest that positive selection has a substantial impact on genomic patterns of polymorphism and divergence. However, species with smaller population sizes and/or stronger population structure may not be expected to exhibit Drosophila-like patterns of sequence variation. We test this prediction and identify determinants of levels of polymorphism and rates of protein evolution using genomic data from Arabidopsis thaliana and the recently sequenced Arabidopsis lyrata genome. We find that, in contrast to Drosophila, there is no negative relationship between nonsynonymous divergence and silent polymorphism at any spatial scale examined. Instead, synonymous divergence is a major predictor of silent polymorphism, which suggests variation in mutation rate as the main determinant of silent variation. Variation in rates of protein divergence is mainly correlated with gene expression level and breadth, consistent with results for a broad range of taxa, and map-based estimates of recombination rate are only weakly correlated with nonsynonymous divergence. Variation in mutation rates and the strength of purifying selection seem to be major drivers of patterns of polymorphism and divergence in Arabidopsis. Nevertheless, a model allowing for varying negative and positive selection by functional gene category explains the data better than a homogeneous model, implying the action of positive selection on a subset of genes. Genes involved in disease resistance and abiotic stress display high proportions of adaptive substitution. Our results are important for a general understanding of the determinants of rates of protein evolution and the impact of selection on patterns of polymorphism and divergence.
Project description:Patterns of codon usage and "silent" DNA divergence suggest that natural selection discriminates among synonymous codons in Drosophila. "Preferred" codons are consistently found in higher frequencies within their synonymous families in Drosophila melanogaster genes. This suggests a simple model of silent DNA evolution where natural selection favors mutations from unpreferred to preferred codons (preferred changes). Changes in the opposite direction, from preferred to unpreferred synonymous codons (unpreferred changes), are selected against. Here, selection on synonymous DNA mutations is investigated by comparing the evolutionary dynamics of these two categories of silent DNA changes. Sequences from outgroups are used to determine the direction of synonymous DNA changes within and between D. melanogaster and Drosophila simulans for five genes. Population genetics theory shows that differences in the fitness effect of mutations can be inferred from the comparison of ratios of polymorphism to divergence. Unpreferred changes show a significantly higher ratio of polymorphism to divergence than preferred changes in the D. simulans lineage, confirming the action of selection at silent sites. An excess of unpreferred fixations in 28 genes suggests a relaxation of selection on synonymous mutations in D. melanogaster. Estimates of selection coefficients for synonymous mutations (1.3 < magnitude of Nes < 3.6) in D. simulans are consistent with the reduced efficacy of natural selection (magnitude of Nes < 1) in the three- to sixfold smaller effective population size of D. melanogaster. Synonymous DNA changes appear to be a prevalent class of weakly selected mutations in Drosophila.
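The comparison of polymorphism-to-divergence ratios described above can be illustrated as a 2x2 contingency test. The following is a minimal sketch, not the authors' procedure: the counts are hypothetical, the function names are invented for illustration, and Fisher's exact test is assumed here as one common way to compare the two categories.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

def poly_div_ratio(polymorphic, fixed):
    """Ratio of polymorphic sites to fixed differences for one class of change."""
    return polymorphic / fixed

# Hypothetical counts (NOT taken from the study).
unpref_poly, unpref_fixed = 30, 10   # unpreferred changes
pref_poly, pref_fixed = 8, 20        # preferred changes

ratio_unpref = poly_div_ratio(unpref_poly, unpref_fixed)
ratio_pref = poly_div_ratio(pref_poly, pref_fixed)
p_value = fisher_exact_two_sided(unpref_poly, unpref_fixed, pref_poly, pref_fixed)

print(f"unpreferred ratio = {ratio_unpref:.2f}, preferred ratio = {ratio_pref:.2f}")
print(f"two-sided Fisher p = {p_value:.2e}")
```

With these made-up counts, unpreferred changes have the higher polymorphism-to-divergence ratio, which is the direction expected if such changes are weakly deleterious and tend to segregate without reaching fixation.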
Project description:BACKGROUND: Nonsynonymous mutations change protein sequences and are frequently subject to natural selection. The same goes for nonsense mutations, which introduce premature stop codons into coding sequences (CDSs). Synonymous mutations, however, are intuitively thought to be functionally silent and evolutionarily neutral. It is now known that optimized synonymous codon usage is advantageous for rapid mRNA translation. With the advent of next-generation sequencing (NGS), the explosion of NGS data generated from tumor tissues helps researchers identify driver mutations in cancer-related genes, but relatively little attention is paid to SNP data from healthy human populations when studying cancer. METHODS: Here, we analyzed publicly available human SNPs. We classified these SNPs according to their functional and evolutionary categories. By simply dividing the human genes into cancer-related genes and other genes, we compared the features of nonsynonymous, synonymous and nonsense mutations in these two gene sets from multiple aspects. RESULTS: We provide lines of evidence that the nonsynonymous, synonymous and nonsense mutations in cancer-related genes undergo stronger purifying selection than expected from the pattern in other genes. The lower nonsynonymous to synonymous ratio observed in cancer-related genes suggests the suppression of amino acid substitutions in these genes. The synonymous SNPs, after excluding those in splicing regions, exhibit preferred changes in codon usage and higher codon frequencies in cancer-related genes compared to other genes, indicating the constraint exerted on these mutations.
Nonsense mutations are less frequent and located closer to stop codons in cancer-related genes than in other genes, which putatively minimizes their deleterious effects. CONCLUSION: Our study demonstrates the evolutionary constraint on mutations in the CDSs of cancer-related genes without requiring data from cancer tissues or patients. Our work provides novel perspectives on interpreting the constraint on mutations in cancer-related genes. We reveal an extra constraint on synonymous mutations in cancer-related genes that is related to codon usage bias and is in addition to the splicing effect.
Project description:Evolution of protein sequences is largely governed by purifying selection, with a small fraction of proteins evolving under positive selection. The evolution at synonymous positions in protein-coding genes is not nearly as well understood, with the extent and types of selection remaining, largely, unclear. A statistical test to identify purifying and positive selection at synonymous sites in protein-coding genes was developed. The method compares the rate of evolution at synonymous sites (Ks) to that in intron sequences of the same gene after sampling the aligned intron sequences to mimic the statistical properties of coding sequences. We detected purifying selection at synonymous sites in approximately 28% of the 1,562 analyzed orthologous genes from mouse and rat, and positive selection in approximately 12% of the genes. Thus, the fraction of genes with readily detectable positive selection at synonymous sites is much greater than the fraction of genes with comparable positive selection at nonsynonymous sites, i.e., at the level of the protein sequence. Unlike other genes, the genes with positive selection at synonymous sites showed no correlation between Ks and the rate of evolution in nonsynonymous sites (Ka), indicating that evolution of synonymous sites under positive selection is decoupled from protein evolution. The genes with purifying selection at synonymous sites showed significant anticorrelation between Ks and expression level and breadth, indicating that highly expressed genes evolve slowly. The genes with positive selection at synonymous sites showed the opposite trend, i.e., highly expressed genes had, on average, higher Ks. For the genes with positive selection at synonymous sites, a significantly lower mRNA stability is predicted compared to the genes with negative selection. 
Thus, mRNA destabilization could be an important factor driving positive selection in synonymous sites, probably through regulation of expression at the level of mRNA degradation and, possibly, also translation rate. So, unexpectedly, we found that positive selection at synonymous sites of mammalian genes is substantially more common than positive selection at the level of protein sequences. Positive selection at synonymous sites might act through mRNA destabilization affecting mRNA levels and translation.
Project description:Sliding-window analysis has widely been used to uncover synonymous (silent, d(S)) and nonsynonymous (replacement, d(N)) rate variation along the protein sequence and to detect regions of a protein under selective constraint (indicated by d(N)<d(S)) or positive selection (indicated by d(N)>d(S)). The approach compares two or more protein-coding genes and plots the estimates of d(S) and d(N) from each sliding window along the sequence. Here we demonstrate that the approach produces artifactual trends of synonymous and nonsynonymous rate variation, with greater variation in the estimates of d(S) than in those of d(N). Such trends are generated even if the true d(S) and d(N) are constant along the whole protein and different codons are evolving independently. Many published tests of negative and positive selection using sliding windows that we have examined appear to be invalid because they fail to correct for multiple testing. Instead, likelihood ratio tests provide a more rigorous framework for detecting signals of natural selection affecting protein evolution. We demonstrate that a previous finding that a particular region of the BRCA1 gene experienced a synonymous rate reduction driven by purifying selection is likely an artifact of the sliding window analysis. We evaluate various sliding-window analyses in molecular evolution, population genetics, and comparative genomics, and argue that the approach is not generally valid if it is not known a priori that a trend exists and if no correction for multiple testing is applied.
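The artifact described above can be illustrated with a toy simulation. This is a rough sketch under stated assumptions, not the analysis from the study: each codon changes independently with the same constant probability, the helper names are hypothetical, and a Bonferroni threshold stands in as the simplest possible multiple-testing correction (the abstract itself recommends likelihood ratio tests).

```python
import random

def sliding_window_means(values, width, step=1):
    """Mean of `values` in each window of `width` positions, moved by `step`."""
    return [sum(values[i:i + width]) / width
            for i in range(0, len(values) - width + 1, step)]

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests

random.seed(1)       # fixed seed so the sketch is reproducible
true_rate = 0.1      # assumed constant per-codon probability of a change
n_codons = 300       # assumed gene length in codons

# 1 = codon differs between the two sequences, 0 = identical.
changes = [1 if random.random() < true_rate else 0 for _ in range(n_codons)]

window_means = sliding_window_means(changes, width=30)
print("windows:", len(window_means))
print("lowest and highest window estimate:", min(window_means), max(window_means))
print("Bonferroni per-window alpha:", bonferroni_threshold(0.05, len(window_means)))
```

Although the simulated rate is constant, the window estimates fluctuate along the sequence, so an apparent peak or trough picked out after looking at the plot is not evidence of rate variation by itself. Overlapping windows are also correlated, which makes a Bonferroni correction conservative; it is used here only to show how quickly the per-window threshold shrinks.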
Project description:BACKGROUND: Synonymous DNA substitution rates in the plant chloroplast genome are generally relatively slow and lineage dependent. Non-synonymous rates are usually even slower due to purifying selection acting on the genes. Positive selection is expected to speed up non-synonymous substitution rates, whereas synonymous rates are expected to be unaffected. Until recently, positive selection has seldom been observed in chloroplast genes, and large-scale structural rearrangements leading to gene duplications are hitherto supposed to be rare. METHODOLOGY/PRINCIPAL FINDINGS: We found high substitution rates in the exons of the plastid clpP1 gene in Oenothera (the Evening Primrose family) and three separate lineages in the tribe Sileneae (Caryophyllaceae, the Carnation family). Introns have been lost in some of the lineages, but where present, the intron sequences have substitution rates similar to those found in other introns of their genomes. The elevated substitution rates of clpP1 are associated with statistically significant whole-gene positive selection in three branches of the phylogeny. In two of the lineages we found multiple copies of the gene. Neighboring genes present in the duplicated fragments do not show signs of elevated substitution rates or positive selection. Although non-synonymous substitutions account for most of the increase in substitution rates, synonymous rates are also markedly elevated in some lineages. Whereas plant clpP1 genes experiencing negative (purifying) selection are characterized by having very conserved lengths, genes under positive selection often have large insertions of more or less repetitive amino acid sequence motifs. CONCLUSIONS/SIGNIFICANCE: We found positive selection of the clpP1 gene in various plant lineages to be correlated with repeated duplication of the clpP1 gene and surrounding regions, repetitive amino acid sequences, and an increase in synonymous substitution rates.
The present study sheds light on the controversial issue of whether negative or positive selection is to be expected after gene duplications by providing evidence for the latter alternative. The observed increase in synonymous substitution rates in some of the lineages indicates that the detection of positive selection may be obscured under such circumstances. Future studies are required to explore the functional significance of the large inserted repeated amino acid motifs, as well as the possibility that synonymous substitution rates may be affected by positive selection.
Project description:The joint inference of selection and past demography remains a costly and demanding task. We used next generation sequencing of two pools of 48 Norway spruce mother trees, one corresponding to the Fennoscandian domain, and the other to the Alpine domain, to assess nucleotide polymorphism at 88 nuclear genes. These genes are candidate genes for phenological traits, and most belong to the photoperiod pathway. Estimates of population genetic summary statistics from the pooled data are similar to previous estimates, suggesting that pooled sequencing is reliable. The nonsynonymous SNPs tended to have both lower frequency differences and lower FST values between the two domains than silent ones. These results suggest the presence of purifying selection. The divergence between the two domains based on synonymous changes was around 5 million yr, a time similar to a recent phylogenetic estimate of 6 million yr, but much larger than earlier estimates based on isozymes. Two approaches, one of them novel and considering both FST and the difference in allele frequencies between the two domains, were used to identify SNPs potentially under diversifying selection. SNPs from around 20 genes were detected, including genes previously identified as main targets of selection, such as PaPRR3 and PaGI.
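The two per-SNP quantities mentioned above, FST and the difference in allele frequencies between two domains, can be computed as in the following minimal sketch. It assumes a biallelic SNP, equal weighting of the two pools, and the basic heterozygosity-based definition FST = (HT - HS) / HT; the function names and frequencies are hypothetical, and this is not the estimator or the novel approach used in the study.

```python
def fst_two_populations(p1, p2):
    """FST for one biallelic SNP from allele frequencies in two equally weighted pools."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                          # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2     # mean within-pool heterozygosity
    if h_total == 0:
        return 0.0  # site is monomorphic in both pools; no differentiation to measure
    return (h_total - h_within) / h_total

def allele_frequency_difference(p1, p2):
    """Absolute difference in allele frequency between the two pools."""
    return abs(p1 - p2)

# Hypothetical allele frequencies for three SNPs (pool 1, pool 2).
snps = {"snp_a": (0.20, 0.80), "snp_b": (0.50, 0.55), "snp_c": (0.00, 1.00)}

for name, (p1, p2) in snps.items():
    print(name,
          round(fst_two_populations(p1, p2), 4),
          round(allele_frequency_difference(p1, p2), 4))
```

With equal weights this definition reduces to (p1 - p2)^2 / (4 * p_bar * (1 - p_bar)), so FST and the frequency difference are related but not interchangeable: the same frequency difference gives a larger FST when the pooled frequency is close to 0 or 1.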
Project description:Plasmodium vivax is the most prevalent human malaria parasite outside of Africa. Yet, studies aimed to identify genes with signatures consistent with natural selection are rare. Here, we present a comparative analysis of the pattern of genetic variation of five sequenced isolates of P. vivax and its divergence with two closely related species, Plasmodium cynomolgi and Plasmodium knowlesi, using a set of orthologous genes. In contrast to Plasmodium falciparum, the parasite that causes the most lethal form of human malaria, we did not find significant constraints on the evolution of synonymous sites genome wide in P. vivax. The comparative analysis of polymorphism and divergence across loci allowed us to identify 87 genes with patterns consistent with positive selection, including genes involved in the "exportome" of P. vivax, which are potentially involved in evasion of the host immune system. Nevertheless, we have found a pattern of polymorphism genome wide that is consistent with a significant amount of constraint on the replacement changes and prevalent negative selection. Our analyses also show that silent polymorphism tends to be larger toward the ends of the chromosomes, where many genes involved in antigenicity are located, suggesting that natural selection acts not only by shaping the patterns of variation within the genes but it also affects genome organization.
Project description:Selection promoting differential use of synonymous codons has been shown for several unicellular organisms and for Drosophila, but not for mammals. Selection coefficients operating on synonymous codons are likely to be extremely small, so that a very large effective population size is required for selection to overcome the effects of drift. In mammals, codon-usage bias is believed to be determined exclusively by mutation pressure, with differences between genes due to large-scale variation in base composition around the genome. The replication-dependent histone genes are expressed at extremely high levels during periods of DNA synthesis, and thus are among the most likely mammalian genes to be affected by selection on synonymous codon usage. We suggest that the extremely biased pattern of codon usage in the H3 genes is determined in part by selection. Silent site G + C content is much higher than expected based on flanking sequence G + C content, compared to other rodent genes with similar silent site base composition but lower levels of expression. Dinucleotide-mediated mutation bias does affect codon usage, but the effect is limited to the choice between G and C in some fourfold degenerate codons. Gene conversion between the two clusters of histone genes has not been an important force in the evolution of the H3 genes, but gene conversion appears to have had some effect within the cluster on chromosome 13.