Prevalent Accumulation of Non-Optimal Codons through Somatic Mutations in Human Cancers.
ABSTRACT: Cancer is characterized by uncontrolled cell growth, and the cause of different cancers is generally attributed to checkpoint dysregulation of cell proliferation and apoptosis. Recent studies have shown that non-optimal codons were preferentially adopted by genes to generate cell cycle-dependent oscillations in protein levels. This raises the intriguing question of how dynamic changes of codon usage modulate the cancer genome to cope with a non-controlled proliferative cell cycle. In this study, we comprehensively analyzed the somatic mutations of codons in human cancers, and found that non-optimal codons tended to be accumulated through both synonymous and non-synonymous mutations compared with other types of genomic substitution. We further demonstrated that non-optimal codons were prevalently accumulated across different types of cancers, amino acids, and chromosomes, and genes with accumulation of non-optimal codons tended to be involved in protein interaction/signaling networks and encoded important enzymes in metabolic networks that played roles in cancer-related pathways. This study provides insights into the dynamics of codons in the cancer genome and demonstrates that accumulation of non-optimal codons may be an adaptive strategy for cancerous cells to win the competition with normal cells. This deeper interpretation of the patterns and the functional characterization of somatic mutations of codons will help to broaden the current understanding of the molecular basis of cancers.
Project description:Large-scale genomic analyses of human cancers have cataloged somatic point mutations thought to initiate tumor development and sustain cancer growth. However, determining the functional significance of specific alterations remains a major bottleneck in our understanding of the genetic determinants of cancer. Here, we present a platform that integrates multiplexed AAV/Cas9-mediated homology-directed repair (HDR) with DNA barcoding and high-throughput sequencing to simultaneously investigate multiple genomic alterations in de novo cancers in mice. Using this approach, we introduce a barcoded library of non-synonymous mutations into hotspot codons 12 and 13 of Kras in adult somatic cells to initiate tumors in the lung, pancreas, and muscle. High-throughput sequencing of barcoded Kras HDR alleles from bulk lung and pancreas reveals surprising diversity in Kras variant oncogenicity. Rapid, cost-effective, and quantitative approaches to simultaneously investigate the function of precise genomic alterations in vivo will help uncover novel biological and clinically actionable insights into carcinogenesis.
Project description:BACKGROUND: Synonymous mutations have been identified to play important roles in cancer development, although they do not modify the protein sequences. However, relatively little research has specifically delineated the functionality of synonymous mutations in cancer. RESULTS: We investigated the nucleotide-based and amino acid-based features of synonymous mutations across 15 cancer types from The Cancer Genome Atlas (TCGA), and revealed novel driver candidates by identifying hotspot mutations. First, synonymous mutations were compared between TCGA and the 1000 Genomes Project at the nucleotide and amino acid levels. We found that C:G→T:A transitions were the most frequent single-base substitutions, and that leucine underwent the largest number of synonymous mutations in TCGA owing to prevalent C→T transitions, which converted codons between optimal and non-optimal forms. Next, 97 synonymous hotspot mutations in 86 genes were nominated as candidate drivers with potential cancer risk by considering the mutational rates across different sequence contexts. We observed that the non-CpG-island GC transition sequence context was positively selected across most cancer types, and that the different sequence contexts in which hotspot mutations occur could be significant for genetic differences and functional features. We also found that the hotspots were more conserved than neutral mutations of hotspot-mutation-containing genes and frequently occurred at leucine. In addition, we mapped hotspot, neutral and non-hotspot mutations of hotspot-mutation-containing genes to their respective protein domains and found that the ion transport domain was the most frequent one, which could mediate cell interaction and has relevant implications for tumor therapy. The signatures of synonymous hotspots were qualitatively similar to those of harmful missense variants.
CONCLUSIONS: We illustrated the preferences of cancer-associated synonymous mutations, especially hotspots, and laid the groundwork for understanding how synonymous mutations act as drivers in cancer.
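The two labels used in the abstract above (transition versus transversion, synonymous versus non-synonymous) can be illustrated with a minimal Python sketch. It assumes the standard genetic code; the function names `substitution_class` and `classify_codon_change` are hypothetical and are not taken from the study's pipeline.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PURINES = {"A", "G"}


def substitution_class(ref, alt):
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine) or 'transversion'."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def classify_codon_change(codon, position, alt):
    """Mutate one base (0-based position) of a codon and label the result."""
    mutant = codon[:position] + alt + codon[position + 1:]
    same_aa = CODON_TABLE[codon] == CODON_TABLE[mutant]
    effect = "synonymous" if same_aa else "non-synonymous"
    return mutant, effect, substitution_class(codon[position], alt)


# A C->T change at the first position of the leucine codon CTG gives TTG,
# which still encodes leucine.
print(classify_codon_change("CTG", 0, "T"))
```

Whether a given synonymous change moves a codon toward or away from an optimal form depends on a species-specific optimal-codon list, which this sketch deliberately leaves out.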
Project description:Cancer is widely recognized as a genetic disease in which somatic mutations are sequentially accumulated to drive tumor progression. Although genomic landscape studies are informative for individual cancer types, a comprehensive comparative study of tumorigenic mutations across cancer types based on integrative data sources is still a pressing need. We systematically analyzed ~10^6 non-synonymous mutations extracted from COSMIC, involving ~8000 genome-wide screened samples across 23 major human cancers at both the amino acid and gene levels. Our analysis identified cancer-specific heterogeneity that traditional nucleotide variation analysis alone usually overlooked. Particularly, the amino acid arginine (R) turns out to be the most favorable target of amino acid alteration in most cancer types studied (P < 10^-9, binomial test), reflecting its important role in cellular physiology. The tumor suppressor gene TP53 is mutated exclusively with the HYDIN, KRAS, and PTEN genes in large intestine, lung, and endometrial cancers respectively, indicating that TP53 takes part in different signaling pathways in different cancers. While some of our analyses corroborated previous observations, others indicated relevant candidates with high priority for further experimental validation. Our findings have many ramifications in understanding the etiology of cancer and the underlying molecular mechanisms in particular cancers.
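The binomial test mentioned above can be sketched as a one-sided exact tail probability. The abstract does not state how the null proportion was chosen, so the null share `p` and the counts below are purely illustrative assumptions, and the function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p).

    Direct summation is fine for small n; for counts on the order of 10^6
    a log-space or library implementation would be needed instead.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Toy numbers (assumed, not from the study): 30 of 100 altered residues are
# arginine, tested against a null share of 10%.
p_value = binomial_upper_tail(30, 100, 0.10)
print(f"one-sided P = {p_value:.3g}")
```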
Project description:Synonymous codon use is non-random. Codons most used in highly transcribed genes, often called optimal codons, typically have high gene counts of matching tRNA genes (tRNA abundance) and promote accurate and/or efficient translation. Non-optimal codons, those least used in highly expressed genes, may also affect translation. In multicellular organisms, codon optimality may vary among tissues. At present, however, tissue specificity of codon use remains poorly understood. Here, we studied codon usage of genes highly transcribed in germ line (testis and ovary) and somatic tissues (gonadectomized males and females) of the beetle Tribolium castaneum. The results demonstrate that: (i) the majority of optimal codons were organism-wide, the same in all tissues, and had numerous matching tRNA gene copies (Opt-codon↑tRNAs), consistent with translational selection; (ii) some optimal codons varied among tissues, suggesting tissue-specific tRNA populations; (iii) wobble tRNAs were required for translation of certain optimal codons (Opt-codonwobble), possibly allowing precise translation and/or protein folding; and (iv) remarkably, some non-optimal codons had abundant tRNA genes (Nonopt-codon↑tRNAs), and genes using those codons were tightly linked to ribosomal and stress-response functions. Thus, Nonopt-codon↑tRNAs codons may regulate translation of specific genes. Together, the evidence suggests that codon use and tRNA genes regulate multiple translational processes in T. castaneum.
Project description:Synonymous mutations are usually referred to as "silent", but increasing evidence shows that they are not neutral in a wide range of organisms. We looked into the relationship between synonymous codon usage bias and residue importance of voltage-gated ion channel proteins in mice, rats, and humans. We tested whether translationally optimal codons are associated with transmembrane or channel-forming regions, i.e., the sites that are particularly likely to be involved in the closing and opening of an ion channel. Our hypothesis is that translationally optimal codons are preferred at the sites within transmembrane domains or channel-forming regions in voltage-gated ion channel genes to avoid mistranslation-induced protein misfolding or loss-of-function. Using the Mantel-Haenszel procedure, which applies to categorical data, we found that translationally optimal codons are more likely to be used at transmembrane residues and the residues involved in channel-forming. We also found that the conservation level at synonymous sites in the transmembrane region is significantly higher than that in the non-transmembrane region. This study provides evidence that synonymous sites in voltage-gated ion channel genes are not neutral. Silent mutations at channel-related sites may lead to dysfunction of the ion channel.
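The Mantel-Haenszel procedure named above pools 2x2 tables across strata (for example, one table per gene or per amino acid). The following Python sketch computes the common odds ratio and the continuity-corrected test statistic; the counts are invented toy values and the function name `mantel_haenszel` is hypothetical, so this is an illustration of the technique rather than the study's own analysis.

```python
def mantel_haenszel(tables):
    """Pooled odds ratio and continuity-corrected chi-square (1 d.f.).

    Each table is (a, b, c, d):
        a = optimal codon, transmembrane site
        b = optimal codon, other site
        c = non-optimal codon, transmembrane site
        d = non-optimal codon, other site
    """
    num = den = diff = var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        # Observed minus expected count in cell a, and its hypergeometric variance.
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    odds_ratio = num / den
    chi2 = (abs(diff) - 0.5) ** 2 / var
    return odds_ratio, chi2


# Two toy strata (assumed counts, not data from the study).
toy_tables = [(10, 5, 5, 10), (20, 10, 10, 20)]
odds_ratio, chi2 = mantel_haenszel(toy_tables)
print(f"pooled OR = {odds_ratio:.2f}, chi-square = {chi2:.2f}")
```

A pooled odds ratio above 1 would indicate that optimal codons are over-represented at transmembrane sites after stratification, which is the direction of association the abstract reports.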
Project description:Carcinogenesis typically involves multiple somatic mutations in caretaker (DNA repair) and gatekeeper (tumor suppressors and oncogenes) genes. Analysis of mutation spectra of the tumor suppressor that is most commonly mutated in human cancers, p53, unexpectedly suggested that somatic evolution of the p53 gene during tumorigenesis is dominated by positive selection for gain of function. This conclusion is supported by accumulating experimental evidence of evolution of new functions of p53 in tumors. These findings prompted a genome-wide analysis of possible positive selection during tumor evolution. A comprehensive analysis of probable somatic mutations in the sequences of Expressed Sequence Tags (ESTs) from malignant tumors and normal tissues was performed in order to assess the prevalence of positive selection in cancer evolution. For each EST, the numbers of synonymous and non-synonymous substitutions were calculated. In order to identify genes with a signature of positive selection in cancers, these numbers were compared to: i) expected numbers and ii) the numbers for the respective genes in the ESTs from normal tissues. We identified 112 genes with a signature of positive selection in cancers, i.e., a significantly elevated ratio of non-synonymous to synonymous substitutions, in tumors as compared to 37 such genes in an approximately equal-sized EST collection from normal tissues. A substantial fraction of the tumor-specific positive-selection candidates have experimentally demonstrated or strongly predicted links to cancer. The results of EST analysis should be interpreted with extreme caution given the noise introduced by sequencing errors and undetected polymorphisms. Furthermore, an inherent limitation of EST analysis is that multiple mutations amenable to statistical analysis can be detected only in relatively highly expressed genes.
Nevertheless, the present results suggest that positive selection might affect a substantial number of genes during tumorigenic somatic evolution.
Project description:Large cancer genome sequencing initiatives have led to the identification of cancer driver genes based on signals of positive selection in somatic mutation data. Additionally, the identification of purifying (negative) selection has the potential to identify essential genes that may be of therapeutic interest. The most widely used way of quantifying selection pressures in protein-coding genes is the dN/dS metric, which compares non-synonymous to synonymous substitution rates. In this study, we examine whether and how this metric is influenced by the mutational processes that have been active during tumor evolution. We use exome sequencing data from six different cancer types from The Cancer Genome Atlas (TCGA) and demonstrate that dN/dS in its basic form, where uniform base substitution probabilities are assumed, is in fact strongly biased by these mutational processes. This is particularly true in malignant melanoma, where the mutational signature is characterized by a high amount of UV-induced cytosine to thymine mutations at dipyrimidine dinucleotides. This increases the likelihood of random synonymous mutations occurring in hydrophobic amino acid codons, leading to reduced dN/dS ratios in genes encoding membrane proteins and falsely suggesting purifying selection in these genes. When this effect is corrected for by taking mutational signature-derived substitution probabilities into account, purifying selection was found to be limited and similar in all cancer types studied. Our results demonstrate that it is crucial to take mutational signatures into account when applying the dN/dS metric to cancer somatic mutation data.
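The "basic form" of dN/dS described above, in which every base substitution is assumed equally likely, can be sketched in Python as follows. The sketch counts synonymous and non-synonymous sites by enumerating all single-base changes in a coding sequence under the standard genetic code; the function names are hypothetical and the observed mutation counts are toy values. The signature-aware correction discussed in the abstract would replace the uniform weights with context-dependent substitution probabilities, which is not attempted here.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_sites(seq):
    """Return (synonymous_sites, nonsynonymous_sites) under uniform substitution.

    Each of the 3 possible changes at a position contributes 1/3 of a site.
    Changes to or from a stop codon are counted as non-synonymous.
    """
    syn_changes = nonsyn_changes = 0
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        for pos in range(3):
            for alt in BASES:
                if alt == codon[pos]:
                    continue
                mutant = codon[:pos] + alt + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    syn_changes += 1
                else:
                    nonsyn_changes += 1
    return syn_changes / 3, nonsyn_changes / 3


def dn_ds(n_nonsyn, n_syn, seq):
    """Observed mutations per available site, non-synonymous over synonymous."""
    syn_sites, nonsyn_sites = count_sites(seq)
    return (n_nonsyn / nonsyn_sites) / (n_syn / syn_sites)


# Toy gene of ten leucine codons with 5 non-synonymous and 4 synonymous
# observed mutations (assumed counts): mutations fall in proportion to the
# available sites, so the ratio is close to 1.
print(dn_ds(5, 4, "CTG" * 10))
```

Because codons for hydrophobic residues such as leucine offer many synonymous sites, a mutational process that favours those sites inflates the synonymous count, which is the bias the abstract attributes to the uniform model.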
Project description:Somatic synonymous mutations are one of the most frequent genetic variants occurring in the coding region of cancer genomes, while their contributions to cancer development remain largely unknown. To assess whether synonymous mutations involved in post-transcriptional regulation contribute to the genetic etiology of cancers, we collected whole exome data from 8,320 patients across 22 cancer types. By employing our developed algorithm, PIVar, we identified a total of 22,948 posttranscriptionally impaired synonymous SNVs (pisSNVs) spanning 2,042 genes. In addition, 35 RNA binding proteins impacted by these identified pisSNVs were significantly enriched. Remarkably, we discovered a markedly elevated ratio of somatic pisSNVs across all 22 cancer types, and a high pisSNV ratio was associated with worse patient survival in five cancer types. Intriguingly, several well-established cancer genes, including PTEN, RB1 and PIK3CA, appeared to contribute to tumorigenesis at both protein function and posttranscriptional regulation levels, whereas some pisSNV-hosted genes, including UBR4, EP400 and INTS1, exerted their function during carcinogenesis mainly via posttranscriptional mechanisms. Moreover, we predicted three drugs associated with two pisSNVs, and numerous compounds associated with the expression signature of pisSNV-hosted genes. Our study reveals the prevalence and clinical relevance of pisSNVs in cancers, and emphasizes the importance of considering posttranscriptionally impaired synonymous mutations in cancer biology.
Project description:We performed high-throughput cDNA sequencing in colorectal adenocarcinoma and matching normal colorectal epithelium. All six hundred three genes in the UCSC database that were expressed in colon cancers and contained open reading frames of 1000 nucleotides or less were selected for study (total basepairs/bp, 366,686). 304,350 of these 366,686 bp (83.0%) were amplified and sequenced successfully. Seventy-eight sequence variants present in germline (i.e. normal) as well as matching somatic (i.e. tumor) DNA were discovered, yielding a frequency of 1 variant per 3,902 bp. Fifty-one of these sequence variants were homozygous (26 synonymous, 25 non-synonymous), while 27 were heterozygous (11 synonymous, 16 non-synonymous). Cancer tissue contained only one sequence-altered allele of the gene ATP50, which was present heterozygously alongside the wild-type allele in matching normal epithelium. Despite this relatively large number of bp and genes sequenced, no somatic mutations unique to tumor were found. High-throughput cDNA sequencing is a practical approach for detecting novel sequence variations and alterations in human tumors, such as those of the colon.
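The coverage and variant-frequency figures quoted above follow directly from the reported counts, as this short Python check shows. All numbers are taken from the abstract; only the variable names are introduced here.

```python
# Counts reported in the abstract.
targeted_bp = 366_686
sequenced_bp = 304_350
homozygous = 26 + 25      # synonymous + non-synonymous
heterozygous = 11 + 16    # synonymous + non-synonymous
variants = homozygous + heterozygous

coverage_pct = round(100 * sequenced_bp / targeted_bp, 1)
bp_per_variant = round(sequenced_bp / variants)

print(variants)        # total sequence variants
print(coverage_pct)    # percent of targeted bases sequenced
print(bp_per_variant)  # sequenced bases per variant
```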
Project description:Somatic mutations of mitochondrial DNA (mtDNA) are common in many human cancers. We have described an oligonucleotide microarray ("MitoChip") for rapid sequencing of the entire mitochondrial genome (Zhou et al, J Mol Diagn 2006), facilitating the analysis of mtDNA mutations in preneoplastic lesions. We examined 14 precancerous lesions, including seven Barrett esophagus biopsies, with or without associated dysplasia; four colorectal adenomas; and three inflammatory colitis-associated dysplasia specimens. In all cases, matched normal tissues from the corresponding site were obtained as germline control. MitoChip analysis was performed on DNA obtained from cryostat-embedded specimens. A total of 513,639 bases of mtDNA were sequenced in the 14 samples, with 490,224 bases (95.4%) assigned by the automated genotyping software. All preneoplastic lesions examined demonstrated at least one somatic mtDNA sequence alteration. Of the 100 somatic mtDNA alterations observed in the 14 cases, 27 were non-synonymous coding region mutations (i.e., resulting in an amino acid change), 36 were synonymous, and 37 involved non-coding mtDNA. Overall, somatic alterations most commonly involved the COI, ND4 and ND5 genes. Notably, somatic mtDNA alterations were observed in preneoplastic lesions of the gastrointestinal tract even in the absence of histopathologic evidence of dysplasia, suggesting that the mitochondrial genome is susceptible at the earliest stages of multistep cancer progression. Our findings further substantiate the rationale for exploring the mitochondrial genome as a biomarker for the early diagnosis of cancer, and confirm the utility of a high-throughput array-based platform for this purpose from a clinical applicability standpoint.