Signatures of domain shuffling in the human genome.
ABSTRACT: To elucidate the role of exon shuffling in shaping the complexity of the human genome/proteome, we have systematically analyzed intron phase distributions in the coding sequence of human protein domains. We found that introns at the boundaries of domains show high excess of symmetrical phase combinations (i.e., 0-0, 1-1, and 2-2), whereas nonboundary introns show no excess symmetry. This suggests that exon shuffling has primarily involved rearrangement of structural and functional domains as a whole. Furthermore, we found that domains flanked by phase 1 introns have dramatically expanded in the human genome due to domain shuffling and that 1-1 symmetrical domains and domain families are nonrandomly distributed with respect to their age. The predominance and extracellular location of 1-1 symmetrical domains among domains specific to metazoans suggests that they are associated with the rise of multicellularity. On the other hand, 0-0 symmetrical domains tend to be over-represented among ancient protein domains that are shared between the eukaryotic and prokaryotic kingdoms, which is compatible with the suggestion of primordial domain shuffling in the progenote. To see whether the human data reflect general genomic patterns of metazoans, similar analyses were done for the nematode Caenorhabditis elegans. Although the C. elegans data generally concur with the human patterns, we identified fewer intron-bounded domains in this organism, consistent with the lower complexity of C. elegans genes. [The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: Z. Gu and R. Stevens.]
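The phase bookkeeping behind "symmetrical phase combinations" can be illustrated with a minimal sketch. It assumes the usual convention that an intron's phase is the number of codon bases preceding it (0, 1, or 2), and that only the coding length of each exon is counted. The function names and the example exon lengths are hypothetical and are not taken from the study.

```python
from itertools import accumulate


def intron_phases(exon_cds_lengths):
    """Phase of each intron, given the coding length of every exon in order.

    One intron follows each exon except the last; its phase is the
    cumulative coding length up to that point, modulo 3.
    """
    return [total % 3 for total in accumulate(exon_cds_lengths[:-1])]


def flanking_classes(exon_cds_lengths):
    """(upstream phase, downstream phase) for every internal exon."""
    phases = intron_phases(exon_cds_lengths)
    return list(zip(phases, phases[1:]))


def symmetric_fraction(classes):
    """Share of phase combinations that are symmetrical (0-0, 1-1, 2-2)."""
    if not classes:
        return 0.0
    return sum(1 for up, down in classes if up == down) / len(classes)


if __name__ == "__main__":
    # Hypothetical gene: the middle exon is 150 bp (a multiple of 3),
    # so the introns on either side of it share the same phase.
    print(flanking_classes([100, 150, 62]))
```

A unit flanked by introns of the same phase has a coding length divisible by three, which is why it can be inserted, duplicated, or deleted without shifting the downstream reading frame.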
Project description:Among three sources of evolutionary innovation in gene function - point mutations, gene duplications, and gene shuffling (recombination between dissimilar genes) - gene shuffling is the most potent one. However, surprisingly little is known about its incidence on a genome-wide scale. We have studied shuffling in genes that are conserved between distantly related species. Specifically, we estimated the incidence of gene shuffling in ten organisms from the three domains of life: eukaryotes, eubacteria, and archaea, considering only genes showing significant sequence similarity in pairwise genome comparisons. We found that successful gene shuffling is very rare among such conserved genes. For example, we could detect only 48 successful gene-shuffling events in the genome of the fruit fly Drosophila melanogaster which have occurred since its common ancestor with the worm Caenorhabditis elegans more than half a billion years ago. The incidence of gene shuffling is roughly an order of magnitude smaller than the incidence of single-gene duplication in eukaryotes, but it can approach or even exceed the gene-duplication rate in prokaryotes. If true in general, this pattern suggests that gene shuffling may not be a major force in reshaping the core genomes of eukaryotes. Our results also cast doubt on the notion that introns facilitate gene shuffling, both because prokaryotes show an appreciable incidence of gene shuffling despite their lack of introns and because we find no statistical association between exon-intron boundaries and recombined domains in the two multicellular genomes we studied.
Project description:We present evidence that a well defined subset of intron positions shows a non-random distribution in ancient genes. We analyze a database of ancient conserved regions drawn from GenBank 101 to retest two predictions of the theory that the first genes were constructed by exon shuffling. These predictions are that there should be an excess of symmetric exons (and sets of exons) flanked by introns of the same phase (positions within the codon) and that intron positions in ancient proteins should correlate with the boundaries of compact protein modules. Both these predictions are supported by the data, with considerable statistical force (P values < 0.0001). Intron positions correlate to modules of diameters around 21, 27, and 33 Å, and this correlation is due to phase zero introns. We suggest that 30-40% of present day intron positions in ancient genes correspond to phase zero introns originally present in the progenote, while almost all of the remaining intron positions correspond to introns added, or moved, appearing equally in all three intron phases. This proposal provides a resolution for many of the arguments of the introns-early/introns-late debate.
Project description:BACKGROUND: Physical protein-protein interaction (PPI) is a critical phenomenon for the function of most proteins in living organisms, and a significant fraction of PPIs are the result of domain-domain interactions. Exon shuffling, the intron-mediated recombination of exons from existing genes, is known to have been a major mechanism of domain shuffling in metazoans. Thus, we hypothesized that exon shuffling could have a significant influence in shaping the topology of PPI networks. RESULTS: We tested our hypothesis by compiling exon shuffling and PPI data from six eukaryotic species: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Cryptococcus neoformans and Arabidopsis thaliana. For all four metazoan species, genes enriched in exon shuffling events presented on average higher vertex degree (number of interacting partners) in PPI networks. Furthermore, we verified that a set of protein domains that are simultaneously promiscuous (known to interact with multiple types of other domains), self-interacting (able to interact with another copy of themselves) and abundant in the genomes presents a stronger signal for exon shuffling. CONCLUSIONS: Exon shuffling appears to have been a recurrent mechanism for the emergence of new PPIs along metazoan evolution. In metazoan genomes, exon shuffling also promoted the expansion of some protein domains. We speculate that their promiscuous and self-interacting properties may have been decisive for that expansion.
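The vertex-degree comparison described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the PPI network is given as an undirected list of gene pairs, degree is the number of distinct interacting partners, and self-interactions are ignored. The gene labels, the edge list, and the function names are hypothetical and do not reproduce the authors' pipeline or data.

```python
from collections import defaultdict


def vertex_degrees(edges):
    """Number of distinct interacting partners per gene.

    `edges` is an iterable of (gene_a, gene_b) pairs from an undirected
    PPI network. Duplicate pairs are counted once; self-interactions
    are skipped (an assumption of this sketch).
    """
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        partners[a].add(b)
        partners[b].add(a)
    return {gene: len(found) for gene, found in partners.items()}


def mean_degree(genes, degrees):
    """Average degree over a gene set; genes absent from the network count as 0."""
    genes = list(genes)
    if not genes:
        return 0.0
    return sum(degrees.get(g, 0) for g in genes) / len(genes)


if __name__ == "__main__":
    # Hypothetical toy network and gene sets.
    edges = [("A", "B"), ("A", "C"), ("B", "A"), ("D", "D"), ("C", "E")]
    degrees = vertex_degrees(edges)
    shuffled = ["A", "C"]      # genes assumed enriched in exon shuffling
    background = ["B", "D", "E"]
    print(mean_degree(shuffled, degrees), mean_degree(background, degrees))
```

A real analysis would also need a statistical test on the two degree distributions; the comparison of means here only shows the quantity being compared.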
Project description:BACKGROUND: The origin of spliceosomal introns is the central subject of the introns-early versus introns-late debate. The distribution of intron phases is non-uniform, with an excess of phase-0 introns. Introns-early explains this by speculating that a fraction of present-day introns were present between minigenes in the progenote and therefore must lie in phase-0. In contrast, introns-late predicts that the nonuniformity of intron phase distribution reflects the nonrandomness of intron insertions. RESULTS: In this paper, we tested the two theories using analyses of intron phase distribution. We inferred the evolution of intron phase distribution from a dataset of 684 gene orthologs from seven eukaryotes using a maximum likelihood method. We also tested whether the observed intron phase distributions from 10 eukaryotes can be explained by intron insertions on a genome-wide scale. In contrast to the prediction of introns-early, the inferred evolution of intron phase distribution showed that the proportion of phase-0 introns increased over evolution. Consistent with introns-late, the observed intron phase distributions matched those predicted by an intron insertion model quite well. CONCLUSION: Our results strongly support the introns-late hypothesis of the origin of spliceosomal introns.
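One simple way to ask whether observed intron phase counts match a model's predicted proportions is a Pearson chi-square statistic. The sketch below is a generic goodness-of-fit calculation, not the maximum likelihood method or the insertion model used in the study. The counts and the predicted proportions are hypothetical placeholders.

```python
def phase_chi_square(observed_counts, predicted_proportions):
    """Pearson chi-square statistic for phase-0/1/2 intron counts.

    observed_counts: counts of phase-0, phase-1 and phase-2 introns.
    predicted_proportions: model proportions for the same three phases,
    which must be positive and sum to 1.
    """
    if len(observed_counts) != len(predicted_proportions):
        raise ValueError("counts and proportions must have the same length")
    if abs(sum(predicted_proportions) - 1.0) > 1e-9:
        raise ValueError("predicted proportions must sum to 1")
    total = sum(observed_counts)
    statistic = 0.0
    for observed, proportion in zip(observed_counts, predicted_proportions):
        expected = total * proportion
        statistic += (observed - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    observed = [60, 20, 20]
    predicted = [0.5, 0.3, 0.2]
    print(phase_chi_square(observed, predicted))
```

With three phase classes and fully specified proportions, the statistic would be compared against a chi-square distribution with two degrees of freedom; a small value indicates that the model's proportions describe the observed counts well.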
Project description:Choanoflagellates are the closest known relatives of metazoans. To discover potential molecular mechanisms underlying the evolution of metazoan multicellularity, we sequenced and analysed the genome of the unicellular choanoflagellate Monosiga brevicollis. The genome contains approximately 9,200 intron-rich genes, including a number that encode cell adhesion and signalling protein domains that are otherwise restricted to metazoans. Here we show that the physical linkages among protein domains often differ between M. brevicollis and metazoans, suggesting that abundant domain shuffling followed the separation of the choanoflagellate and metazoan lineages. The completion of the M. brevicollis genome allows us to reconstruct with increasing resolution the genomic changes that accompanied the origin of metazoans.
Project description:In the evolution of eukaryotic genes, introns are believed to have played a major role in increasing the probability of favorable duplication events, chance recombinations, and exon shuffling resulting in functional hybrid proteins. As a rule, prokaryotic genes lack introns, and the examples of prokaryotic introns described do not seem to have contributed to gene evolution by exon shuffling. Still, certain protein families in modern bacteria evolve rapidly by recombination of genes, duplication of functional domains, and as shown for protein PAB of the anaerobic bacterial species Peptostreptococcus magnus, by the shuffling of an albumin-binding protein module from group C and G streptococci. Characterization of a protein PAB-related gene in a P. magnus strain with less albumin-binding activity revealed that the shuffled module was missing. Based on this fact and observations made when comparing gene sequences of this family of bacterial surface proteins interacting with albumin and/or immunoglobulin, a model is presented that can explain how this rapid intronless evolution takes place. A new kind of genetic element is introduced: the recer sequence promoting interdomain, in frame recombination and acting as a structure-less flexibility-promoting spacer in the corresponding protein. The data presented also suggest that antibiotics could represent the selective pressure behind the shuffling of protein modules in P. magnus, a member of the indigenous bacterial flora.
Project description:BACKGROUND: Mitochondrial porins, or voltage-dependent anion-selective channels (VDAC) allow the passage of small molecules across the mitochondrial outer membrane, and are involved in complex interactions regulating organellar and cellular metabolism. Numerous organisms possess multiple porin isoforms, and initial studies indicated an intriguing evolutionary history for these proteins and the genes that encode them. RESULTS: In this work, the wealth of recent sequence information was used to perform a comprehensive analysis of the evolutionary history of mitochondrial porins. Fungal porin sequences were well represented, and newly-released sequences from stramenopiles, alveolates, and seed and flowering plants were analyzed. A combination of Neighbour-Joining and Bayesian methods was used to determine phylogenetic relationships among the proteins. The aligned sequences were also used to reassess the validity of previously described eukaryotic porin motifs and to search for signature sequences characteristic of VDACs from plants, animals and fungi. Secondary structure predictions were performed on the aligned VDAC primary sequences and were used to evaluate the sites of intron insertion in a representative set of the corresponding VDAC genes. CONCLUSION: Our phylogenetic analysis clearly shows that paralogs have appeared several times during the evolution of VDACs from the plants, metazoans, and even the fungi, suggesting that there are no "ancient" paralogs within the gene family. Sequence motifs characteristic of the members of the crown groups of organisms were identified. Secondary structure predictions suggest a common 16 beta-strand framework for the transmembrane arrangement of all porin isoforms. The GLK (and homologous or analogous motifs) and the eukaryotic porin motifs in the four representative Chordates tend to be in exons that appear to have changed little during the evolution of these metazoans. In fact there is phase correlation among the introns in these genes. Finally, our preliminary data support the notion that introns usually do not interrupt structural protein motifs, namely the predicted beta-strands. These observations concur with the concept of exon shuffling, wherein exons encode structural modules of proteins and the loss and gain of introns and the shuffling of exons via recombination events contribute to the complexity of modern day proteomes.
Project description:Ribosomal protein genes (RPGs) are a powerful tool for studying intron evolution. They exist in all three domains of life and are highly conserved. Accumulating genomic data suggest that RPG introns in many organisms abound with non-protein-coding RNAs (ncRNAs). These ancient ncRNAs are small nucleolar RNAs (snoRNAs) essential for ribosome assembly. They are also mobile genetic elements and therefore probably important in diversification and enrichment of transcriptomes through various mechanisms such as intron/exon gain/loss. snoRNAs in basal metazoans are poorly characterized. We examined a total of 449 RPG introns from four demosponges: Amphimedon queenslandica, Suberites domuncula, Suberites ficus and Suberites pagurorum and showed that RPG introns from A. queenslandica share position conservation and some structural similarity with "higher" metazoans. Moreover, our study indicates that mobile element insertions play an important role in the evolution of their size. In the four sponges, 51 snoRNAs were identified. The analysis showed discrepancies between the snoRNA pools of orthologous RPG introns between S. domuncula and A. queenslandica. Furthermore, these two sponges show as much conservation of RPG intron positions between each other as between themselves and human. Sponges from the Suberites genus show consistency in RPG intron position conservation. However, significant differences in some of the orthologous RPG introns of closely related sponges were observed. This indicates that RPG introns are dynamic even on these shorter evolutionary time scales.
Project description:The gene encoding the glycolytic enzyme triose-phosphate isomerase (TPI; EC 5.3.1.1) has been central to the long-standing controversy on the origin and evolutionary significance of spliceosomal introns by virtue of its pivotal support for the introns-early view, or exon theory of genes. Putative correlations between intron positions and TPI protein structure have led to the conjecture that the gene was assembled by exon shuffling, and five TPI intron positions are old by the criterion of being conserved between animals and plants. We have sequenced TPI genes from three diverse eukaryotes--the basidiomycete Coprinus cinereus, the nematode Caenorhabditis elegans, and the insect Heliothis virescens--and have found introns at seven novel positions that disrupt previously recognized gene/protein structure correlations. The set of 21 TPI introns now known is consistent with a random model of intron insertion. Twelve of the 21 TPI introns appear to be of recent origin since each is present in but a single examined species. These results, together with their implication that as more TPI genes are sequenced more intron positions will be found, render TPI untenable as a paradigm for the introns-early theory and, instead, support the introns-late view that spliceosomal introns have been inserted into preexisting genes during eukaryotic evolution.
Project description:BACKGROUND: The timing of the origin of introns is of crucial importance for an understanding of early genome architecture. The Exon theory of genes proposed a role for introns in the formation of multi-exon proteins by exon shuffling and predicts the presence of conserved splice sites in ancient genes. In this study, large-scale analysis of potential conserved splice sites was performed using an intron-exon database (ExInt) derived from GenBank. RESULTS: A set of conserved intron positions was found by matching identical splice site sequences from distantly-related eukaryotic kingdoms. Most amino acid sequences with conserved introns were homologous to consensus sequences of functional domains from conserved proteins including kinases, phosphatases, small GTPases, transporters and matrix proteins. These included ancient proteins that originated before the eukaryote-prokaryote split, for instance the catalytic domain of protein phosphatase 2A where a total of eleven conserved introns were found. Using an experimental setup in which the relation between a splice site and the ancientness of its surrounding sequence could be studied, it was found that the presence of an intron was positively correlated to the ancientness of its surrounding sequence. Intron phase conservation was linked to the conservation of the gene sequence and not to the splice site sequence itself. However, no apparent differences in phase distribution were found between introns in conserved versus non-conserved sequences. CONCLUSION: The data confirm an origin of introns deep in the eukaryotic branch and are in concordance with the presence of introns in the first functional protein modules in an 'Exon theory of genes' scenario. A model is proposed in which shuffling of primordial short exonic sequences led to the formation of the first functional protein modules, in line with hypotheses that see the formation of introns as integral to the origins of genome evolution.