Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants.
ABSTRACT: BACKGROUND: The sharp increase in plant genome and transcriptome data provides valuable resources to investigate the evolutionary consequences of gene duplication in a range of taxa, and to unravel common principles underlying duplicate gene retention. RESULTS: We survey 141 sequenced plant genomes to elucidate consequences of gene and genome duplication, processes central to the evolution of biodiversity. We develop a pipeline named DupGen_finder to identify different modes of gene duplication in plants. Genes derived from whole-genome, tandem, proximal, transposed, or dispersed duplication differ in abundance, selection pressure, expression divergence, and gene conversion rate among genomes. The number of WGD-derived duplicate genes decreases exponentially with increasing age of duplication events; transposed duplication- and dispersed duplication-derived genes declined in parallel. In contrast, the frequency of tandem and proximal duplications showed no significant decrease over time, providing a continuous supply of variants available for adaptation to continuously changing environments. Moreover, tandem and proximal duplicates experienced stronger selective pressure than genes formed by other modes and evolved toward biased functional roles involved in plant self-defense. The rate of gene conversion among WGD-derived gene pairs declined over time, peaking shortly after polyploidization. To provide a platform for accessing duplicated gene pairs in different plants, we constructed the Plant Duplicate Gene Database. CONCLUSIONS: We identify a comprehensive landscape of different modes of gene duplication across the plant kingdom by comparing 141 genomes, which provides a solid foundation for further investigation of the dynamic evolution of duplicate genes.
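The exponential decrease in WGD-derived duplicates with increasing age can be illustrated with a simple log-linear fit. The sketch below is not the authors' method: it assumes a model of the form N(t) = N0 * exp(-lambda * t), and the function name `fit_exponential_decay`, the age scale, and the input values are all hypothetical. The numbers are synthetic and are not results from the study.

```python
import math


def fit_exponential_decay(ages, counts):
    """Fit counts ~ n0 * exp(-lam * age) by least squares on log(counts).

    Returns (n0, lam). All counts must be positive.
    """
    if len(ages) != len(counts) or len(ages) < 2:
        raise ValueError("need at least two (age, count) pairs of equal length")
    logs = [math.log(c) for c in counts]
    n = len(ages)
    mean_t = sum(ages) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in ages)
    if sxx == 0:
        raise ValueError("ages must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ages, logs))
    slope = sxy / sxx                      # slope of log(count) versus age
    intercept = mean_y - slope * mean_t    # log of the count at age zero
    return math.exp(intercept), -slope


# Synthetic, illustrative input only (hypothetical ages and retained-pair counts).
ages = [0.2, 0.6, 1.0, 1.4, 1.8]
counts = [5000 * math.exp(-1.5 * t) for t in ages]

n0, lam = fit_exponential_decay(ages, counts)
print(f"n0 = {n0:.1f}, decay rate = {lam:.3f}")
```

A fitted decay rate close to zero would correspond to the pattern the abstract reports for tandem and proximal duplicates, which showed no significant decrease over time.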
Project description:BACKGROUND: Both single gene and whole genome duplications (WGD) have recurred in angiosperm evolution. However, the evolutionary effects of different modes of gene duplication, especially regarding their contributions to genetic novelty or redundancy, have been inadequately explored. RESULTS: In Arabidopsis thaliana and Oryza sativa (rice), species that deeply sample botanical diversity and for which expression data are available from a wide range of tissues and physiological conditions, we have compared expression divergence between genes duplicated by six different mechanisms (WGD, tandem, proximal, DNA based transposed, retrotransposed and dispersed), and between positional orthologs. Both neo-functionalization and genetic redundancy appear to contribute to retention of duplicate genes. Genes resulting from WGD and tandem duplications diverge slowest in both coding sequences and gene expression, and contribute most to genetic redundancy, while other duplication modes contribute more to evolutionary novelty. WGD duplicates may more frequently be retained due to dosage amplification, while inferred transposon mediated gene duplications tend to reduce gene expression levels. The extent of expression divergence between duplicates is discernibly related to duplication modes, different WGD events, amino acid divergence, and putatively neutral divergence (time), but the contribution of each factor is heterogeneous among duplication modes. Gene loss may retard inter-species expression divergence. Members of different gene families may have non-random patterns of origin that are similar in Arabidopsis and rice, suggesting the action of pan-taxon principles of molecular evolution. CONCLUSION: Gene duplication modes differ in contribution to genetic novelty and redundancy, but show some parallels in taxa separated by hundreds of millions of years of evolution.
Project description:Pear is an important fruit crop of the Rosaceae family and has experienced two rounds of ancient whole-genome duplications (WGDs). However, whether different types of gene duplications evolved differently after duplication remains unclear in the pear genome. In this study, we identified the different modes of gene duplication in pear. Duplicate genes derived from WGD, tandem, proximal, retrotransposed, DNA-based transposed or dispersed duplications differ in genomic distribution, gene features, selection pressure, expression divergence, regulatory divergence and biological roles. Widespread sequence, expression and regulatory divergence have occurred between duplicate genes over the 30-45 million years of evolution after the recent genome duplication in pear. The retrotransposed genes show relatively higher expression and regulatory divergence than other gene duplication modes. In contrast, WGD genes underwent a slower sequence divergence and may be influenced by abundant gene conversion events. Moreover, the different classes of duplicate genes exhibited biased functional roles. We also investigated the evolution and expansion patterns of the gene families involved in sugar and organic acid metabolism pathways, which are closely related to the fruit quality and taste in pear. Single-gene duplications largely account for the extensive expansion of gene families involved in the sorbitol metabolism pathway in pear. Gene family expansion was also detected in the sucrose metabolism pathway and tricarboxylic acid cycle pathways. Thus, this study provides insights into the evolutionary fates of duplicated genes.
Project description:BACKGROUND: Divergence in gene structure following gene duplication is not well understood. Gene duplication can occur via whole-genome duplication (WGD) and single-gene duplications including tandem, proximal and transposed duplications. Different modes of gene duplication may be associated with different types, levels, and patterns of structural divergence. RESULTS: In Arabidopsis thaliana, we denote levels of structural divergence between duplicated genes by differences in coding-region lengths and average exon lengths, and by the number of insertions/deletions (indels) and the maximum indel length in their protein sequence alignment. Among recent duplicates of different modes, transposed duplicates diverge most dramatically in gene structure. In transposed duplications, parental loci tend to have longer coding regions and exons, and smaller numbers of indels and maximum indel lengths than transposed loci, reflecting biased structural changes in transposed duplications. Structural divergence increases with evolutionary time for WGDs, but not for transposed duplications, possibly because of biased gene losses following transposed duplications. Structural divergence has heterogeneous relationships with nucleotide substitution rates, but is consistently positively correlated with gene expression divergence. The NBS-LRR gene family shows higher-than-average levels of structural divergence. CONCLUSIONS: Our study suggests that structural divergence between duplicated genes is greatly affected by the mechanisms of gene duplication and may not be proportional to evolutionary time, and that certain gene families are under selection for rapid evolution of gene structure.
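The structural divergence measures named above (differences in coding-region length and average exon length, number of indels, and maximum indel length in a protein alignment) can be sketched as follows. This is a minimal illustration and not the study's implementation. The function names, the use of "-" as the gap character, and the choice to count each contiguous gap run in one sequence as a single indel are assumptions, and the example sequences and exon lengths are made up.

```python
def indel_stats(aln_a, aln_b, gap="-"):
    """Return (number of indels, maximum indel length) for a pairwise alignment.

    An indel is taken to be a contiguous run of gap characters in one sequence;
    columns gapped in both sequences are ignored (an assumption of this sketch).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    count = 0
    longest = 0
    run = 0
    run_owner = None  # which sequence carries the current gap run
    for a, b in zip(aln_a, aln_b):
        if a == gap and b == gap:
            continue
        owner = "a" if a == gap else "b" if b == gap else None
        if owner is None:
            run, run_owner = 0, None      # aligned residues end any gap run
            continue
        if owner == run_owner:
            run += 1                      # extend the current indel
        else:
            count += 1                    # a new indel starts here
            run, run_owner = 1, owner
        longest = max(longest, run)
    return count, longest


def structural_divergence(exons_a, exons_b, aln_a, aln_b):
    """Summarise structural divergence between two duplicated genes.

    exons_a / exons_b are lists of coding exon lengths (bp) for each gene.
    """
    n_indels, max_indel = indel_stats(aln_a, aln_b)
    return {
        "cds_length_diff": abs(sum(exons_a) - sum(exons_b)),
        "avg_exon_length_diff": abs(
            sum(exons_a) / len(exons_a) - sum(exons_b) / len(exons_b)
        ),
        "n_indels": n_indels,
        "max_indel_length": max_indel,
    }


# Made-up example: hypothetical exon lengths and a toy protein alignment.
result = structural_divergence(
    exons_a=[120, 300, 180],
    exons_b=[120, 240],
    aln_a="MKT--LLV-A",
    aln_b="MKTAALL-GA",
)
print(result)
```

Treating adjacent gap runs in opposite sequences as separate indels is one possible convention; other conventions would change the counts.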
Project description:Different modes of gene duplication including whole-genome duplication (WGD), and tandem, proximal and dispersed duplications are widespread in angiosperm genomes. Small-scale, stochastic gene relocations and transposed gene duplications are widely accepted to be the primary mechanisms for the creation of dispersed duplicates. However, here we show that most surviving ancient dispersed duplicates in core eudicots originated from large-scale gene relocations within a narrow window of time following a genome triplication (γ) event that occurred in the stem lineage of core eudicots. We name these surviving ancient dispersed duplicates as relocated γ duplicates. In Arabidopsis thaliana, relocated γ, WGD and single-gene duplicates have distinct features with regard to gene functions, essentiality, and protein interactions. Relative to γ duplicates, relocated γ duplicates have higher non-synonymous substitution rates, but comparable levels of expression and regulation divergence. Thus, relocated γ duplicates should be distinguished from WGD and single-gene duplicates for evolutionary investigations. Our results suggest large-scale gene relocations following the γ event were associated with the diversification of core eudicots.
Project description:BACKGROUND: Genetic regulation is known to contribute to the divergent expression of duplicate genes; however, little is known about how epigenetic modifications regulate the expression of duplicate genes in plants. METHODS: The histone modification (HM) profile patterns of different modes of gene duplication, including whole-genome duplication, proximal duplication, tandem duplication and transposed duplication, were characterized based on ChIP-chip or ChIP-seq datasets. In this study, 10 distinct HM marks including H2Bub, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me2, H3K27me1, H3K27me3, H3K36me3 and H3K14ac were analyzed. Moreover, the features of gene duplication with different HM patterns were characterized based on 88 RNA-seq datasets of Arabidopsis thaliana. RESULTS: This study showed that duplicate genes in Arabidopsis have a more similar HM pattern than single-copy genes in both their promoters and protein-coding regions. The evolution of HM marks is found to be coupled with coding sequence divergence and expression divergence after gene duplication. We found that functional selective constraints may be imposed on epigenetic evolution after gene duplication. Furthermore, duplicate genes with distinct functions show more divergence in histone modification than those with the same function, while higher expression divergence is found with mutations of chromatin modifiers. This study shows the role of epigenetic marks in regulating gene expression and functional divergence after gene duplication in plants based on sequencing data.
Project description:We identified and characterized the pseudogene complements of five plant species: four dicots (Arabidopsis thaliana, Vitis vinifera, Populus trichocarpa and Phaseolus vulgaris) and one monocot (Oryza sativa). Retroposition was considered of modest importance for pseudogene formation in all investigated species except V. vinifera, which showed an unusually high number of retro-pseudogenes in non-coding genic regions. By using a pipeline for the classification of sequence duplicates in plant genomes, we compared the relative importance of whole genome, tandem, proximal, transposed and dispersed duplication modes in the pseudogene and functional gene complements. Pseudogenes showed a stronger tendency toward genomic dispersion than functional genes. Dispersed pseudogenes were prevalently fragmented and showed high sequence divergence at flanking regions. In contrast, those deriving from whole genome duplication were proportionally fewer than expected based on observations on functional loci and showed higher levels of flanking sequence conservation than dispersed pseudogenes. Pseudogenes deriving from tandem and proximal duplications were in excess compared to functional loci, probably reflecting the high evolutionary rate associated with these duplication modes in plant genomes. These data are compatible with high rates of sequence turnover at neutral sites and with duplication mechanisms mediated by double-strand break repair.
Project description:All extant seed plants are successful paleopolyploids, whose genomes carry duplicate genes that have survived repeated episodes of diploidization. However, the survival of gene duplicates is biased with respect to gene function and mechanism of duplication. Transcription factors, in particular, are reported to be preferentially retained following whole-genome duplications (WGDs), but disproportionately lost when duplicated by tandem events. An explanation for this pattern is provided by the Gene Balance Hypothesis (GBH), which posits that duplicates of highly connected genes are retained following WGDs to maintain optimal stoichiometry among gene products; but such connected gene duplicates are disfavored following tandem duplications. We used genomic data from 25 taxonomically diverse plant species to investigate the roles of duplication mechanism, gene function, and age of duplication in the retention of duplicate genes. Enrichment analyses were conducted to identify Gene Ontology (GO) functional categories that were overrepresented in either WGD or tandem duplications, or across ranges of divergence times. Tandem paralogs were much younger, on average, than WGD paralogs and the most frequently overrepresented GO categories were not shared between tandem and WGD paralogs. Transcription factors were overrepresented among ancient paralogs regardless of mechanism of origin or presence of a WGD. Also, in many cases, there was no bias toward transcription factor retention following recent WGDs. Both the fixation and the retention of duplicated genes in plant genomes are context-dependent events. The strong bias toward ancient transcription factor duplicates can be reconciled with the GBH if selection for optimal stoichiometry among gene products is strongest following the earliest polyploidization events and becomes increasingly relaxed as gene families expand.
Project description:BACKGROUND: F-box proteins are substrate-recognition components of the Skp1-Rbx1-Cul1-F-box protein (SCF) ubiquitin ligases. By selectively targeting key regulatory proteins or enzymes for ubiquitination and 26S proteasome-mediated degradation, F-box proteins play diverse roles in plant growth and development and in the responses of plants to both environmental and endogenous signals. Studies of F-box proteins from the model plant Arabidopsis and from many additional plant species have demonstrated that they belong to a gene superfamily and function across almost all aspects of the plant life cycle. However, systematic exploration of F-box family genes in the important fiber crop cotton (Gossypium hirsutum) has not been previously performed. The genome-wide analysis of the cotton F-box gene family is now possible thanks to the completion of several cotton genome sequencing projects. RESULTS: In the current study, we first conducted a genome-wide investigation of cotton F-box family genes by reference to the published F-box protein sequences from other plant species. We identified 592 F-box protein-encoding genes in the Gossypium hirsutum acc. TM-1 genome and subsequently present their gene structures, chromosomal locations, and syntenic relationships with their parent species. In addition, duplication mode analysis showed that cotton F-box genes were distributed across 26 chromosomes, with the maximum number of genes being detected on chromosome 5. Although the WGD (whole-genome duplication) mode seems to play a dominant role in the expansion of cotton F-box genes, other duplication modes including TD (tandem duplication), PD (proximal duplication), and TRD (transposed duplication) also contribute significantly to the evolutionary expansion of cotton F-box genes. Collectively, these bioinformatic analyses suggest possible evolutionary forces underlying F-box gene diversification. Additionally, we conducted analyses of gene ontology and in silico expression profiles, allowing identification of F-box gene members potentially involved in hormone signal transduction. CONCLUSION: The results of this study provide first insights into the Gossypium hirsutum F-box gene family, laying the foundation for future studies of functionality, particularly those involving F-box protein family members that play a role in hormone signal transduction.
Project description:Gene duplication (GD), thought to facilitate evolutionary innovation and adaptation, has been studied in many phylogenetic lineages. However, it remains poorly investigated in trematodes, a medically important parasite group that has become evolutionarily specialized during long-term host-parasite interaction. In this study, we conducted a genome-wide study of GD modes and contributions in Schistosoma mansoni, a pathogen causing human schistosomiasis. We combined several lines of evidence provided by duplicate age distributions, genomic sequence similarity, depth-of-coverage and gene synteny to identify the dominant drivers that contribute to the origins of new genes in this parasite. The gene divergences following duplication events (gene structure, expression and function retention) were also analyzed. Our results reveal that the genome has lacked whole genome duplication (WGD) over a long evolutionary time and has few large segmental duplications, but is extensively shaped by continuous small-scale gene duplications (SSGDs) (i.e., dispersed, tandem and proximal GDs) that may be derived from (retro-)transposition and unequal crossing over. Additionally, our study shows that the genes generated by tandem duplications have the smallest divergence during evolution. Finally, we demonstrate that SSGDs, especially the tandem duplications, greatly contribute to the expansions of some preferentially retained pathogenesis-associated gene families that are associated with the parasite's survival during infection. This study is the first to systematically summarize the landscape of GDs in trematodes and provides new insights into adaptations to parasitism linked to GD events for these parasites.
Project description:The MADS family is an ancient and well-studied transcription factor family that plays fundamental roles in almost every developmental process in plants. In plant evolutionary history, whole-genome duplication (WGD) events have been important not only for the evolution of plant species but also for the expansion of gene families. Soybean, a model legume crop, has experienced three rounds of WGD. Members of some MIKC(C) subfamilies, such as SOC, AGL6, SQUA, SVP, AGL17 and DEF/GLO, expanded after the three rounds of WGD in soybean. Some MIKC(C) subfamilies, as well as the MIKC* and type I MADS families, experienced faster birth-and-death evolution, and their traces from before the Glycine WGD event were not found. Transposed duplication played important roles in tandem arrangements among the members of different subfamilies. According to the expression profiles of type I and MIKC paralog gene pairs, the fate of MIKC paralog gene pairs was subfunctionalization, whereas the fate of type I MADS paralog gene pairs was nonfunctionalization. Of 163 MADS genes, 137 were close to 186 loci within 2 Mb genomic regions associated with seed-related QTLs, and 115 of these genes were expressed during seed development. Although MIKC(C) genes retained their important and conserved functions in flower development, most MIKC(C) genes, like the type I MADS genes, also showed potentially essential roles in seed development.