Identification of a genome-specific repetitive element in the Gossypium D genome.
ABSTRACT: The activity of genome-specific repetitive sequences is the main cause of genome variation between the Gossypium A and D genomes. Through comparative analysis of the two genomes, we retrieved a repetitive element termed the ICRd motif, which appears frequently in the diploid Gossypium raimondii (D5) genome but rarely in the diploid Gossypium arboreum (A2) genome. We further explored the existence of the ICRd motif in chromosomes of G. raimondii, G. arboreum, and two tetraploid (AADD) cotton species, Gossypium hirsutum and Gossypium barbadense, by fluorescence in situ hybridization (FISH), and observed that the ICRd motif exists in the D5 genome and D-subgenomes but not in the A2 genome and A-subgenomes. The ICRd motif comprises two components, a variable tandem repeat (TR) region and a conserved sequence (CS). Each of the two components has hundreds of repeats that are evenly distributed across the 13 chromosomes of the D5 genome. The ICRd motif (and its repeats) was revealed to be the common conserved region harbored by ancient long terminal repeat retrotransposons. Identification and investigation of the ICRd motif promotes the study of A and D genome differences, facilitates research on Gossypium genome evolution, and assists subgenome identification and genome assembly.
Project description:Cotton (Gossypium spp.) is commonly grouped into eight diploid genomic groups and an allotetraploid genomic group, AD. The mitochondrial genomes supply new information for understanding both the evolutionary process and the mechanism of cytoplasmic male sterility. Based on the previously released mitochondrial genomes of G. hirsutum (AD1), G. barbadense (AD2), G. raimondii (D5) and G. arboreum (A2), together with data from six other mitochondrial genomes, we set out to elucidate the evolution and diversity of mitochondrial genomes within Gossypium. Six Gossypium mitochondrial genomes, including three diploid species from the D genome group and three allotetraploid species from the AD genome group (G. thurberi D1, G. davidsonii D3-d and G. trilobum D8; G. tomentosum AD3, G. mustelinum AD4 and G. darwinii AD5), were assembled as single circular molecules with lengths of about 644 kb in the diploid species and 677 kb in the allotetraploid species. The mitochondrial genome structures of the D group species were identical to one another but differed from the mitogenome of G. arboreum (A2), as well as from the mitogenomes of the five species of the AD group. The mitogenomes of the A + AD and D group species mainly contained four and six large repeats, respectively. These variations in repeat sequences caused the major inversions and translocations within the mitochondrial genome. The mitochondrial genome complexity in Gossypium was reflected in eight unique segments in the D group species, three specific fragments in the A + AD group species and a large segment (more than 11 kb) in the diploid species. These insertions or deletions were most probably generated by crossovers between repetitive or homologous regions. In contrast to the highly variable genome structure, the evolutionary distance of mitochondrial genes was about one-sixth of that of chloroplast genes in Gossypium. RNA editing events were conserved in cotton mitochondrial genes. We confirmed two near-full-length integrations of the mitochondrial genome, into chromosome 1 of G. raimondii and chromosome A03 of G. hirsutum, respectively, with insertion times of less than 1.03 MYA. Ten Gossypium mitochondrial sequences provide insights into the evolution of cotton mitogenomes.
Project description:Understanding the composition, evolution, and function of the Gossypium hirsutum (cotton) genome is complicated by the joint presence of two genomes in its nucleus (AT and DT genomes). These two genomes were derived from progenitor A-genome and D-genome diploids involved in ancestral allopolyploidization. To better understand the allopolyploid genome, we re-sequenced the genomes of extant diploid relatives that contain the A1 (Gossypium herbaceum), A2 (Gossypium arboreum), or D5 (Gossypium raimondii) genomes. We conducted a comparative analysis using deep re-sequencing of multiple accessions of each diploid species and identified 24 million SNPs between the A-diploid and D-diploid genomes. These analyses facilitated the construction of a robust index of conserved SNPs between the A-genomes and D-genomes at all detected polymorphic loci. This index is widely applicable for read mapping efforts of other diploid and allopolyploid Gossypium accessions. Further analysis also revealed locations of putative duplications and deletions in the A-genome relative to the D-genome reference sequence. The approximately 25,400 deleted regions included more than 50% deletion of 978 genes, including many involved with starch synthesis. In the polyploid genome, we also detected 1,472 conversion events between homoeologous chromosomes, including events that overlapped 113 genes. Continued characterization of the Gossypium genomes will further enhance our ability to manipulate fiber and agronomic production of cotton.
Project description:Heat shock transcription factors (HSFs) are involved in environmental stress responses and plant development, such as the heat stress response and flowering. According to the structural characteristics of the HSF gene family, HSF genes in plants are classified into three major types (HSFA, HSFB, and HSFC). Using conserved domains of HSF genes, we identified 621 HSF genes among 13 cotton genomes, consisting of eight diploid and five tetraploid genomes. Phylogenetic analysis indicated that HSF genes among the 13 cotton genomes were grouped into two different clusters: one cluster contained all HSFA and HSFC genes, and the other cluster contained all HSFB genes. Comparative analysis of HSF genes in the Arabidopsis thaliana, Gossypium herbaceum (A1), Gossypium arboreum (A2), Gossypium raimondii (D5), and Gossypium hirsutum (AD1) genomes demonstrated that four HSF genes were inherited from a common ancestor, A0, of all existing cotton A genomes. The HSF gene family in the G. herbaceum (A1) genome showed a significant loss of members compared with those in the G. arboreum (A2) and G. hirsutum (AD1) A genomes. However, HSF genes in G. raimondii (D5) showed relative loss compared with those in the G. hirsutum (AD1) D genome. Analysis of tandem duplication (TD) events of HSF genes revealed that protein-coding genes among different cotton genomes have experienced TD events, but only the two-gene tandem array was detected in the Gossypium thurberi (D1) genome. The expression analysis of HSF genes in the G. hirsutum (AD1) and Gossypium barbadense (AD2) genomes indicated that the expressed HSF genes fell into two different groups in each genome, and that expressed orthologous HSF genes in the two genomes showed totally different expression patterns under the same abiotic stresses. This work provides novel insights into the evolutionary history and expression characteristics of HSF genes in different cotton genomes, and a widely applicable model for the study of HSF gene families in plants.
Project description:To explore types, levels and patterns of genetic divergence among diploid Gossypium (cotton) genomes, 780 cDNA, genomic DNA and simple sequence repeat (SSR) loci were re-sequenced in Gossypium herbaceum (A1 genome), G. arboreum (A2), G. raimondii (D5), G. trilobum (D8), G. sturtianum (C1) and an outgroup, Gossypioides kirkii. Divergence among these genomes ranged from 7.32 polymorphic base pairs per 100 between G. kirkii and G. herbaceum (A1) to only 1.44 between G. herbaceum (A1) and G. arboreum (A2). SSR loci are least conserved with 12.71 polymorphic base pairs and 3.77 polymorphic sites per 100 base pairs, whereas expressed sequence tags are most conserved with 3.96 polymorphic base pairs and 2.06 sites. SSR loci also exhibit the highest percentage of 'extended polymorphisms' (spanning multiple consecutive nucleotides). The A genome lineage was particularly rapidly evolving, with the D genome also showing accelerated evolution relative to the C genome. Unexpected asymmetry in mutation rates was found, with much more transition than transversion mutation in the D genome after its divergence from a common ancestor shared with the A genome. This large quantity of orthologous DNA sequence strongly supports a phylogeny in which A-C divergence is more recent than A-D divergence, a subject that is of much importance in view of A-D polyploid formation being key to the evolution of the most productive and finest-quality cottons. Loci that are monomorphic within A or D genome types, but polymorphic between genome types, may be of practical importance for identifying locus-specific DNA markers in tetraploid cottons including leading cultivars.
Project description:Nucleotide binding site (NBS) genes encode a large family of disease resistance (R) proteins in plants. The availability of genomic data for the two diploid cotton species, Gossypium arboreum and Gossypium raimondii, and the two allotetraploid cotton species, Gossypium hirsutum (TM-1) and Gossypium barbadense, allows for a more comprehensive and systematic comparative study of NBS-encoding genes to elucidate the mechanisms of cotton disease resistance. Based on the genome assembly data, 246, 365, 588 and 682 NBS-encoding genes were identified in G. arboreum, G. raimondii, G. hirsutum and G. barbadense, respectively. The distribution of NBS-encoding genes among the chromosomes was nonrandom and uneven, and the genes tended to form clusters. Gene structure analysis showed that G. arboreum and G. hirsutum possessed a greater proportion of CN, CNL, and N genes and a lower proportion of NL, TN and TNL genes than G. raimondii and G. barbadense, while the percentages of RN and RNL genes remained relatively unchanged. The difference in percentage was largest for TNL genes, at about 7-fold. Exon statistics showed that the average number of exons per NBS gene was greater in G. raimondii and G. barbadense than in G. arboreum and G. hirsutum. Phylogenetic analysis revealed that the TIR-NBS genes of G. barbadense were closely related to those of G. raimondii. Sequence similarity analysis showed that diploid G. arboreum possessed a larger proportion of NBS-encoding genes similar to those of allotetraploid G. hirsutum, while diploid G. raimondii possessed a larger proportion of NBS-encoding genes similar to those of allotetraploid G. barbadense. The synteny analysis showed that more NBS genes in G. raimondii and G. arboreum were syntenic with those in G. barbadense and G. hirsutum, respectively. The structural architectures, amino acid sequence similarities and synteny of NBS-encoding genes between G. arboreum and G. hirsutum, and between G. raimondii and G. barbadense, were the highest among comparisons between the diploid and allotetraploid genomes, indicating that G. hirsutum inherited more NBS-encoding genes from G. arboreum, while G. barbadense inherited more NBS-encoding genes from G. raimondii. This asymmetric evolution of NBS-encoding genes may help to explain why G. raimondii and G. barbadense are more resistant to Verticillium wilt, whereas G. arboreum and G. hirsutum are more susceptible. The disease resistance of the allotetraploid cottons was related to their NBS-encoding genes, especially with regard to the diploid progenitor from which they were derived, and the TNL genes may have a significant role in resistance to Verticillium wilt in G. raimondii and G. barbadense.
Project description:Transposable element (TE) amplification has been recognized as a driving force mediating genome size expansion and evolution, but its consequences for shaping 3D genomic architecture remain largely unknown in plants. Here, we report reference-grade genome assemblies for three species of cotton ranging three-fold in genome size, namely Gossypium rotundifolium (K2), G. arboreum (A2), and G. raimondii (D5), using Oxford Nanopore Technologies. Comparative genome analyses document the details of lineage-specific TE amplification contributing to the large genome size differences (K2, 2.44 Gb; A2, 1.62 Gb; D5, 750.19 Mb), and indicate relatively conserved gene content and synteny relationships among genomes. We found that approximately 17% of syntenic genes exhibit chromatin status change between active ("A") and inactive ("B") compartments, and TE amplification was associated with an increase in the proportion of A compartment in gene regions (∼ 7,000 genes) in K2 and A2 relative to D5. Only 42% of topologically associating domain (TAD) boundaries were conserved among the three genomes. Our data implicate recent amplification of TEs following formation of lineage-specific TAD boundaries. This study sheds light on the role of transposon-mediated genome expansion in the evolution of higher-order chromatin structure in plants.
Project description:Plant non-specific lipid transfer proteins (nsLTPs) are involved in many biological processes. In this study, 51, 47 and 91 nsLTPs were identified in Gossypium arboreum, G. raimondii and their descendant allotetraploid G. hirsutum, respectively. All the nsLTPs were phylogenetically divided into 8 distinct subfamilies. Besides, the recent duplication, which is considered a cotton-specific whole genome duplication, may have led to nsLTP expansion in Gossypium. Both tandem and segmental duplication contributed to nsLTP expansion in G. arboreum and G. hirsutum, while tandem duplication was the dominant pattern in G. raimondii. Additionally, the interspecific orthologous gene pairs in Gossypium were identified. Some GaLTPs and GrLTPs lost their orthologs in the At and Dt subgenomes, respectively, of G. hirsutum. The distribution of these GrLTPs and GaLTPs within each subfamily was complementary, suggesting that the loss and retention of nsLTPs in G. hirsutum might not be random. Moreover, the nsLTPs in the At and Dt subgenomes might have evolved symmetrically. Furthermore, both intraspecific and interspecific orthologous genes showed considerable expression variation, suggesting that their functions were strongly differentiated. Our results lay an important foundation for expansion and evolutionary analysis of the nsLTP family in Gossypium, and advance nsLTP studies in other plants, especially polyploid plants.
Project description:Verticillium dahliae, a destructive soil-borne fungal pathogen, causes massive losses in cotton yields. However, the mechanism of resistance to V. dahliae in cotton is still poorly understood. Accumulating evidence indicates that chitinases are crucial hydrolytic enzymes, which attack fungal pathogens by catalyzing the degradation of the fungal cell wall. Although they form a large gene family, the chitinase genes (Chis) have not yet been systematically analyzed or effectively utilized in cotton. Here, we identified 47, 49, 92, and 116 Chis from four sequenced cotton species, diploid Gossypium raimondii (D5), G. arboreum (A2), tetraploid G. hirsutum acc. TM-1 (AD1), and G. barbadense acc. 3-79 (AD2), respectively. The orthologous genes did not show one-to-one correspondence between the diploid and tetraploid cotton species, implying that the number of Chis changed in different cotton species during the evolution of Gossypium. Phylogenetic classification indicated that these Chis could be classified into six groups with distinguishable structural characteristics. Expression profiling showed that Chis are expressed differentially across organs and tissues and in response to V. dahliae. Silencing of Chi23, Chi32, or Chi47 in cotton significantly impaired resistance to V. dahliae, suggesting that these genes might act as positive regulators of disease resistance to V. dahliae.
Project description:BACKGROUND: Intergenomic gene transfer (IGT) between nuclear and organellar genomes is a common phenomenon during plant evolution. Gossypium is a useful model to evaluate the genomic consequences of IGT for both diploid and polyploid species. Here, we explore IGT among nuclear, mitochondrial, and plastid genomes of four cotton species, including two allopolyploids and their model diploid progenitors (genome donors, G. arboreum: A2 and G. raimondii: D5). RESULTS: Extensive IGT events exist for both diploid and allotetraploid cotton (Gossypium) species, with the nuclear genome being the predominant recipient of transferred DNA, followed by the mitochondrial genome. In total length, the nuclear genome has integrated 100 times more foreign sequence than the mitochondrial genome. In the nucleus, the integrated length of chloroplast DNA (cpDNA) was from 1.87 times (in diploids) to nearly four times (in allopolyploids) greater than that of mitochondrial DNA (mtDNA). In the mitochondrion, the length of nuclear DNA (nuDNA) was typically three times that of cpDNA. Gossypium mitochondrial genomes integrated three nuclear retrotransposons and eight chloroplast tRNA genes, and incorporated chloroplast DNA prior to divergence between the diploids and allopolyploid formation. For mitochondrial chloroplast-tRNA genes, there were 2-6 bp conserved microhomologies flanking their insertion sites across distantly related genera, which increased to 10 bp microhomologies for the four cotton species studied. For organellar DNA sequences, there are source hotspots, e.g., the atp6-trnW intergenic region in the mitochondrion and the inverted repeat region in the chloroplast. Organellar DNAs in the nucleus were rarely expressed, and only at low levels. 
Surprisingly, there was asymmetry in the survivorship of ancestral insertions following allopolyploidy, with most numts (nuclear mitochondrial insertions) decaying or being lost whereas most nupts (nuclear plastidial insertions) were retained. CONCLUSIONS:This study characterized and compared intracellular transfer among nuclear and organellar genomes within two cultivated allopolyploids and their ancestral diploid cotton species. A striking asymmetry in the fate of IGTs in allopolyploid cotton was discovered, with numts being preferentially lost relative to nupts. Our results connect intergenomic gene transfer with allotetraploidy and provide new insight into intracellular genome evolution.
Project description:Cotton (Gossypium spp.) is the most important natural fiber crop in the world. The R2R3-MYB gene family is a large gene family involved in many plant functions including cotton fiber development. Although previous studies have reported its phylogenetic relationships, gene structures, and expression patterns in tetraploid G. hirsutum and diploid G. raimondii, little is known about the sequence variation of its members between G. hirsutum and G. barbadense and their involvement in the natural quantitative variation in fiber quality and yield. In this study, a comprehensive genome-wide comparative analysis was performed among four Gossypium species using whole genome sequences, i.e., tetraploid G. hirsutum (AD1) and G. barbadense (AD2) as well as their likely extant diploid progenitors G. raimondii (D5) and G. arboreum (A2), leading to the identification of 406, 393, 216, and 213 R2R3-MYB genes, respectively. To elucidate whether the R2R3-MYB genes are genetically associated with fiber quality traits, 86 R2R3-MYB genes were co-localized with quantitative trait loci (QTL) hotspots for fiber quality and yield, including 42 genes localized within the fiber length QTL hotspots, in interspecific G. hirsutum × G. barbadense populations. There were 20 interspecific nonsynonymous single-nucleotide polymorphism (SNP) sites between the two tetraploid cultivated species, of which 16, developed from 11 R2R3-MYB genes, were significantly correlated with fiber quality and yield in a backcross inbred line (BIL) population of G. hirsutum × G. barbadense in at least one of the four field tests. Taken together, these results indicate that the sequence variation in these 11 R2R3-MYB genes is associated with the natural variation (i.e., QTL) in fiber quality and yield. Moreover, the functional SNPs of five R2R3-MYB allele pairs from the AD1 and AD2 genomes were significantly correlated with the gene expression related to fiber quality in fiber development. 
The results will be useful in further elucidating the role of the R2R3-MYB genes during fiber development.