Genome-wide recombination rate variation in a recombination map of cotton.
ABSTRACT: Recombination is crucial for genetic evolution: it not only provides new allele combinations but also influences biological evolution and the efficacy of natural selection. However, recombination variation is not well understood outside of complex species' genomes, and it is particularly unclear in Gossypium. Cotton is the most important natural fibre crop and the second largest oilseed crop. Here, we found that genetic and physical map distances did not have a simple linear relationship. Recombination rates were unevenly distributed throughout the cotton genome, changed markedly along the chromosomes, and were completely suppressed in centromeric regions. Recombination rates varied significantly between the A-subgenome (At) (range = 1.60 to 3.26 centimorgans/megabase [cM/Mb]) and the D-subgenome (Dt) (range = 2.17 to 4.97 cM/Mb), which explains why the genetic maps of At and Dt are similar in length even though the physical map of Dt is only half that of At. The translocation regions between A02 and A03 and between A04 and A05, and the inversion regions on A10, D10, A07 and D07, showed relatively high recombination rates in the distal regions of the chromosomes. Recombination rates were positively correlated with gene density, marker density and distance from the centromere, and negatively correlated with transposable element (TE) density. Gene ontology (GO) analysis showed that genes in high-recombination regions may tend to respond to environmental stimuli, whereas genes in low-recombination regions are related to mitosis and meiosis. This suggests that high-recombination regions may provide the primary driving force in adaptive evolution, while low-recombination regions help ensure the stability of the basic cell cycle in a rapidly changing environment. Global knowledge of recombination rates will facilitate genetics and breeding in cotton.
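The rate figures above are in cM/Mb, i.e. genetic distance divided by physical distance. The following is a minimal sketch of how such rates, and their correlation with a genomic feature such as gene density, might be computed, assuming one chromosome's markers are supplied as hypothetical (physical position in Mb, genetic position in cM) pairs. The abstract does not say which correlation statistic was used; Spearman's rank correlation is one reasonable choice and is what the sketch implements. All data in the example are invented.

```python
import math


def interval_rates(markers):
    """Recombination rate (cM/Mb) for each interval between adjacent markers.

    markers: (physical_mb, genetic_cm) pairs for ONE chromosome (hypothetical input).
    """
    ordered = sorted(markers)  # sort by physical position
    rates = []
    for (p1, g1), (p2, g2) in zip(ordered, ordered[1:]):
        if p2 == p1:
            continue  # co-located markers carry no rate information
        # abs() tolerates local order disagreements between the genetic and physical maps
        rates.append(abs(g2 - g1) / (p2 - p1))
    return rates


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Toy chromosome: frequent crossovers at both ends, almost none around a central "centromere".
markers = [(0.0, 0.0), (2.0, 8.0), (10.0, 9.0), (20.0, 10.0), (22.0, 18.0)]
rates = interval_rates(markers)        # [4.0, 0.125, 0.1, 4.0]
genes_per_interval = [30, 5, 2, 25]    # invented gene counts, one per interval
print(rates, round(spearman(rates, genes_per_interval), 3))
```

Real analyses usually smooth rates over fixed windows or fit a curve to the Marey map (genetic versus physical position) rather than using raw adjacent intervals, which are noisy when markers are closely spaced.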
Project description: Genetic linkage maps play fundamental roles in understanding genome structure, explaining genome formation events during evolution, and discovering the genetic bases of important traits. A high-density cotton (Gossypium spp.) genetic map was developed by using representative sets of simple sequence repeat (SSR) markers and the first public set of single nucleotide polymorphism (SNP) markers to genotype 186 recombinant inbred lines (RILs) derived from an interspecific cross between Gossypium hirsutum L. (TM-1) and G. barbadense L. (3-79). The genetic map comprised 2072 loci (1825 SSRs and 247 SNPs) and covered 3380 centimorgans (cM) of the cotton genome (AD), with an average marker interval of 1.63 cM. The allotetraploid cotton genome produced equivalent recombination frequencies in its two subgenomes (At and Dt). Of the 2072 loci, 1138 (54.9%) were mapped to the 13 At-subgenome chromosomes, covering 1726.8 cM (51.1%), and 934 (45.1%) were mapped to the 13 Dt-subgenome chromosomes, covering 1653.1 cM (48.9%). The genetically smallest homeologous chromosome pair was Chr. 04 (A04) and 22 (D04), and the largest was Chr. 05 (A05) and 19 (D05). Duplicate loci between and within homeologous chromosomes were identified, which will facilitate investigations of chromosome translocations. The map augments evidence of a reciprocal rearrangement between ancestral forms of Chr. 02 and 03 versus their segmental homeologs 14 and 17, as centromeric regions show homeology between Chr. 02 (A02) and 17 (D02), as well as between Chr. 03 (A03) and 14 (D03). This research represents an important foundation for studies on polyploid cottons, including germplasm characterization, gene discovery, and genome sequence assembly.
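The summary statistics in this description can be re-derived from the reported counts. The sketch below checks them. The 1.63 cM figure is reproduced by dividing total map length by the number of loci (dividing by the number of intervals, i.e. loci minus the 26 linkage groups, would give about 1.65 cM instead), so that appears to be the convention used; the helper name `share` is illustrative.

```python
def share(part, whole):
    """Percentage of `whole` contributed by `part`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total_cm, total_loci = 3380.0, 2072
subgenomes = {"At": (1138, 1726.8), "Dt": (934, 1653.1)}  # (loci, cM) as reported

# Map length divided by locus count reproduces the reported 1.63 cM interval.
avg_interval = round(total_cm / total_loci, 2)

for name, (loci, cm) in subgenomes.items():
    print(name, share(loci, total_loci), "% of loci,", share(cm, total_cm), "% of map length")
print("average interval:", avg_interval, "cM")
```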
Project description: Cotton (Gossypium spp.) is an important crop plant that is widely grown to produce both natural textile fibers and cottonseed oil. Cotton fibers, the economically more important product of the cotton plant, are seed trichomes derived from individual cells of the epidermal layer of the seed coat. It has long been known that large numbers of genes determine the development of cotton fiber, and more recently it has been determined that these genes are distributed across the At and Dt subgenomes of tetraploid AD cottons. In the present study, the organization and evolution of fiber development genes were investigated by constructing an integrated genetic and physical map of fiber development genes whose functions have been verified. A total of 535 cotton fiber development genes, including 103 fiber transcription factors, 259 fiber development genes, and 173 SSR-containing fiber ESTs, were analyzed at the subgenome level. A total of 499 fiber-related contigs were selected and assembled. Together these contigs covered about 151 Mb in physical length, or about 6.7% of the tetraploid cotton genome. Of the 499 contigs, 397 were anchored onto individual chromosomes. Our analysis of the distribution patterns of fiber development genes and transcription factors between the At and Dt subgenomes showed that more transcription factors were from the Dt subgenome than from At, whereas more fiber development genes were from the At subgenome than from Dt. Combining our mapping results with previous reports that more fiber QTLs were mapped in the Dt subgenome than in the At subgenome, the results suggest a new functional hypothesis for tetraploid cotton. After the merging of the two diploid Gossypium genomes, the At subgenome has provided most of the genes for fiber development, because it continues to function similarly to its fiber-producing diploid A genome ancestor. 
On the other hand, the Dt subgenome, with its non-fiber-producing D genome ancestor, provides more transcription factors that regulate the expression of the fiber genes in the At subgenome. This hypothesis would explain previously published mapping results. At the same time, this integrated map of fiber development genes provides a framework to clone individual full-length fiber genes, to elucidate the physiological mechanisms of fiber differentiation, elongation, and maturation, and to systematically study the functional network of these genes as they interact during fiber development in the tetraploid cottons.
Project description: Flowering time is an important ecological trait that determines the transition from vegetative to reproductive growth. Flowering in cotton is controlled by short-day photoperiods, with strict photoperiod sensitivity. Because the CO-FT (CONSTANS-FLOWERING LOCUS T) module regulates photoperiodic flowering in several plants, we selected eight group I CONSTANS-LIKE (COL) genes and examined their expression patterns under long-day and short-day conditions. We then individually cloned and sequenced their homologs from 25 different cotton accessions and one outgroup. Finally, we studied their structures, phylogenetic relationships, and molecular evolution in both the coding regions and three characteristic domains. All eight group I COLs show diurnal expression. At orthologous and homeologous loci, each gene's structure is highly conserved across cotton species, although length variation has arisen from insertions/deletions in intron and/or exon regions. Six genes, COL2 to COL5, COL7 and COL8, exhibit higher nucleotide diversity in the D-subgenome than in the A-subgenome. In all allotetraploid cotton species examined, 98.37% of Ks values were higher in the A-D and At-Dt comparisons than in the A-At and D-Dt comparisons, and Ks values for A vs. D and At vs. Dt were also strongly positively correlated, with Pearson's correlation coefficients (r) of at least 0.797. Nucleotide polymorphism is significantly higher in wild species than in G. hirsutum and G. barbadense, indicating a genetic bottleneck associated with the domesticated cotton species. The three characteristic domains of the eight COLs exhibit different evolutionary rates: the CCT domain is highly conserved, whereas the B-box and Var domains are much more variable in allotetraploid species. Taken together, these results indicate that COL1, COL2 and COL8 endured greater selective pressure during the domestication process. 
This study improves our understanding of domestication-related genes and traits in the evolution of cotton.
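Nucleotide diversity comparisons such as the wild-versus-cultivated contrast above are commonly based on Nei and Li's π, the mean number of pairwise differences per aligned site. A minimal sketch, assuming gap-free, equal-length alignments; the sequences are hypothetical, and the abstract does not state which estimator or software was used.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nei and Li's pi: mean number of pairwise differences per site.

    Assumes a gap-free alignment of at least two equal-length sequences.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # count mismatching sites for every pair of sequences
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


# Hypothetical alignments for one COL locus in wild vs. cultivated accessions
wild = ["ACGTACGT", "ACGAACGT", "TCGAACGA"]
cultivated = ["ACGTACGT", "ACGTACGT", "ACGAACGT"]
print(nucleotide_diversity(wild), nucleotide_diversity(cultivated))
```

In this invented example, the cultivated set carries less diversity than the wild set, which is the pattern a domestication bottleneck would produce.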
Project description: Next-generation sequencing (RNA-seq) was used to evaluate the effects of the Ligon lintless-2 (Li2) short-fiber mutation on the transcriptomes of both subgenomes of allotetraploid cotton (Gossypium hirsutum L.), compared with its near-isogenic wild type. Four libraries were sequenced from developing fibers of the Li2 mutant and wild-type near-isogenic lines at the peak of elongation; the RNA-seq reads were then mapped to the reference D5 genome (G. raimondii) and categorized with PolyCat for homeologous gene expression analysis. The majority of homeologous genes (83.6% according to the reference genome) were expressed during fiber elongation. Our results revealed that: 1) approximately twice as many genes were induced in the AT subgenome as in the DT subgenome in both wild-type and mutant fiber; 2) the subgenome expression bias was significantly reduced in the Li2 fiber transcriptome; and 3) Li2 had a significantly greater effect on the DT than on the AT subgenome. Transcriptional regulators and cell wall-related homeologous genes significantly affected by the Li2 mutation were reviewed in detail. This is the first report to explore the effects of a single mutation on homeologous gene expression in allotetraploid cotton. These results provide deeper insights into the evolution of gene expression in allotetraploid cotton and into cotton fiber development.
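Once reads have been assigned to the AT or DT homeolog (the role PolyCat plays in the workflow above), per-gene subgenome bias can be tested. A minimal sketch, assuming raw AT and DT read counts per homeolog pair and an exact two-sided binomial test against equal expression; the function names and the 0.05 threshold are illustrative, and this is not necessarily the statistic the authors applied. A production analysis would also normalize for library size and correct for multiple testing.

```python
from math import comb


def binom_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k successes in n trials when p = 0.5."""
    # With p = 0.5 the distribution is symmetric, so the two-sided p-value is
    # twice the smaller tail, capped at 1.
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def homeolog_bias(at_reads, dt_reads, alpha=0.05):
    """Label one homeolog pair from its AT- and DT-assigned read counts."""
    n = at_reads + dt_reads
    if n == 0:
        return "not expressed"
    if binom_two_sided_p(at_reads, n) >= alpha:
        return "no bias"
    return "At-biased" if at_reads > dt_reads else "Dt-biased"


# Hypothetical (AT, DT) read counts for three homeolog pairs
for counts in [(80, 20), (20, 80), (12, 10)]:
    print(counts, homeolog_bias(*counts))
```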
Project description: SNPs are the most abundant type of polymorphism and have been explored in many crop genomic studies, including in rice and maize. SNP discovery in allotetraploid cotton genomes has lagged behind that of other crops because of their complexity and polyploidy. In this study, genome-wide SNPs are detected systematically using next-generation sequencing and efficient SNP genotyping methods, and are used to construct a linkage map and characterize structural variations in polyploid cotton genomes. We construct an ultra-dense interspecific genetic map comprising 4,999,048 SNP loci that are distributed unevenly across 26 allotetraploid cotton linkage groups and cover 4,042 cM. The map is used to order tetraploid cotton genome scaffolds for accurate assembly of G. hirsutum acc. TM-1. Recombination rates and hotspots are identified across the cotton genome by comparing the assembled draft sequence with the genetic map. Using this map, genome rearrangements and centromeric regions are identified in tetraploid cotton by combining information from the publicly available G. raimondii genome with fluorescence in situ hybridization analysis. We report the genotyping-by-sequencing method used to identify millions of SNPs between G. hirsutum and G. barbadense. We construct and use an ultra-dense SNP map to correct sequence mis-assemblies, merge scaffolds into pseudomolecules corresponding to chromosomes, detect genome rearrangements, and identify centromeric regions in allotetraploid cottons. We find that the centromeric retro-element sequence of tetraploid cotton derived from the D subgenome progenitor might have invaded the A subgenome centromeres after allotetraploid formation. This study serves as a valuable genomic resource for genetic research and breeding of cotton.
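Ordering scaffolds with a genetic map, as described above, can be sketched as placing each scaffold at the median map position of its markers and orienting it by whether map position increases or decreases along the scaffold. This is a simplified illustration under assumed inputs (hypothetical scaffold IDs and marker positions), not the assembly pipeline used for TM-1.

```python
from statistics import median


def order_and_orient(scaffold_markers):
    """Order and orient scaffolds on one linkage group from their mapped markers.

    scaffold_markers: {scaffold_id: [(position_on_scaffold_bp, genetic_cm), ...]}
    Returns [(scaffold_id, strand)] in map order; strand is "+", "-" or "?".
    """
    placed = []
    for name, markers in scaffold_markers.items():
        # the median is less sensitive to a single misplaced marker than the mean
        anchor = median(cm for _, cm in markers)
        by_position = sorted(markers)
        trend = by_position[-1][1] - by_position[0][1]  # cM change from scaffold start to end
        strand = "+" if trend > 0 else "-" if trend < 0 else "?"  # "?" = unresolved
        placed.append((anchor, name, strand))
    placed.sort()
    return [(name, strand) for _, name, strand in placed]


# Hypothetical scaffolds: scafB runs against the map; scafC has only one marker
example = {
    "scafB": [(100, 50.0), (900, 45.0)],
    "scafA": [(10, 10.0), (500, 12.0), (800, 15.0)],
    "scafC": [(300, 80.0)],
}
print(order_and_orient(example))
```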
Project description: Reactive oxygen species (ROS) are important molecules in plants and are involved in many biological processes, including fiber development and adaptation to abiotic stress in cotton. We carried out transcription analysis to determine the evolution of the ROS genes and analyzed their expression levels in various tissues of the cotton plant under abiotic stress conditions. A total of 515, 260, and 261 ROS network genes were identified in Gossypium hirsutum (AD1 genome), G. arboreum (A genome), and G. raimondii (D genome), respectively. The ROS network genes were distributed across all cotton chromosomes but tended to aggregate on either the lower or upper chromosome arms. Phylogenetic tree analysis grouped all the cotton ROS network genes into 17 families. A total of 243 gene pairs were orthologous between G. arboreum and G. raimondii, and 240 gene pairs were orthologous among G. arboreum, G. raimondii, and G. hirsutum. The synonymous substitution rate (Ks) peaks of orthologous gene pairs between the At subgenome and the A progenitor genome (G. arboreum), and between the Dt subgenome and the D progenitor genome (G. raimondii), were 0.004 and 0.015, respectively. The Ks peaks of ROS network orthologous gene pairs between the two progenitor genomes (A and D) and between the two subgenomes (At and Dt) were both 0.045. Most Ka/Ks values of orthologous gene pairs between the A and D genomes and the two subgenomes of TM-1 were lower than 1.0. RNA-seq analysis and RT-qPCR validation showed that CSD1, 2, 3, 5, 6; FSD1, 2; MSD1, 2; APX3, 11; FRO5.6; and RBOH6 played a major role in fiber development, while CSD1, APX1, APX2, MDAR1, GPX4-6-7, FER2, RBOH6, RBOH11, and FRO5 were integral to enhancing salt stress tolerance in cotton. The ROS network-mediated signaling pathway contributes to fiber development and the regulation of abiotic stress responses in Gossypium. 
This study will enhance understanding of the ROS network and lay a foundation for exploring the mechanisms by which the ROS network is involved in fiber development and the regulation of abiotic stress in cotton.
Project description: A high-resolution genetic map is a useful tool for assaying genomic structural variation and clarifying the evolution of polyploid cotton. A total of 36956 SSRs, including 11289 released in previous studies and 25567 newly developed from the genome sequences of G. arboreum and G. raimondii, were used to construct a new genetic map. The new high-density genetic map includes 6009 loci and spans 3863.97 cM, with an average distance of 0.64 cM between consecutive markers. Four inversions (one between Chr08 and Chr24, one between Chr09 and Chr23, and two between Chr10 and Chr20) were identified by homology analysis. Comparative genomic analysis between the genetic map and the two diploid cottons showed that structural variations between the A genome and the At subgenome are more extensive than those between the D genome and the Dt subgenome. A total of 17 inversions, seven simple translocations and two reciprocal translocations were identified between the genetic map and G. raimondii. Good collinearity was revealed between the corresponding chromosomes of the tetraploid G. hirsutum and G. barbadense genomes, although a total of 16 inversions were detected between them. These results will accelerate evolutionary analysis of the genus Gossypium.
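Inversions between a genetic map and a reference sequence show up as runs of markers whose physical coordinates decrease while their map positions increase. A simplified sketch, assuming markers are already listed in genetic-map order with their physical positions on one reference chromosome; the function name and the minimum run length are hypothetical, and real comparisons must tolerate local marker-order noise.

```python
def find_inversions(physical_positions, min_markers=3):
    """Inclusive index ranges of runs where physical order is opposite to map order.

    physical_positions: reference coordinates of markers, listed in genetic-map order.
    min_markers: shortest run reported (hypothetical default; must be at least 2).
    """
    if min_markers < 2:
        raise ValueError("an inversion needs at least two markers")
    runs = []
    start = 0  # first marker of the current descending run
    for i in range(1, len(physical_positions) + 1):
        descending = i < len(physical_positions) and physical_positions[i] < physical_positions[i - 1]
        if not descending:
            # the run start..i-1 has ended; keep it only if it is long enough
            if i - start >= min_markers:
                runs.append((start, i - 1))
            start = i
    return runs


# Markers 3-6 (physical positions 9, 8, 7, 6) are reversed relative to the map
print(find_inversions([1, 2, 3, 9, 8, 7, 6, 10, 11]))
```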
Project description: Diverse leaf morphology has been observed among accessions of Gossypium hirsutum, including okra leaf, which has both advantages and disadvantages in cotton production. The okra leaf locus has been mapped to chromosome 15 of the Dt subgenome, but the underlying gene has yet to be identified. In this study, we used a combination of targeted association analysis, F2 population-based fine mapping, and comparative sequencing of orthologues to identify a candidate gene underlying the okra leaf trait in G. hirsutum. The identified okra leaf gene, GhOKRA, encodes a homeodomain leucine-zipper class I protein, whose closely related genes in several other plant species have been shown to be involved in regulating leaf morphology. The transcript levels of GhOKRA in shoot apices were positively correlated with the phenotypic expression of the okra leaf trait. Of the multiple sequence variations observed in the coding region among GrOKRA of Gossypium raimondii and GhOKRA-Dt of normal and okra/superokra leaf G. hirsutum accessions, a non-synonymous substitution near the N terminus and the variable protein sequences at the C terminus may be related to the leaf shape difference. Our results suggest that both transcription and protein activity of GhOKRA may be involved in regulating leaf shape. Furthermore, we found that non-reciprocal homoeologous recombination, or gene conversion, may have played a role in the origin of the okra leaf allele. Our results provide tools for further investigating and understanding the fundamental biological processes responsible for cotton leaf shape variation and will help in the design of cotton plants with an ideal leaf shape for enhanced cotton production.
Project description: BACKGROUND: Fructose-1,6-bisphosphatase (FBP) is a key enzyme in the plant sucrose synthesis pathway and in the Calvin cycle, and it plays an important role in regulating photosynthesis in green plants. However, no systematic analysis of FBPs has been reported in Gossypium species. RESULTS: A total of 41 FBP genes from four Gossypium species were identified and analyzed. These FBP genes were sorted into two groups and seven subgroups. FBP family genes were under purifying selection, which kept FBP family members evolutionarily conserved, and there was no tandem or segmental duplication in the FBP gene family. Collinearity analysis revealed that an FBP gene was located in a translocated DNA fragment and that the FBP gene family as a whole underwent disequilibrium evolution, in which members in G. barbadense and in the At subgenome evolved faster than those in the other Gossypium species and in the Dt subgenome, respectively. RNA-seq analyses and qRT-PCR verification showed that different FBP genes have diversified biological functions in cotton fiber development (two genes in 0 DPA and 1 DPA ovules and four genes in 20-25 DPA fibers) and in plant responses to Verticillium wilt onset (two genes) and salt stress (eight genes). CONCLUSION: The FBP gene family displayed a disequilibrium evolution pattern in Gossypium species, which led to diversified functions affecting not only fiber development but also responses to Verticillium wilt and salt stress. These findings provide a foundation for further study of the function of FBP genes in cotton fiber development and environmental adaptability.
Project description: The spread of cotton leaf curl disease in China, India and Pakistan is a recent phenomenon. Analysis of available sequence data determined that there is substantial diversity of cotton-infecting geminiviruses in Pakistan. Phylogenetic analyses indicated that recombination between two major groups of viruses, cotton leaf curl Multan virus (CLCuMuV) and cotton leaf curl Kokhran virus (CLCuKoV), led to the emergence of several new viruses. Recombination detection programs and phylogenetic analyses showed that CLCuMuV and CLCuKoV are highly recombinant viruses. Indeed, CLCuKoV appeared to be a major donor virus for the coat protein (CP) gene, while CLCuMuV donated the Rep gene, in the majority of recombination events. Using recombination-free nucleotide datasets, the substitution rates for the CP and Rep genes were determined. We inferred similar nucleotide substitution rates for the CLCuMuV-Rep gene (4.96 × 10^-4) and the CLCuKoV-CP gene (2.706 × 10^-4), whereas relatively higher substitution rates were observed for the CLCuMuV-CP and CLCuKoV-Rep genes. The combination of sequences with equal and relatively low substitution rates seemed to result in the emergence of the viral isolates that caused epidemics in Pakistan and India. Our findings also suggest that CLCuMuV is spreading at an alarming rate, which could pose a threat to cotton production in the Indian subcontinent.
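The abstract does not describe how the substitution rates were inferred. As a simpler, swapped-in illustration, root-to-tip regression estimates a rate as the slope of genetic distance from the tree root against sampling date. The sketch assumes hypothetical sampling years and root-to-tip distances in substitutions per site, so the slope is in substitutions per site per year; it is not the method behind the values reported above.

```python
def root_to_tip_rate(years, root_to_tip):
    """Least-squares slope of root-to-tip distance on sampling year, plus the x-intercept."""
    n = len(years)
    if n < 2 or len(root_to_tip) != n:
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(root_to_tip) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, root_to_tip))
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("sampling years must not all be identical")
    rate = sxy / sxx  # substitutions/site/year under the stated assumption
    # year at which the fitted line reaches zero distance (estimated root age)
    root_year = mean_x - mean_y / rate if rate else float("nan")
    return rate, root_year


# Hypothetical isolates sampled in 1990, 2000 and 2010
rate, root_year = root_to_tip_rate([1990, 2000, 2010], [0.010, 0.015, 0.020])
print(f"{rate:.2e} subs/site/year, root ~ {root_year:.0f}")
```

Root-to-tip regression is mainly used as a quick check for temporal signal; it ignores phylogenetic non-independence among isolates, which is one reason model-based methods are usually preferred for published rate estimates.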