An integrated genetic and physical map of homoeologous chromosomes 12 and 26 in Upland cotton (G. hirsutum L.).
ABSTRACT: BACKGROUND: Upland cotton (G. hirsutum L.) is the leading fiber crop worldwide. Genetic improvement of fiber quality and yield is facilitated by a variety of genomics tools. An integrated genetic and physical map is needed to better characterize quantitative trait loci and to allow for the positional cloning of valuable genes. However, developing integrated genomic tools for complex allotetraploid genomes, like that of cotton, is highly experimental. In this report, we describe an effective approach for developing an integrated physical framework that distinguishes between subgenomes in cotton. RESULTS: A physical map has been developed with 220 and 115 BAC contigs for homoeologous chromosomes 12 and 26, respectively, covering 73.49 Mb and 34.23 Mb in physical length. Approximately one half of the 220 contigs were anchored to the At subgenome only, while 48 of the 115 contigs were allocated to the Dt subgenome only. Between the two chromosomes, 67 contigs were shared, with an estimated overall physical similarity between the two chromosomal homoeologs of 40.0%. A total of 401 fiber unigenes plus 214 non-fiber unigenes were located to chromosome 12, while 207 fiber unigenes plus 183 non-fiber unigenes were allocated to chromosome 26. Anchoring was done through an overgo hybridization approach, and all anchored ESTs were functionally annotated via BLAST analysis. CONCLUSION: This integrated genomic map describes the first pair of homoeologous chromosomes of an allotetraploid genome in which BAC contigs were identified and partially separated through the use of chromosome-specific probes and locus-specific genetic markers. The approach used in this study should prove useful in the construction of genome-wide physical maps for polyploid plant genomes including Upland cotton.
The identification of gene-rich islands in the integrated map provides a platform for positional cloning of important genes and the targeted sequencing of specific genomic regions.
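The abstract above does not state how the 40.0% physical similarity was derived. One reading that reproduces the figure, offered only as an assumption, counts each of the 67 shared contigs once per chromosome and divides by the combined contig count (220 + 115). The function name `shared_fraction` is hypothetical.

```python
def shared_fraction(shared: int, total_a: int, total_b: int) -> float:
    """Share of all contigs on two chromosomes that are common to both.

    Each shared contig is counted once per chromosome (a Dice-style
    coefficient). This formula is an assumption, not the authors' stated method.
    """
    return 2 * shared / (total_a + total_b)


# Figures taken from the abstract: 67 shared contigs, 220 on chr 12, 115 on chr 26.
similarity = shared_fraction(67, 220, 115)
print(f"{similarity:.1%}")
```

With the reported counts this gives 134 / 335, which matches the reported 40.0%, but other definitions (for example, one based on physical length) cannot be ruled out from the abstract alone.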
Project description:Cotton (Gossypium spp.) is an important crop plant that is widely grown to produce both natural textile fibers and cottonseed oil. Cotton fibers, the economically more important product of the cotton plant, are seed trichomes derived from individual cells of the epidermal layer of the seed coat. It has been known for a long time that large numbers of genes determine the development of cotton fiber, and more recently it has been determined that these genes are distributed across the At and Dt subgenomes of tetraploid AD cottons. In the present study, the organization and evolution of the fiber development genes were investigated through the construction of an integrated genetic and physical map of fiber development genes whose functions have been verified and confirmed. A total of 535 cotton fiber development genes, including 103 fiber transcription factors, 259 fiber development genes, and 173 SSR-containing fiber ESTs, were analyzed at the subgenome level. A total of 499 fiber-related contigs were selected and assembled. Together these contigs covered about 151 Mb in physical length, or about 6.7% of the tetraploid cotton genome. Among the 499 contigs, 397 were anchored onto individual chromosomes. Results from our studies on the distribution patterns of the fiber development genes and transcription factors between the At and Dt subgenomes showed that more transcription factors were from the Dt subgenome than from the At subgenome, whereas more fiber development genes were from the At subgenome than from the Dt subgenome. Combining our mapping results with previous reports that more fiber QTLs were mapped in the Dt subgenome than in the At subgenome, the results suggested a new functional hypothesis for tetraploid cotton. After the merging of the two diploid Gossypium genomes, the At subgenome has provided most of the genes for fiber development, because it continues to function similarly to its fiber-producing diploid A genome ancestor.
On the other hand, the Dt subgenome, with its non-fiber producing D genome ancestor, provides more transcription factors that regulate the expression of the fiber genes in the At subgenome. This hypothesis would explain previously published mapping results. At the same time, this integrated map of fiber development genes would provide a framework to clone individual full-length fiber genes, to elucidate the physiological mechanisms of the fiber differentiation, elongation, and maturation, and to systematically study the functional network of these genes that interact during the process of fiber development in the tetraploid cottons.
Project description:Polyploids account for approximately 70% of flowering plants, including many field, horticultural and forage crops. Cottons are a world-leading fiber and important oilseed crop, and a model species for study of plant polyploidization, cellulose biosynthesis and cell wall biogenesis. This study has addressed the concerns of physical mapping of polyploids with BACs and/or BIBACs by constructing a physical map of the tetraploid cotton, Gossypium hirsutum L. The physical map consists of 3,450 BIBAC contigs with an N50 contig size of 863 kb, collectively spanning 2,244 Mb. We sorted the map contigs according to their subgenome of origin, showing that we assembled physical maps for the A- and D-subgenomes of the tetraploid cotton separately. We also identified the BIBACs in the minimal tiling path of the map, which consists of 15,277 clones. Moreover, we have marked the physical map with nearly 10,000 BIBAC end sequences (BESs), giving one BES approximately every 250 kb. This physical map provides a line of evidence and a strategy for physical mapping of polyploids, and a platform for advanced research of the tetraploid cotton genome, particularly fine mapping and cloning of cotton agronomic genes and QTLs, and sequencing and assembling the cotton genome using modern next-generation sequencing technology.
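For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of the standard definition: the length of the contig at which the running total, taken from largest to smallest, first reaches half of the summed contig length. The contig lengths in the example are made-up toy values, not data from the study, and `n50` is a hypothetical helper name. The last lines compute the mean BES spacing implied by the reported 2,244 Mb span, using 10,000 BESs as an upper bound on the "nearly 10,000" stated.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths (standard definition)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # First contig at which the cumulative length reaches half the total.
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig with positive length")


# Toy contig lengths in kb (illustrative only).
toy_contigs_kb = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_contigs_kb)

# Mean spacing implied by the abstract: 2,244 Mb covered by at most 10,000 BESs.
mean_spacing_kb = 2244 * 1000 / 10_000
print(toy_n50, mean_spacing_kb)
```

Because the true BES count is somewhat below 10,000, the actual mean spacing is a little larger than this lower bound, which is in line with the rounded figure of about 250 kb.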
Project description:Next-generation sequencing (RNA-seq) technology was used to evaluate the effects of the Ligon lintless-2 (Li2) short fiber mutation on transcriptomes of both subgenomes of allotetraploid cotton (Gossypium hirsutum L.) as compared to its near-isogenic wild type. Sequencing was performed on 4 libraries from developing fibers of Li2 mutant and wild type near-isogenic lines at the peak of elongation, followed by mapping and PolyCat categorization of RNA-seq data to the reference D5 genome (G. raimondii) for homeologous gene expression analysis. The majority of homeologous genes, 83.6% according to the reference genome, were expressed during fiber elongation. Our results revealed: 1) approximately two times more genes were induced in the AT subgenome compared with the DT subgenome in wild type and mutant fiber; 2) the subgenome expression bias was significantly reduced in the Li2 fiber transcriptome; 3) Li2 had a significantly greater effect on the DT than on the AT subgenome. Transcriptional regulators and cell wall homeologous genes significantly affected by the Li2 mutation were reviewed in detail. This is the first report to explore the effects of a single mutation on homeologous gene expression in allotetraploid cotton. These results provide deeper insights into the evolution of allotetraploid cotton gene expression and cotton fiber development.
Project description:BACKGROUND: Upland cotton has the highest yield, and accounts for > 95% of world cotton production. Decoding upland cotton genomes will undoubtedly provide the ultimate reference and resource for structural, functional, and evolutionary studies of the species. Here, we employed GeneTrek and BAC tagging information approaches to predict the general composition and structure of the allotetraploid cotton genome. RESULTS: 142 BAC sequences from Gossypium hirsutum cv. Maxxa were downloaded from http://www.ncbi.nlm.nih.gov and confirmed. Analysis of these BAC sequences revealed that the tetraploid cotton genome contains over 70,000 candidate genes with duplicated gene copies in homoeologous A- and D-subgenome regions. Gene distribution is uneven, with gene-rich and gene-free regions of the genome. Twenty-one percent of the 142 BACs lacked genes. BAC gene density ranged from 0 to 33.2 per 100 kb, whereas most gene islands contained only one gene, with an average of 1.5 genes per island. Retro-elements were found to be a major component, with LTR/gypsy the most enriched, followed by LTR/copia. Most LTR retrotransposons were truncated and in nested structures. In addition, 166 polymorphic loci amplified with SSRs developed from 70 BAC clones were tagged on our backbone genetic map. Seventy-five percent (125/166) of the polymorphic loci were tagged on the D-subgenome. By comprehensively analyzing the molecular size of amplified products among tetraploid G. hirsutum cv. Maxxa, acc. TM-1, and G. barbadense cv. Hai7124, and diploid G. herbaceum var. africanum and G. raimondii, 37 BACs, 12 from the A- and 25 from the D-subgenome, were further anchored to their corresponding subgenome chromosomes. Extensive comparison of gene sequences from BACs of the two subgenomes suggested that introns might not contribute to the difference in subgenome size in Gossypium.
CONCLUSION: This study provides us with the first glimpse of cotton genome complexity and serves as a foundation for tetraploid cotton whole-genome sequencing in the future.
Project description:Like many agricultural crops, cultivated cotton is an allotetraploid and has a large genome (~2.5 gigabase pairs). The two subgenomes, A and D, are highly similar but unequally sized and repeat-rich, which poses significant challenges for accurate genome reconstruction using standard approaches. Here we report the development of BAC libraries, subgenome-specific physical maps, and a new-generation sequencing approach that will lead to a reference-grade genome assembly for Upland cotton. Three BAC libraries were constructed, fingerprinted, and integrated with BAC-end sequences (BES) to produce a de novo whole-genome physical map. The BAC map was partitioned by subgenome through alignment to the diploid progenitor D-genome reference sequence with densely spaced BES anchor points and computational filtering. The physical maps were validated with FISH and genetic mapping of SNP markers derived from BES. Two pairs of homeologous chromosomes, A11/D11 and A12/D12, were used to assess multiplex sequencing approaches for completeness and scalability. The results represent the first subgenome-anchored physical maps of Upland cotton, and a new-generation approach to whole-genome sequencing, which will lead to the reference-grade assembly of allopolyploid cotton and serve as a general strategy for sequencing other polyploid species.
Project description:SNPs are the most abundant polymorphism type, and have been explored in many crop genomic studies, including rice and maize. SNP discovery in allotetraploid cotton genomes has lagged behind that of other crops due to their complexity and polyploidy. In this study, genome-wide SNPs are detected systematically using next-generation sequencing and efficient SNP genotyping methods, and used to construct a linkage map and characterize the structural variations in polyploid cotton genomes. We construct an ultra-dense inter-specific genetic map comprising 4,999,048 SNP loci distributed unevenly in 26 allotetraploid cotton linkage groups and covering 4,042 cM. The map is used to order tetraploid cotton genome scaffolds for accurate assembly of G. hirsutum acc. TM-1. Recombination rates and hotspots are identified across the cotton genome by comparing the assembled draft sequence and the genetic map. Using this map, genome rearrangements and centromeric regions are identified in tetraploid cotton by combining information from the publicly-available G. raimondii genome with fluorescent in situ hybridization analysis. We report the genotyping-by-sequencing method used to identify millions of SNPs between G. hirsutum and G. barbadense. We construct and use an ultra-dense SNP map to correct sequence mis-assemblies, merge scaffolds into pseudomolecules corresponding to chromosomes, detect genome rearrangements, and identify centromeric regions in allotetraploid cottons. We find that the centromeric retro-element sequence of tetraploid cotton derived from the D subgenome progenitor might have invaded the A subgenome centromeres after allotetraploid formation. This study serves as a valuable genomic resource for genetic research and breeding of cotton.
Project description:Upland cotton (Gossypium hirsutum L., 2n = 52, AADD) is an allotetraploid, so the discovery of single nucleotide polymorphism (SNP) markers is difficult. The recent emergence of genome complexity reduction technologies based on the next-generation sequencing (NGS) platform has greatly expedited SNP discovery in crops with highly repetitive and complex genomes. Here we applied restriction-site associated DNA (RAD) sequencing technology for de novo SNP discovery in allotetraploid cotton. We identified 21,109 SNPs between the two parents and used these for genotyping of 161 recombinant inbred lines (RILs). Finally, a high-density linkage map comprising 4,153 loci over 3500 cM was developed from these genotyping results. Using this map, quantitative trait loci (QTLs) conferring fiber strength and Verticillium wilt (VW) resistance were mapped to a more accurate region in comparison to the 1576-cM interval determined using the simple sequence repeat (SSR) genetic map. This suggests that the newly constructed map has more power and resolution than the previous SSR map. It will pave the way for the rapid identification of markers for marker-assisted selection in cotton breeding and for the cloning of QTLs for traits of interest.
Project description:BACKGROUND: Improving fiber quality and yield are the primary research objectives in cotton breeding for enhancing the economic viability and sustainability of Upland cotton production. Identifying the quantitative trait loci (QTL) for fiber quality and yield traits using high-density SNP-based genetic maps allows for bridging genomics with cotton breeding through marker-assisted and genomic selection. In this study, a recombinant inbred line (RIL) population, derived from a cross between two parental accessions that represent broad allele diversity in Upland cotton, was used to construct high-density SNP-based linkage maps and to map the QTLs controlling important cotton traits. RESULTS: Molecular genetic mapping using the RIL population produced a genetic map of 3129 SNPs, mapped at a density of 1.41 cM. Genetic maps of the individual chromosomes showed good collinearity with the sequence-based physical map. A total of 106 QTLs were identified, which included 59 QTLs for six fiber quality traits, 38 QTLs for four yield traits and 9 QTLs for two morphological traits. Sub-genome wide, 57 QTLs were mapped in the A sub-genome and 49 were mapped in the D sub-genome. More than 75% of the QTLs with favorable alleles were contributed by the parental accession NC05AZ06. Forty-six mapped QTLs each explained more than 10% of the phenotypic variation. Further, we identified 21 QTL clusters, of which 12 were mapped in the A sub-genome and 9 were mapped in the D sub-genome. Candidate gene analyses of the genomic regions harboring the 11 stable QTLs identified 19 putative genes with functional roles in cotton fiber development. CONCLUSION: We constructed a high-density genetic map of SNPs in Upland cotton. Collinearity between genetic and physical maps indicated no major structural changes in the genetic mapping populations. Most traits showed high broad-sense heritability.
One hundred and six QTLs were identified for the fiber quality, yield and morphological traits. The majority of the QTLs with favorable alleles were contributed by the improved parental accession. More than 70% of the mapped QTLs shared similar map positions with previously reported QTLs, which suggests the genetic relatedness of Upland cotton germplasm. Identification of QTL clusters could explain the correlation among some fiber quality traits in cotton. Stable and major QTLs and QTL clusters of traits identified in the current study could be the targets for map-based cloning and marker-assisted selection (MAS) in cotton breeding. The genomic region on D12 containing the major stable QTLs for micronaire, fiber strength and lint percentage could be a potential target for MAS and gene cloning of fiber quality traits in cotton.
Project description:BACKGROUND: Cotton is the world's most important natural textile fiber and a significant oilseed crop. Decoding cotton genomes will provide the ultimate reference and resource for research and utilization of the species. Integration of high-density genetic maps with genomic sequence information will largely accelerate the process of whole-genome assembly in cotton. RESULTS: In this paper, we update a high-density interspecific genetic linkage map of allotetraploid cultivated cotton. An additional 1,167 marker loci have been added to our previously published map of 2,247 loci. Three new marker types, InDel (insertion-deletion) and SNP (single nucleotide polymorphism) developed from gene information, and REMAP (retrotransposon-microsatellite amplified polymorphism), were used to increase map density. The updated map consists of 3,414 loci in 26 linkage groups covering 3,667.62 cM with an average inter-locus distance of 1.08 cM. Furthermore, genome-wide sequence analysis was completed using 3,324 informative sequence-based markers and publicly-available Gossypium DNA sequence information. A total of 413,113 EST and 195 BAC sequences were physically anchored and clustered by 3,324 sequence-based markers. Of these, 14,243 ESTs and 188 BACs from different species of Gossypium were clustered and specifically anchored to the high-density genetic map. A total of 2,748 candidate unigenes from 2,111 EST clusters and 63 BACs were mined for functional annotation and classification. The 337 ESTs/genes related to fiber quality traits were integrated with 132 previously reported cotton fiber quality quantitative trait loci, which demonstrated the important roles of these genes in fiber quality. Higher-level sequence conservation between different cotton species and between the A- and D-subgenomes in tetraploid cotton was found, indicating a common evolutionary origin for orthologous and paralogous loci in Gossypium.
CONCLUSION: This study will serve as a valuable genomic resource for tetraploid cotton genome assembly, for cloning genes related to superior agronomic traits, and for further comparative genomic analyses in Gossypium.
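The map statistics in this abstract can be cross-checked with simple arithmetic. The sketch below assumes, as is conventional but not stated in the abstract, that average inter-locus distance is total map length divided by the number of intervals, and that each linkage group with n loci contributes n - 1 intervals. The helper name `mean_interval_cm` is hypothetical.

```python
def mean_interval_cm(total_cm: float, loci: int, linkage_groups: int) -> float:
    """Average distance between adjacent loci on a multi-group linkage map.

    Assumes every linkage group with n loci contributes n - 1 intervals,
    so the map as a whole has (loci - linkage_groups) intervals.
    """
    intervals = loci - linkage_groups
    if intervals <= 0:
        raise ValueError("need more loci than linkage groups")
    return total_cm / intervals


# Figures from the abstract.
previous_loci, added_loci = 2247, 1167
total_loci = previous_loci + added_loci  # expected to equal the reported 3,414

per_interval = mean_interval_cm(3667.62, total_loci, 26)
per_locus = 3667.62 / total_loci  # simpler alternative, for comparison
print(total_loci, round(per_interval, 2), round(per_locus, 2))
```

Under this assumption the interval-based figure rounds to the reported 1.08 cM, whereas dividing by the locus count alone gives about 1.07 cM, so the interval-based reading appears to be the one used.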
Project description:Plant JAZ (Jasmonate ZIM-domain) proteins play versatile roles in multiple aspects of plant development and defense. However, little is known so far about the JAZ family in allotetraploid upland cotton (Gossypium hirsutum). In this study, 30 non-redundant JAZ genes were identified in upland cotton through genome-wide screening. Phylogenetic analysis revealed that the 30 proteins in the cotton JAZ family are further divided into five groups (I - V), and members in the same group share highly conserved motif structures. Subcellular localization assays demonstrated that GhJAZ proteins are localized in the cell nucleus. Quantitative RT-PCR analysis indicated that GhJAZs display different expression patterns in cotton tissues, and most of them could be induced by jasmonic acid (JA). Furthermore, some GhJAZ genes are preferentially expressed in cotton ovules and fibers, and showed differential expression in ovules of wild type cotton and a fiberless mutant (fl) during fiber initiation. GhJAZ proteins could interact with each other to form homodimers or heterodimers, and they also interacted with some JA signaling regulators and proteins involved in cotton fiber initiation. Collectively, our data suggest that some GhJAZ proteins may play important roles in cotton fiber initiation and development by regulating JA signaling as well as some fiber-related proteins.