A preliminary analysis of genome structure and composition in Gossypium hirsutum.
ABSTRACT: BACKGROUND: Upland cotton has the highest yield and accounts for > 95% of world cotton production. Decoding upland cotton genomes will undoubtedly provide the ultimate reference and resource for structural, functional, and evolutionary studies of the species. Here, we employed GeneTrek and BAC tagging information approaches to predict the general composition and structure of the allotetraploid cotton genome. RESULTS: A total of 142 BAC sequences from Gossypium hirsutum cv. Maxxa were downloaded from http://www.ncbi.nlm.nih.gov and confirmed. Analysis of these BAC sequences revealed that the tetraploid cotton genome contains over 70,000 candidate genes, with duplicated gene copies in homoeologous A- and D-subgenome regions. Gene distribution is uneven, with gene-rich and gene-free regions of the genome. Twenty-one percent of the 142 BACs lacked genes. BAC gene density ranged from 0 to 33.2 per 100 kb, whereas most gene islands contained only one gene, with an average of 1.5 genes per island. Retro-elements were found to be a major component, with LTR/gypsy the most enriched, followed by LTR/copia. Most LTR retrotransposons were truncated and in nested structures. In addition, 166 polymorphic loci amplified with SSRs developed from 70 BAC clones were tagged on our backbone genetic map. Seventy-five percent (125/166) of the polymorphic loci were tagged on the D-subgenome. By comprehensively analyzing the molecular size of amplified products among tetraploid G. hirsutum cv. Maxxa, acc. TM-1, and G. barbadense cv. Hai7124, and diploid G. herbaceum var. africanum and G. raimondii, 37 BACs, 12 from the A- and 25 from the D-subgenome, were further anchored to their corresponding subgenome chromosomes. Comparison of a large number of gene sequences from BACs of different subgenomes showed that introns might not contribute to the difference in subgenome size in Gossypium.
CONCLUSION: This study provides us with the first glimpse of cotton genome complexity and serves as a foundation for tetraploid cotton whole-genome sequencing in the future.
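The per-BAC statistics reported above (genes per 100 kb, and mean genes per gene island) are simple ratios. The following is a minimal sketch of how such figures could be computed; the function names and the example BAC length, gene count, and island sizes are hypothetical and are not taken from the study.

```python
def gene_density_per_100kb(gene_count: int, bac_length_bp: int) -> float:
    """Return the number of predicted genes per 100 kb of BAC sequence."""
    if bac_length_bp <= 0:
        raise ValueError("bac_length_bp must be positive")
    return gene_count * 100_000 / bac_length_bp


def mean_genes_per_island(island_sizes: list[int]) -> float:
    """Return the average number of genes per gene island."""
    if not island_sizes:
        raise ValueError("island_sizes must not be empty")
    return sum(island_sizes) / len(island_sizes)


# Hypothetical inputs, for illustration only (not data from the study).
print(gene_density_per_100kb(3, 120_000))    # 3 genes on a 120 kb BAC -> 2.5
print(mean_genes_per_island([1, 1, 1, 3]))   # mostly single-gene islands -> 1.5
```

The second call illustrates how a mean of 1.5 genes per island is compatible with most islands containing a single gene: a few larger islands raise the average.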
Project description:Genetic and physical framework mapping in cotton (Gossypium spp.) was used to discover putative gene sequences involved in resistance to common soil-borne pathogens. Chromosome (Chr) 11 and its homoeologous Chr 21 of Upland cotton (G. hirsutum) are foci for discovery of resistance (R) or pathogen-induced R (PR) genes underlying QTLs involved in response to root-knot nematode (Meloidogyne incognita), reniform nematode (Rotylenchulus reniformis), Fusarium wilt (Fusarium oxysporum f.sp. vasinfectum), Verticillium wilt (Verticillium dahliae), and black root rot (Thielaviopsis basicola). Simple sequence repeat (SSR) markers and bacterial artificial chromosome (BAC) clones from a BAC library developed from the Upland cotton Acala Maxxa were mapped on Chr 11 and Chr 21. Gene Ontology (GO) analysis of the DNA sequences of 99 of 256 Chr 11 and 109 of 239 Chr 21 previously mapped SSRs revealed response elements to internal and external stimuli, stress, signaling processes, and cell death. The reconciliation between genetic and physical mapping of gene annotations from new DNA sequences of 20 BAC clones revealed 467 (Chr 11) and 285 (Chr 21) G. hirsutum putative coding sequences, plus 146 (Chr 11) and 98 (Chr 21) predicted genes. GO functional profiling of Unigenes uncovered genes involved in different metabolic functions and stress response elements (SRE). Our results revealed that Chrs 11 and 21 harbor resistance gene-rich genomic regions. Sequence comparisons with the ancestral diploid D5 (G. raimondii), A2 (G. arboreum) and domesticated tetraploid TM-1 AD1 (G. hirsutum) genomes revealed an abundance of transposable elements and confirmed the richness of resistance gene motifs in these chromosomes.
The sequence information of SSR markers and BAC clones and the genetic mapping of BAC clones provide enhanced genetic and physical frameworks of resistance gene-rich regions of the cotton genome, thereby aiding discovery of R and PR genes and breeding for resistance to cotton diseases.
Project description:BACKGROUND: Cotton, as an allopolyploid species, contains homoeologous A and D subgenomes. The study of the homoeologous (duplicated) segments or chromosomes can facilitate insight into the evolutionary process of polyploidy and the development of genomic resources. Fluorescence in situ hybridization (FISH) using bacterial artificial chromosome (BAC) clones as probes has commonly been used to provide a reliable cytological technique for chromosome identification. In polyploids, it also presents a useful approach for identification and localization of duplicated segments. Here, two types of BACs that contained the duplicated segments were isolated and analyzed in tetraploid cotton by FISH. RESULTS: Homologous and homoeologous BACs were isolated by way of SSR marker-based selection and then used to develop BAC-FISH probes. Duplicated segments in homoeologous chromosomes were detected by FISH. The FISH and related linkage map results reinforced the known relationships of homoeologous chromosomes in allotetraploid cotton, and presented a useful approach for isolation of homoeologous loci or segments and for mapping of monomorphic loci. Importantly, large duplicated segments (homologous BACs) do exist between homoeologous chromosomes, so the shotgun approach to genome sequencing is not suitable for tetraploid cotton. However, this finding will undoubtedly provide more information and promote research on duplicated segments as well as on genome evolution in cotton. CONCLUSION: These findings and the BAC-FISH analysis method demonstrate the power and wide utility of this approach for genome and genome-evolution research in cotton and other polyploid species.
Project description:Long terminal repeat (LTR) retrotransposons are the most abundant DNA component and are largely responsible for plant genome size variation. Although they have been studied in plant species, very limited data are available for cotton, the most important fiber and textile crop. In this study, we performed a comprehensive analysis of LTR retrotransposon families across four cotton species. In tetraploid Gossypium species, LTR retrotransposon families from the progenitor D genome had more copies in the D-subgenome, and families from the progenitor A genome had more copies in the A-subgenome. Some LTR retrotransposon families that inserted after polyploid formation may still have the majority of their copies in one of the subgenomes. The data also show that families of 10 to 200 copies are abundant and have a great influence on Gossypium genome size; in contrast, a small number of high-copy LTR retrotransposon families contribute less to genome size. The Kimura distance distribution indicates that high-copy-number families are not the result of a recent outbreak, and that there is no obvious relationship between family copy number and the period of evolution. Further analysis reveals that each LTR retrotransposon family may have its own distribution characteristics in cotton.
Project description:The evolution and domestication of cotton is of great interest from both economic and evolutionary standpoints. Although many genetic and genomic resources have been generated for cotton, the genetic underpinnings of the transition from wild to domesticated cotton remain poorly known. Here we generated an intraspecific QTL mapping population specifically targeting domesticated cotton phenotypes. We used 466 F2 individuals derived from an intraspecific cross between the wild Gossypium hirsutum var. yucatanense (TX2094) and the elite cultivar G. hirsutum cv. Acala Maxxa, in two environments, to identify 120 QTL associated with phenotypic changes under domestication. While the number of QTL recovered in each subpopulation was similar, only 22 QTL were considered coincident (i.e., shared) between the two locations, eight of which shared peak markers. Although approximately half of QTL were located in the A-subgenome, many key fiber QTL were detected in the D-subgenome, which was derived from a species with unspinnable fiber. We found that many QTL are environment-specific, with few shared between the two environments, indicating that QTL associated with G. hirsutum domestication are genomically clustered but environmentally labile. Possible candidate genes were recovered and are discussed in the context of the phenotype. We conclude that the evolutionary forces that shape intraspecific divergence and domestication in cotton are complex, and that phenotypic transformations likely involved multiple interacting and environmentally responsive factors.
Project description:BACKGROUND: Upland cotton (G. hirsutum L.) is the leading fiber crop worldwide. Genetic improvement of fiber quality and yield is facilitated by a variety of genomics tools. An integrated genetic and physical map is needed to better characterize quantitative trait loci and to allow for the positional cloning of valuable genes. However, developing integrated genomic tools for complex allotetraploid genomes, like that of cotton, is highly experimental. In this report, we describe an effective approach for developing an integrated physical framework that allows for the distinguishing between subgenomes in cotton. RESULTS: A physical map has been developed with 220 and 115 BAC contigs for homeologous chromosomes 12 and 26, respectively, covering 73.49 Mb and 34.23 Mb in physical length. Approximately one half of the 220 contigs were anchored to the At subgenome only, while 48 of the 115 contigs were allocated to the Dt subgenome only. Between the two chromosomes, 67 contigs were shared with an estimated overall physical similarity between the two chromosomal homeologs at 40.0 %. A total of 401 fiber unigenes plus 214 non-fiber unigenes were located to chromosome 12 while 207 fiber unigenes plus 183 non-fiber unigenes were allocated to chromosome 26. Anchoring was done through an overgo hybridization approach and all anchored ESTs were functionally annotated via blast analysis. CONCLUSION: This integrated genomic map describes the first pair of homoeologous chromosomes of an allotetraploid genome in which BAC contigs were identified and partially separated through the use of chromosome-specific probes and locus-specific genetic markers. The approach used in this study should prove useful in the construction of genome-wide physical maps for polyploid plant genomes including Upland cotton. 
The identification of gene-rich islands in the integrated map provides a platform for positional cloning of important genes and the targeted sequencing of specific genomic regions.
Project description:Polyploids account for approximately 70% of flowering plants, including many field, horticulture and forage crops. Cotton is a world-leading fiber crop and an important oilseed crop, and a model for the study of plant polyploidization, cellulose biosynthesis and cell wall biogenesis. This study has addressed the concerns of physical mapping of polyploids with BACs and/or BIBACs by constructing a physical map of the tetraploid cotton, Gossypium hirsutum L. The physical map consists of 3,450 BIBAC contigs with an N50 contig size of 863 kb, collectively spanning 2,244 Mb. We sorted the map contigs according to their subgenome of origin, showing that we assembled physical maps for the A- and D-subgenomes of the tetraploid cotton separately. We also identified the BIBACs in the minimal tiling path of the map, which consists of 15,277 clones. Moreover, we have marked the physical map with nearly 10,000 BIBAC end sequences (BESs), giving one BES per approximately 250 kb. This physical map provides a line of evidence and a strategy for physical mapping of polyploids, and a platform for advanced research on the tetraploid cotton genome, particularly fine mapping and cloning of cotton agronomic genes and QTLs, and sequencing and assembling the cotton genome using modern next-generation sequencing technology.
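For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total length. The function names and the toy contig lengths are illustrative assumptions; the study's actual contig sizes are not given in the abstract.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def mean_spacing_kb(map_length_mb: float, marker_count: int) -> float:
    """Return the average spacing between markers, in kb."""
    if marker_count <= 0:
        raise ValueError("marker_count must be positive")
    return map_length_mb * 1000 / marker_count


# Toy contig lengths in kb (hypothetical, not the study's data).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))

# With the round figures from the abstract (2,244 Mb, ~10,000 BESs) the
# spacing is roughly 224 kb; "nearly 10,000" BESs would give a slightly
# larger value, in line with the reported approximation of 250 kb.
print(mean_spacing_kb(2244, 10_000))
```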
Project description:Verticillium dahliae is the fungal pathogen that causes Verticillium wilt; only a few genes have been identified that play critical roles in disease resistance, and few have shown positive effects on resistance to Verticillium wilt in transgenic cotton. We cloned a receptor-like kinase gene (GbRLK) induced by Verticillium dahliae (VD) in the disease-resistant cotton Gossypium barbadense cv. Hai7124. Northern blotting revealed that GbRLK was induced by VD at 96 h after inoculation. The functional GbRLK is from the D subgenome, since a single base deletion causes a frameshift, yielding a dysfunctional homologue, in the A subgenome of tetraploid cotton. To verify the function of GbRLK, we developed transgenic cotton and Arabidopsis lines overexpressing GbRLK and found that they all showed higher resistance to Verticillium in greenhouse and field trials. Expression profiling of transgenic and non-transgenic Arabidopsis thaliana revealed that GbRLK regulated the expression of a series of genes associated with biotic and abiotic stresses. Therefore, we propose that the increased resistance to Verticillium dahliae infection in transgenic plants could result from reduced damage from water loss and from regulation of defense gene expression.
Project description:Although new and emerging next-generation sequencing (NGS) technologies have reduced sequencing costs significantly, much work remains to implement them for de novo sequencing of complex and highly repetitive genomes such as the tetraploid genome of Upland cotton (Gossypium hirsutum L.). Herein we report the results from implementing a novel, hybrid Sanger/454-based BAC-pool sequencing strategy using minimum tiling path (MTP) BACs from Ctg-3301 and Ctg-465, two large genomic segments in A12 and D12 homoeologous chromosomes (Ctg). To enable generation of longer contig sequences in assembly, we implemented a hybrid assembly method to process ~35x data from 454 technology and 2.8-3x data from Sanger method. Hybrid assemblies offered higher sequence coverage and better sequence assemblies. Homology studies revealed the presence of retrotransposon regions like Copia and Gypsy elements in these contigs and also helped in identifying new genomic SSRs. Unigenes were anchored to the sequences in Ctg-3301 and Ctg-465 to support the physical map. Gene density, gene structure and protein sequence information derived from protein prediction programs were used to obtain the functional annotation of these genes. Comparative analysis of both contigs with Arabidopsis genome exhibited synteny and microcollinearity with a conserved gene order in both genomes. This study provides insight about use of MTP-based BAC-pool sequencing approach for sequencing complex polyploid genomes with limited constraints in generating better sequence assemblies to build reference scaffold sequences. Combining the utilities of MTP-based BAC-pool sequencing with current longer and short read NGS technologies in multiplexed format would provide a new direction to cost-effectively and precisely sequence complex plant genomes.
Project description:BACKGROUND: Cotton is the world's most important natural textile fiber and a significant oilseed crop. Decoding cotton genomes will provide the ultimate reference and resource for research and utilization of the species. Integration of high-density genetic maps with genomic sequence information will largely accelerate the process of whole-genome assembly in cotton. RESULTS: In this paper, we update a high-density interspecific genetic linkage map of allotetraploid cultivated cotton. An additional 1,167 marker loci have been added to our previously published map of 2,247 loci. Three new marker types, InDel (insertion-deletion) and SNP (single nucleotide polymorphism) developed from gene information, and REMAP (retrotransposon-microsatellite amplified polymorphism), were used to increase map density. The updated map consists of 3,414 loci in 26 linkage groups covering 3,667.62 cM with an average inter-locus distance of 1.08 cM. Furthermore, genome-wide sequence analysis was finished using 3,324 informative sequence-based markers and publicly-available Gossypium DNA sequence information. A total of 413,113 EST and 195 BAC sequences were physically anchored and clustered by 3,324 sequence-based markers. Of these, 14,243 ESTs and 188 BACs from different species of Gossypium were clustered and specifically anchored to the high-density genetic map. A total of 2,748 candidate unigenes from 2,111 ESTs clusters and 63 BACs were mined for functional annotation and classification. The 337 ESTs/genes related to fiber quality traits were integrated with 132 previously reported cotton fiber quality quantitative trait loci, which demonstrated the important roles in fiber quality of these genes. Higher-level sequence conservation between different cotton species and between the A- and D-subgenomes in tetraploid cotton was found, indicating a common evolutionary origin for orthologous and paralogous loci in Gossypium. 
CONCLUSION: This study will serve as a valuable genomic resource for tetraploid cotton genome assembly, for cloning genes related to superior agronomic traits, and for further comparative genomic analyses in Gossypium.
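The map statistics in this abstract can be cross-checked with simple arithmetic. The abstract does not say how the average inter-locus distance was computed; the sketch below assumes one common convention, in which the total map length is divided by the number of intervals (loci minus linkage groups). The function name is hypothetical.

```python
def mean_interlocus_distance(map_length_cm: float, loci: int, linkage_groups: int) -> float:
    """Return the mean distance between adjacent loci, in cM.

    Assumes each linkage group with n loci contributes n - 1 intervals,
    so the total number of intervals is loci - linkage_groups.
    """
    intervals = loci - linkage_groups
    if intervals <= 0:
        raise ValueError("need more loci than linkage groups")
    return map_length_cm / intervals


# Figures reported in the abstract.
previous_loci = 2_247
added_loci = 1_167
total_loci = previous_loci + added_loci   # matches the reported 3,414 loci

distance = mean_interlocus_distance(3_667.62, total_loci, 26)
print(total_loci, round(distance, 2))
```

Under this assumption the result rounds to the reported 1.08 cM, whereas dividing by the raw locus count instead would give about 1.07 cM.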
Project description:Cotton (Gossypium spp.) is an important crop plant that is widely grown to produce both natural textile fibers and cottonseed oil. Cotton fibers, the economically more important product of the cotton plant, are seed trichomes derived from individual cells of the epidermal layer of the seed coat. It has been known for a long time that large numbers of genes determine the development of cotton fiber, and more recently it has been determined that these genes are distributed across At and Dt subgenomes of tetraploid AD cottons. In the present study, the organization and evolution of the fiber development genes were investigated through the construction of an integrated genetic and physical map of fiber development genes whose functions have been verified and confirmed. A total of 535 cotton fiber development genes, including 103 fiber transcription factors, 259 fiber development genes, and 173 SSR-contained fiber ESTs, were analyzed at the subgenome level. A total of 499 fiber related contigs were selected and assembled. Together these contigs covered about 151 Mb in physical length, or about 6.7% of the tetraploid cotton genome. Among the 499 contigs, 397 were anchored onto individual chromosomes. Results from our studies on the distribution patterns of the fiber development genes and transcription factors between the At and Dt subgenomes showed that more transcription factors were from Dt subgenome than At, whereas more fiber development genes were from At subgenome than Dt. Combining our mapping results with previous reports that more fiber QTLs were mapped in Dt subgenome than At subgenome, the results suggested a new functional hypothesis for tetraploid cotton. After the merging of the two diploid Gossypium genomes, the At subgenome has provided most of the genes for fiber development, because it continues to function similar to its fiber producing diploid A genome ancestor. 
On the other hand, the Dt subgenome, with its non-fiber producing D genome ancestor, provides more transcription factors that regulate the expression of the fiber genes in the At subgenome. This hypothesis would explain previously published mapping results. At the same time, this integrated map of fiber development genes would provide a framework to clone individual full-length fiber genes, to elucidate the physiological mechanisms of the fiber differentiation, elongation, and maturation, and to systematically study the functional network of these genes that interact during the process of fiber development in the tetraploid cottons.