Multi-strategic RNA-seq analysis reveals a high-resolution transcriptional landscape in cotton.
ABSTRACT: Cotton is an important natural fiber crop; however, a comprehensive, high-resolution gene map for cotton is lacking. Here we integrate four complementary high-throughput techniques, PacBio long-read Iso-Seq, strand-specific RNA-seq, CAGE-seq, and PolyA-seq, to systematically explore the transcriptional landscape across 16 tissues and organ types in Gossypium arboreum. We devise a computational pipeline, named IGIA, to reconstruct accurate gene structures from the integrated data. Our results reveal a dynamic and diverse transcriptional map in cotton: tissue-specific gene expression, alternative usage of TSSs and polyadenylation sites, hotspots of alternative splicing, and transcriptional read-through. These regulated events affect many genes in various ways, such as gain or loss of functional RNA motifs and protein domains, fine-tuning of DNA-binding activity, and co-regulation of genes in the same complex or pathway. The methods and findings provide valuable resources for the plant research community for further functional genomic studies, such as understanding natural SNP variation.
Project description: Cotton leaf curl disease (CLCuD), caused by cotton leaf curl viruses (CLCuVs), is among the most devastating diseases of cotton. While the widely cultivated species Gossypium hirsutum is generally susceptible, the diploid species G. arboreum is a natural source of resistance to CLCuD. However, the influence of CLCuD on the G. arboreum transcriptome and the interaction of CLCuD with G. arboreum remain to be elucidated. Here we used RNA-Seq to analyze differential gene expression in G. arboreum under CLCuD infection. G. arboreum plants were inoculated by grafting with a CLCuD-infected scion of G. hirsutum. Asymptomatic and symptomatic CLCuD-infected plants were analyzed by RNA-Seq on an Illumina HiSeq 2500. Data analysis revealed 1062 differentially expressed genes (DEGs) in G. arboreum, and 17 genes were selected for qPCR to validate the RNA-Seq data. We identified several genes involved in disease resistance and pathogen defense. Furthermore, a weighted gene co-expression network constructed from the RNA-Seq dataset identified 50 hub genes, most of which are involved in transport processes and might have a role in the defense response of G. arboreum against CLCuD. This fundamental study will improve understanding of virus-host interactions and help identify important genes involved in G. arboreum tolerance to CLCuD.
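Hub genes in a weighted co-expression network are commonly ranked by connectivity, the summed soft-thresholded correlation of a gene with all others. The sketch below illustrates that idea in plain Python under stated assumptions: an unsigned network with adjacency a_ij = |cor(x_i, x_j)|^beta, an illustrative beta of 6 (not the study's setting), and, for simplicity, connectivity over the whole network rather than within detected modules. The gene names and expression values are hypothetical; the study's actual module detection and hub criteria are not described in this abstract.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity(expr: Dict[str, List[float]], beta: int = 6) -> Dict[str, float]:
    """Unsigned soft-threshold connectivity: k_i = sum over j != i of |cor(i, j)| ** beta."""
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a_ij = abs(pearson(expr[gi], expr[gj])) ** beta  # adjacency weight
            k[gi] += a_ij
            k[gj] += a_ij
    return k


def top_hubs(expr: Dict[str, List[float]], n: int, beta: int = 6) -> List[str]:
    """Return the n most highly connected genes (candidate hubs)."""
    k = connectivity(expr, beta)
    return sorted(k, key=k.get, reverse=True)[:n]


# Hypothetical profiles for four genes across five samples (not study data).
example_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [1.2, 2.2, 2.9, 4.1, 5.0],
    "geneD": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(top_hubs(example_expr, 3))
```

Raising correlations to a power suppresses weak, likely noisy correlations while preserving strong ones, which is why tightly co-expressed genes such as the hypothetical geneA, geneB, and geneC dominate the ranking here.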
Project description: BACKGROUND: Cotton (Gossypium spp.) is the most important fiber crop worldwide, but salt stress limits cotton production in coastal and other areas. Growth-regulating factors (GRFs) play regulatory roles in response to salt stress, but their roles in cotton under salt stress have not been studied. RESULTS: We identified 19 GRF genes in G. raimondii, 18 in G. arboreum, 34 in G. hirsutum, and 45 in G. barbadense. Phylogenetic analysis grouped these GRF genes into seven clades. GRF genes from the diploid cottons (G. raimondii and G. arboreum) were largely retained in allopolyploid cotton, with subsequent gene expansion in G. barbadense relative to G. hirsutum. Most G. hirsutum GRF (GhGRF) genes are preferentially expressed in young and growing tissues. To explore their possible role in salt stress, we used qRT-PCR to study expression responses to NaCl treatment and found that five GhGRF genes were down-regulated in leaves. RNA-seq experiments showed that seven GhGRF genes exhibited decreased expression in leaves under NaCl treatment, three of which (GhGRF3, GhGRF4, and GhGRF16) were identified by both RNA-seq and qRT-PCR. We also identified six and three GRF genes with decreased expression under salt stress in G. arboreum and G. barbadense, respectively. G. arboreum showed better salt tolerance than G. hirsutum and G. barbadense, consistent with its lack of leaf withering or yellowing under the salt treatment conditions. Our results suggest that GRF genes are involved in salt stress responses in Gossypium. CONCLUSION: In summary, we identified candidate GRF genes involved in salt stress responses in cotton.
Project description: Polyploidy is a common evolutionary occurrence in plants. The recently published genomes of allotetraploid G. hirsutum and its diploid donors G. arboreum and G. raimondii make cotton an accessible polyploid model. This study used chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) to investigate the genome-wide distribution of H3K4me3 in G. arboreum and G. hirsutum and to explore the conservation and variation of genome structure between diploid and allotetraploid cotton. H3K4me3 modifications were associated with active transcription in both species. The H3K4me3 marks were located mainly in genic regions and were enriched around the transcription start sites (TSSs) of genes. We integrated the H3K4me3 ChIP-seq data with RNA-seq and EST data to refine genic structure annotation, discovering 6,773 and 12,773 new transcripts in G. arboreum and G. hirsutum, respectively. Furthermore, co-expression networks were linked with histone modification and modularized in an attempt to explain how differential H3K4me3 enrichment correlates with changes in gene transcription during cotton development and evolution. Taken together, we combined epigenomic and transcriptomic datasets to systematically discover functional genes and compare them between G. arboreum and G. hirsutum, which may benefit studies of diploid and allotetraploid plants with large genomes and complicated evolutionary histories.
Project description: The Dof (DNA-binding one zinc finger) transcription factor family is a representative plant-specific class of transcription factors. In this study, we performed a genome-wide screen and characterization of the Dof gene family in two tetraploid species, Gossypium barbadense and Gossypium hirsutum, and two diploid species, Gossypium arboreum and Gossypium raimondii. In total, 115, 116, 55, and 56 Dof genes were identified in these four species, respectively, and all of them contain a sequence encoding the Dof DNA-binding domain. The genes were unevenly distributed across the 13 (diploid) or 26 (tetraploid) chromosomes of each cotton genome. Genome comparison revealed that segmental duplication may have played a crucial role in the expansion of the cotton Dof gene family, with tandem duplication playing a minor role. Analysis of RNA-Seq data indicated that cotton Dof gene expression levels varied across tissues and in response to different abiotic stresses. Overall, our results provide valuable information for understanding the evolution of cotton Dof genes and lay a foundation for future investigation in cotton.
Project description: Domesticated cotton species provide raw material for the majority of the world's textile industry. Two independent domestication events have been identified in allopolyploid cotton, one in Upland cotton (Gossypium hirsutum L.) and the other in Egyptian cotton (Gossypium barbadense L.). Two diploid cotton species, Gossypium arboreum L. and Gossypium herbaceum L., have also been cultivated for several millennia, but their status as independent domesticates has long been in question. Using genome resequencing data, we estimated the global abundance of various repetitive DNAs. We demonstrate that, despite negligible divergence in genome size, the two domesticated diploid species contain different but compensatory repeat content and have thus experienced cryptic alterations in repeat abundance. Evidence of independent origin is bolstered by divergence-time estimates from molecular evolutionary analysis of ~7,000 orthologous genes, whose synonymous substitution rates suggest that G. arboreum and G. herbaceum last shared a common ancestor approximately 0.4-2.5 Ma. These data are incompatible with a shared domestication history during the emergence of agriculture and lead to the conclusion that G. arboreum and G. herbaceum were each domesticated independently.
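Divergence times derived from synonymous substitutions typically rest on the molecular-clock relation T = Ks / (2r), where Ks is the number of synonymous substitutions per synonymous site between two orthologs and r is the per-lineage synonymous substitution rate per site per year; the factor 2 accounts for substitutions accumulating along both lineages after the split. The sketch below applies this relation with an assumed rate and hypothetical Ks values; the rate(s) and Ks distribution actually used in the study are not given in this abstract.

```python
def divergence_time_ma(ks: float, rate_per_year: float) -> float:
    """Molecular-clock divergence time in millions of years, T = Ks / (2 * r)."""
    if ks < 0 or rate_per_year <= 0:
        raise ValueError("ks must be >= 0 and rate_per_year must be > 0")
    # Divide by 2 because substitutions accumulate on both lineages after the split.
    return ks / (2.0 * rate_per_year) / 1e6


# Hypothetical inputs: an assumed synonymous rate and illustrative Ks values,
# neither taken from the study.
ASSUMED_RATE = 2.6e-9  # synonymous substitutions per site per year (assumption)
for ks in (0.005, 0.010):
    print(f"Ks = {ks}: ~{divergence_time_ma(ks, ASSUMED_RATE):.2f} Ma")
```

Because T scales inversely with r, a range of plausible rates (or of per-gene Ks values) translates directly into a range of divergence times, which is one reason such estimates are usually reported as an interval rather than a single date.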
Project description: Reactive oxygen species (ROS) are important molecules in plants and are involved in many biological processes, including fiber development and adaptation to abiotic stress in cotton. We analyzed the evolution of ROS network genes and their expression levels in various cotton tissues under abiotic stress conditions. In total, 515, 260, and 261 ROS network genes were identified in Gossypium hirsutum (AD1 genome), G. arboreum (A genome), and G. raimondii (D genome), respectively. The ROS network genes were distributed across all cotton chromosomes but tended to aggregate on either the lower or upper chromosome arms. Phylogenetic tree analysis grouped all cotton ROS network genes into 17 families. A total of 243 gene pairs were orthologous between G. arboreum and G. raimondii, and 240 gene pairs were orthologous among G. arboreum, G. raimondii, and G. hirsutum. The synonymous substitution (Ks) peaks of orthologous gene pairs between the At subgenome and the A progenitor genome (G. arboreum), and between the Dt subgenome and the D progenitor genome (G. raimondii), were 0.004 and 0.015, respectively. The Ks peaks of ROS network orthologous gene pairs between the two progenitor genomes (A and D) and between the two subgenomes (At and Dt) were both 0.045. Most Ka/Ks values of orthologous gene pairs between the A and D genomes and the two subgenomes of TM-1 were lower than 1.0. RNA-seq analysis and RT-qPCR validation showed that CSD1,2,3,5,6; FSD1,2; MSD1,2; APX3,11; FRO5.6; and RBOH6 played a major role in fiber development, while CSD1, APX1, APX2, MDAR1, GPX4-6-7, FER2, RBOH6, RBOH11, and FRO5 were integral to salt stress tolerance in cotton. The ROS network-mediated signaling pathway contributes to fiber development and the regulation of abiotic stress responses in Gossypium. This study will enhance understanding of the ROS network and lay a foundation for exploring the mechanisms by which it is involved in fiber development and the regulation of abiotic stress in cotton.
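The Ka/Ks comparison reported for the ROS network orthologs rests on a simple ratio, often written omega: omega < 1 is conventionally read as purifying selection, omega close to 1 as neutral evolution, and omega > 1 as positive selection. The sketch below classifies ortholog pairs and reports the fraction with omega < 1. It assumes Ka and Ks have already been estimated by a codon-model tool; the pair names, values, and the tolerance band around 1 are hypothetical choices for illustration.

```python
from typing import Dict, Tuple


def classify(ka: float, ks: float, tol: float = 0.05) -> str:
    """Label one ortholog pair by its Ka/Ks ratio; tol widens the 'neutral' band."""
    if ks <= 0:
        return "undefined"  # Ka/Ks is not defined without synonymous changes
    omega = ka / ks
    if omega < 1 - tol:
        return "purifying"
    if omega > 1 + tol:
        return "positive"
    return "neutral"


def fraction_below_one(pairs: Dict[str, Tuple[float, float]]) -> float:
    """Fraction of pairs with a defined Ka/Ks that is strictly below 1."""
    ratios = [ka / ks for ka, ks in pairs.values() if ks > 0]
    return sum(r < 1 for r in ratios) / len(ratios) if ratios else 0.0


# Hypothetical (Ka, Ks) values for three ortholog pairs, not study data.
example_pairs = {"pair1": (0.002, 0.010), "pair2": (0.012, 0.010), "pair3": (0.004, 0.0)}
for name, (ka, ks) in example_pairs.items():
    print(name, classify(ka, ks))
print(fraction_below_one(example_pairs))
```

Pairs with Ks = 0 are excluded rather than treated as infinite ratios, since very recent duplicates or near-identical orthologs make the ratio numerically unstable.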
Project description: Nucleotide-binding site (NBS) genes encode a large family of disease resistance (R) proteins in plants. The availability of genomic data for two diploid cotton species, Gossypium arboreum and Gossypium raimondii, and two allotetraploid species, Gossypium hirsutum (TM-1) and Gossypium barbadense, allows a more comprehensive and systematic comparative study of NBS-encoding genes to elucidate the mechanisms of cotton disease resistance. Based on the genome assembly data, 246, 365, 588, and 682 NBS-encoding genes were identified in G. arboreum, G. raimondii, G. hirsutum, and G. barbadense, respectively. The distribution of NBS-encoding genes among the chromosomes was nonrandom and uneven, and the genes tended to form clusters. Gene structure analysis showed that G. arboreum and G. hirsutum possessed a greater proportion of CN, CNL, and N genes and a lower proportion of NL, TN, and TNL genes than G. raimondii and G. barbadense, while the percentages of RN and RNL genes remained relatively unchanged. The percentage difference was largest for TNL genes, about sevenfold. Exon statistics showed that the average number of exons per NBS gene in G. raimondii and G. barbadense was greater than in G. arboreum and G. hirsutum. Phylogenetic analysis revealed that the TIR-NBS genes of G. barbadense were closely related to those of G. raimondii. Sequence similarity analysis showed that diploid G. arboreum possessed a larger proportion of NBS-encoding genes similar to those of allotetraploid G. hirsutum, while diploid G. raimondii possessed a larger proportion of NBS-encoding genes similar to those of allotetraploid G. barbadense. Synteny analysis showed that more NBS genes in G. raimondii and G. arboreum were syntenic with those in G. barbadense and G. hirsutum, respectively. The structural architectures, amino acid sequence similarities, and synteny of NBS-encoding genes between G. arboreum and G. hirsutum, and between G. raimondii and G. barbadense, were the highest among comparisons between the diploid and allotetraploid genomes, indicating that G. hirsutum inherited more NBS-encoding genes from G. arboreum, while G. barbadense inherited more NBS-encoding genes from G. raimondii. This asymmetric evolution of NBS-encoding genes may help to explain why G. raimondii and G. barbadense are more resistant to Verticillium wilt, whereas G. arboreum and G. hirsutum are more susceptible. The disease resistance of allotetraploid cotton is related to its NBS-encoding genes, especially to the diploid progenitor from which they were derived, and TNL genes may play a significant role in resistance to Verticillium wilt in G. raimondii and G. barbadense.
Project description: The protein phosphatase 2C (PP2C) gene family, which participates in diverse cellular processes, is one of the major conserved gene families regulating signal transduction in eukaryotic organisms. PP2Cs have recently been identified in Arabidopsis and various crop species, but PP2Cs in cotton had not yet been analyzed. In the current research, we identified a total of 87 (Gossypium arboreum), 147 (Gossypium barbadense), 181 (Gossypium hirsutum), and 99 (Gossypium raimondii) PP2C-encoding genes in cotton genomes. Herein, we provide a comprehensive analysis of the PP2C gene family in each cotton species, including gene structure organization, gene duplications, expression profiling, chromosomal mapping, protein motif organization, and phylogenetic relationships. Phylogenetic analysis, supported by conserved domain composition, further classified the PP2C genes into 12 subgroups. Moreover, we observed a strong signature of purifying selection among duplicated gene pairs (i.e., segmental and dispersed) in Gossypium hirsutum. By comparing published RNA-seq data from different organs, we also observed tissue-specific expression of GhPP2C genes during organ and fiber development. qRT-PCR validation of 30 GhPP2C genes under heat, cold, drought, and salt stress treatments suggested critical roles in cotton stress responses. Hence, our findings, based on various bioinformatic tools, provide an overview of the PP2C gene family in cotton that demonstrates its critical role in organ and fiber development and abiotic stress tolerance, thereby contributing to the genetic improvement of cotton and the development of resistant cultivars.
Project description: BACKGROUND: Plant Na+/H+ antiporters (NHXs) are membrane-localized proteins that maintain cellular Na+/K+ and pH homeostasis. Considerable evidence has highlighted the critical roles of the NHX family in plant development and salt response; however, NHXs in cotton have rarely been studied. RESULTS: We performed a comprehensive and systematic comparative study of NHXs in three Gossypium species, identifying 12, 12, and 23 putative NHX proteins in G. arboreum, G. raimondii, and G. hirsutum, respectively. Phylogenetic analysis revealed that repeated polyploidization of Gossypium spp. contributed to the expansion of the NHX family. Gene structure analysis showed that cotton NHXs contain many introns, which may give rise to alternative splicing and help plants adapt to high salt concentrations in soil. Expression changes of NHXs indicate possible differences in the roles of distinct NHXs in the salt response. GhNHX1 was shown to localize to the vacuolar system and to be strongly induced by salt stress in cotton. Silencing of GhNHX1 enhanced the sensitivity of cotton seedlings to high salt concentrations, suggesting that GhNHX1 positively regulates cotton tolerance to salt stress. CONCLUSION: We characterized the gene structure, phylogenetic relationships, chromosomal locations, and expression patterns of NHX genes from G. arboreum, G. raimondii, and G. hirsutum. Our findings indicate that cotton NHX genes are regulated precisely and differentially at the transcriptional level, possibly including alternative splicing. The tolerance of plants to salt stress may depend on the expression level of a particular NHX rather than on the number of NHXs in the genome. This study provides insights into the function of plant NHXs and proposes promising candidate genes for breeding salt-resistant cotton cultivars.
Project description: KEY MESSAGE: The fuzzless gene GaFzl was fine-mapped to a 70-kb region containing a GIR1 homolog, Cotton_A_11941, responsible for the fuzzless trait in Gossypium arboreum DPL972. Cotton fiber is the most important natural textile resource. The fuzzless mutant DPL972 (Gossypium arboreum) provides a useful germplasm resource for exploring the molecular mechanisms underlying fiber and fuzz initiation and development. In our previous research, the fuzzless gene in DPL972 was identified as a single dominant gene and named GaFzl. In the present study, we fine-mapped this gene using F2 and BC1 populations. By combining traditional map-based cloning and next-generation sequencing, we mapped GaFzl to a 70-kb region containing seven annotated genes. RNA-Seq and resequencing analyses narrowed these candidates to two differentially expressed genes, Cotton_A_11941 and Cotton_A_11942. Sequence alignment uncovered no variation in the coding or promoter regions of Cotton_A_11942 between DPL971 and DPL972, whereas two single-base mutations in the promoter region and a TTG insertion in the coding region of Cotton_A_11941 were detected in DPL972. Cotton_A_11941, which encodes a homolog of Arabidopsis thaliana GIR1 (GLABRA2-interacting repressor), is thus the candidate gene most likely responsible for the fuzzless trait in DPL972. Our findings should lead to a better understanding of cotton fuzz formation, thereby accelerating marker-assisted selection in cotton breeding.