Mining sequence variations in representative polyploid sugarcane germplasm accessions.
ABSTRACT: Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Because sugarcane has a highly polyploid, complex genome, it has been a major challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. To mine the genetic variations in sugarcane, genotyping by sequencing (GBS) was used to genotype 14 representative Saccharum complex accessions. GBS is a method for generating a large number of markers, enabled by next generation sequencing (NGS) and genome complexity reduction using restriction enzymes. To use GBS for high-throughput genotyping of highly polyploid sugarcane, GBS analysis pipelines were established in the 14 Saccharum complex accessions by evaluating different alignment methods, sequence variant callers, and sequence depths for single nucleotide polymorphism (SNP) filtering. Using the established pipeline, a total of 76,251 non-redundant SNPs, 5,642 InDels, 6,380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, the non-reference-based Universal Network Enabled Analysis Kit (UNEAK) and Stacks de novo called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentage of single dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much higher than the proportion of multiple dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationships among the 14 accessions. The results showed that the Erianthus genus and the Saccharum genus diverged more than 10 million years ago (MYA).
The Saccharum species separated from their common ancestors between 0.19 and 1.65 MYA. The GBS pipelines, including the reference sequences, alignment methods, sequence variant callers, and sequence depth, were recommended and discussed for the Saccharum complex and other related species. A large number of sequence variations were discovered in the Saccharum complex, including SNPs, InDels, PAVs, and CNVs. Genome-wide SNPs were further used to illustrate sequence features of polyploid species and demonstrated the divergence of different species in the Saccharum complex. The results of this study showed that GBS is an effective NGS-based method for discovering genomic sequence variations in highly polyploid and heterozygous species.
Project description:Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with a highly polyploid and complex genome. The Saccharum complex, comprising the Saccharum genus and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding this allelic variation has been challenging, it is critical for dissecting allelic structure and identifying the alleles controlling important traits in sugarcane. To characterize natural variation in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and a sugarcane unigene set, targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variant and genotype calling was established. BWA-mem and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated to be largely homozygous. An average of 34,397 haplotype blocks per accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. The target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.
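The comparison of SNP calls from several callers mentioned above can be illustrated with a minimal sketch. It assumes each call is reduced to a (chromosome, position, reference allele, alternate allele) record and that a SNP counts as concordant only when every caller reports the identical record. The `Snp` type, the `concordant_snps` helper, and the toy call sets are hypothetical and are not taken from the study's pipeline.

```python
from typing import Iterable, NamedTuple, Set


class Snp(NamedTuple):
    """Hypothetical minimal SNP record used only for this illustration."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordant_snps(*call_sets: Iterable[Snp]) -> Set[Snp]:
    """Return the SNPs that appear in every supplied call set."""
    sets = [set(calls) for calls in call_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


# Toy call sets standing in for the output of three callers (invented data).
samtools_calls = {Snp("chr1", 100, "A", "G"), Snp("chr1", 250, "C", "T"), Snp("chr2", 40, "G", "A")}
freebayes_calls = {Snp("chr1", 100, "A", "G"), Snp("chr2", 40, "G", "A"), Snp("chr3", 7, "T", "C")}
gatk_calls = {Snp("chr1", 100, "A", "G"), Snp("chr1", 250, "C", "T"), Snp("chr2", 40, "G", "A")}

shared = concordant_snps(samtools_calls, freebayes_calls, gatk_calls)
for snp in sorted(shared):
    print(snp.chrom, snp.pos, snp.ref, snp.alt)
```

Real VCF output would need allele normalization before an exact-match comparison of this kind is meaningful; that step is omitted here.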
Project description:Sugarcane (Saccharum spp.) is a highly energy-efficient crop primarily for sugar and bio-ethanol production. Sugarcane genetics and cultivar improvement have been extremely challenging largely due to its complex genomes with high polyploidy levels. In this study, we deeply sequenced the coding regions of 307 sugarcane germplasm accessions. Nearly five million sequence variations were catalogued. The average of 98× sequence depth enabled different allele dosages of sequence variation to be differentiated in this polyploid collection. With selected high-quality genome-wide SNPs, we performed population genomic studies and environmental association analysis. Results illustrated that the ancient sugarcane hybrids, S. barberi and S. sinense, and modern sugarcane hybrids are significantly different in terms of genomic compositions, hybridization processes and their potential ancestry contributors. Linkage disequilibrium (LD) analysis showed a large extent of LD in sugarcane, with 962.4 Kbp, 2739.2 Kbp and 3573.6 Kbp for S. spontaneum, S. officinarum and modern S. hybrids respectively. Candidate selective sweep regions and genes were identified during domestication and historical selection processes of sugarcane in addition to genes associated with environmental variables at the original locations of the collection. This research provided an extensive amount of genomic resources for sugarcane community and the in-depth population genomic analyses shed light on the breeding and evolution history of sugarcane, a highly polyploid species.
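Why high sequence depth helps to separate allele dosages can be sketched as follows. In a locus with ploidy k, neighbouring dosage classes differ by only 1/k in expected alternate-allele read fraction, so many reads are needed to tell them apart. The function below is a simplified illustration, not the method used in the study: it assumes unbiased sampling of reads across allele copies and a known ploidy, and it assigns the nearest integer dosage. The function name and the example ploidy of 10 are assumptions.

```python
import math


def estimate_dosage(alt_reads: int, total_reads: int, ploidy: int) -> int:
    """Nearest-integer allele dosage from read counts (ties round up).

    Assumes reads sample each allele copy equally; real data would call
    for a statistical model that accounts for sampling error and bias.
    """
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    if total_reads <= 0 or not 0 <= alt_reads <= total_reads:
        raise ValueError("read counts are inconsistent")
    return math.floor(ploidy * alt_reads / total_reads + 0.5)


# Hypothetical locus at roughly 98x depth in a region with ten copies.
for alt in (0, 10, 29, 98):
    print(alt, estimate_dosage(alt, 98, ploidy=10))
```

With the same ploidy but only 10 reads, a single alternate read would already give a fraction of 0.1, and one read more or less would shift the call by a whole dosage class, which is the intuition behind requiring deep coverage.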
Project description:BACKGROUND: The presence of homoeologous sequences and absence of a reference genome sequence make discovery and genotyping of single nucleotide polymorphisms (SNPs) more challenging in polyploid crops. RESULTS: To address this challenge, we constructed reduced representation libraries (RRLs) for two Brassica napus inbred lines and their 91 doubled haploid (DH) progenies using a modified ddRADseq technique. A bioinformatics pipeline termed RFAPtools was developed to discover and genotype SNPs and presence/absence variations (PAVs). Using this pipeline, a pseudo-reference sequence (PRF) containing 180,991 sequence tags was constructed. By aligning sequence reads to the pseudo-reference sequence, allelic SNPs as well as PAVs were identified and genotyped with RFAPtools. Two parallel linkage maps, one SNP bin map containing 8,780 SNP loci and one PAV linkage map containing 12,423 dominant loci, were constructed. By aligning marker sequences to sequence scaffolds of B. rapa, whose genome is available, we assigned 44 unassembled sequence scaffolds comprising 8.15 Mb to the B. rapa chromosomes, and also identified 14 instances of misassembly and eight instances of mis-ordered sequence scaffolds. CONCLUSIONS: These results indicate that the modified ddRADseq approach is a cost-effective and simple method for genotyping tens of thousands of SNP and PAV markers in a polyploid plant species. The results also demonstrate that RFAPtools, developed in this study, is a powerful tool for mining allelic SNPs from homoeologous sequences in polyploids; it is therefore generally applicable to either diploid or polyploid species with or without a reference genome sequence.
Project description:Sugarcane is a major crop used for food and bioenergy production. Modern cultivars are hybrids derived from crosses between Saccharum officinarum and Saccharum spontaneum. Hybrid cultivars combine favorable characteristics from ancestral species and contain a genome that is highly polyploid and aneuploid, containing 100-130 chromosomes. These complex genomes represent a huge challenge for molecular studies and for the development of biotechnological tools that can facilitate sugarcane improvement. Here, we describe full-length enriched cDNA libraries for Saccharum officinarum, Saccharum spontaneum, and one hybrid genotype (SP803280) and analyze the set of open reading frames (ORFs) in their genomes (i.e., their ORFeomes). We found 38,195 (19%) sugarcane-specific transcripts that did not match transcripts from other databases. Less than 1.6% of all transcripts were ancestor-specific (i.e., not expressed in SP803280). We also found 78,008 putative new sugarcane transcripts that were absent in the largest sugarcane expressed sequence tag database (SUCEST). Functional annotation showed a high frequency of protein kinases and stress-related proteins. We also detected natural antisense transcript expression, which mapped to 94% of all plant KEGG pathways; however, each genotype showed different pathways enriched in antisense transcripts. Our data appeared to cover 53.2% (17,563 genes) and 46.8% (937 transcription factors) of all sugarcane full-length genes and transcription factors, respectively. This work represents a significant advancement in defining the sugarcane ORFeome and will be useful for protein characterization, single nucleotide polymorphism and splicing variant identification, evolutionary and comparative studies, and sugarcane genome assembly and annotation.
Project description:Sugarcane (Saccharum spp.) is highly polyploid and aneuploid. Modern cultivars are derived from hybridization between S. officinarum and S. spontaneum. This combination results in a genome exhibiting variable ploidy among different loci, a huge genome size (~10 Gb) and a high content of repetitive regions. An approach using genomic, transcriptomic, and genetic mapping can improve our knowledge of the behavior of genetics in sugarcane. The hypothetical HP600 and Centromere Protein C (CENP-C) genes from sugarcane were used to elucidate the allelic expression and genomic and genetic behaviors of this complex polyploid. The physically linked side-by-side genes HP600 and CENP-C were found in two different homeologous chromosome groups with ploidies of eight and ten. The first region (Region01) was a Sorghum bicolor ortholog region with all haplotypes of HP600 and CENP-C expressed, but HP600 exhibited an unbalanced haplotype expression. The second region (Region02) was a scrambled sugarcane sequence formed from different noncollinear genes containing partial duplications of HP600 and CENP-C (paralogs). This duplication resulted in a non-expressed HP600 pseudogene and a recombined fusion version of CENP-C and the orthologous gene Sobic.003G299500 with at least two chimeric gene haplotypes expressed. It was also determined that it occurred before Saccharum genus formation and after the separation of sorghum and sugarcane. A linkage map was constructed using markers from nonduplicated Region01 and for the duplication (Region01 and Region02). We compare the physical and linkage maps, demonstrating the possibility of mapping markers located in duplicated regions with markers in nonduplicated region. Our results contribute directly to the improvement of linkage mapping in complex polyploids and improve the integration of physical and genetic data for sugarcane breeding programs. 
Thus, we describe the complexity involved in sugarcane genetics and genomics and allelic dynamics, which can be useful for understanding complex polyploid genomes.
Project description:Sugarcane (Saccharum spp.) is a globally important crop for sugar and bioenergy production. Its highly polyploid, complex genome has hindered progress in understanding its molecular structure. Flow cytometric sorting and analysis have been used in other important crops with large genomes to dissect the genome into component chromosomes. Here we present for the first time a method to prepare suspensions of intact sugarcane chromosomes for flow cytometric analysis and sorting. Flow karyotypes were generated for two S. officinarum and three hybrid cultivars. Five main peaks were identified and each genotype had a distinct flow karyotype profile. The flow karyotypes of S. officinarum were sharper, with more discrete peaks, than those of the hybrids; this difference is probably due to the double genome structure of the hybrids. Simple Sequence Repeat (SSR) markers were used to determine that at least one allelic copy of each of the 10 basic chromosomes could be found in each peak for every genotype, except R570, suggesting that the peaks may represent ancestral Saccharum subgenomes. The ability to flow sort Saccharum chromosomes will allow us to isolate and analyse chromosomes of interest and further examine the structure and evolution of the sugarcane genome.
Project description:To accelerate genetic studies in sugarcane, an Axiom Sugarcane100K single nucleotide polymorphism (SNP) array was designed and customized in this study. Target enrichment sequencing of 300 sugarcane accessions selected from the world collection of sugarcane and related grass species yielded more than four million SNPs, from which a total of 31,449 single dose (SD) SNPs and 68,648 low dosage (33,277 SD and 35,371 double dose) SNPs from two datasets, respectively, were selected and tiled on an Affymetrix Axiom SNP array. Most of the selected SNPs (91.77%) were located within genic regions (12,935 genes), with an average of 7.1 SNPs/gene according to sorghum gene models. This newly developed array was used to genotype 469 sugarcane clones, including one F1 population derived from a cross between Green German and IND81-146, one selfing population derived from CP80-1827, and 11 diverse sugarcane accessions as controls. Genotyping revealed a high polymorphic SNP rate (77.04%) among the 469 samples. Three linkage maps were constructed using SD SNP markers: a genetic map for Green German with 3,482 SD SNP markers spanning 3,336 cM, a map for IND81-146 with 1,513 SD SNP markers spanning 2,615 cM, and a map for CP80-1827 with 536 SD SNP markers spanning 3,651 cM. Quantitative trait loci (QTL) analysis identified a total of 18 QTLs controlling Sugarcane yellow leaf virus resistance segregating in the two mapping populations, harboring 27 disease resistance genes. This study demonstrated the successful development and utilization of a SNP array as an efficient genetic tool for high-throughput genotyping in highly polyploid sugarcane. Overall design: A total of 471 DNA samples were genotyped, including 305 F1 progeny derived from a cross between Green German and IND81-146, 153 progeny of the selfing population derived from CP80-1827, and 11 diverse sugarcane accessions as controls; two parental DNA samples from the F1 population were replicated for the SNP array assay.
Project description:BACKGROUND: MicroRNAs (miRNAs) are small regulatory RNAs, some of which are conserved in diverse plant genomes. Therefore, computational identification and further experimental validation of miRNAs from non-model organisms is both feasible and instrumental for addressing miRNA-based gene regulation and evolution. Sugarcane (Saccharum spp.) is an important biofuel crop with publicly available expressed sequence tag and genomic survey sequence databases, but little is known about miRNAs and their targets in this highly polyploid species. RESULTS: In this study, we have computationally identified 19 distinct sugarcane miRNA precursors, of which several are highly similar with their sorghum homologs at both nucleotide and secondary structure levels. The accumulation pattern of mature miRNAs varies in organs/tissues from the commercial sugarcane hybrid as well as in its corresponding founder species S. officinarum and S. spontaneum. Using sugarcane MIR827 as a query, we found a novel MIR827 precursor in the sorghum genome. Based on our computational tool, a total of 46 potential targets were identified for the 19 sugarcane miRNAs. Several targets for highly conserved miRNAs are transcription factors that play important roles in plant development. Conversely, target genes of lineage-specific miRNAs seem to play roles in diverse physiological processes, such as SsCBP1. SsCBP1 was experimentally confirmed to be a target for the monocot-specific miR528. Our findings support the notion that the regulation of SsCBP1 by miR528 is shared at least within graminaceous monocots, and this miRNA-based post-transcriptional regulation evolved exclusively within the monocots lineage after the divergence from eudicots. CONCLUSIONS: Using publicly available nucleotide databases, 19 sugarcane miRNA precursors and one new sorghum miRNA precursor were identified and classified into 14 families. 
Comparative analyses between sugarcane and sorghum suggest that these two species retain homologous miRNAs and targets in their genomes. Such conservation may help to clarify specific aspects of miRNA regulation and evolution in the polyploid sugarcane. Finally, our dataset provides a framework for future studies on sugarcane RNAi-dependent regulatory mechanisms.
Project description:In order to understand the genetic diversity and structure within and between the genera Saccharum and Erianthus, 79 accessions from five species (S. officinarum, S. spontaneum, S. robustum, S. barberi, S. sinense), six accessions of E. arundinaceus, and 30 Saccharum spp. hybrids were analyzed using 21 pairs of fluorescence-labeled, highly polymorphic SSR primers and a capillary electrophoresis (CE) detection system. A total of 167 polymorphic SSR alleles were identified by CE, with a mean polymorphic information content (PIC) of 0.92. Genetic diversity parameters among these 115 accessions revealed that Saccharum spp. hybrids were more diverse than the Saccharum and Erianthus species. Based on the SSR data, phylogenetic analysis and principal component analysis (PCA) classified the 115 accessions into seven main phylogenetic groups, which corresponded to the Saccharum and Erianthus genera. We propose that seven core SSR primer pairs, namely SMC31CUQ, SMC336BS, SMC597CS, SMC703BS, SMC24DUQ, mSSCIR3, and mSSCIR43, may have wide applicability in genotype identification of Saccharum species and Saccharum spp. hybrids. Thus, the information from this study contributes to the management of sugarcane genetic resources.
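For readers unfamiliar with PIC, the sketch below computes it for a single marker from allele frequencies using the widely cited Botstein et al. (1980) definition, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The abstract does not say which formula or software the authors used, so treating this definition as theirs is an assumption, and the function name `pic` is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pic(freqs: Sequence[float]) -> float:
    """Polymorphic information content for one marker.

    freqs holds the allele frequencies at the marker and must sum to 1.
    """
    if any(p < 0 for p in freqs):
        raise ValueError("allele frequencies must be non-negative")
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over all unordered pairs of distinct alleles.
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term


print(pic([0.5, 0.5]))   # two equally frequent alleles -> 0.375
print(pic([0.05] * 20))  # twenty equally frequent alleles
```

Under this definition PIC approaches 1 only when a marker carries many alleles at similar frequencies, which is in line with the choice of highly polymorphic SSR primers described above.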
Project description:Sugarcane (Saccharum spp.) and other members of Saccharum spp. are attractive biofuel feedstocks. One of the two World Collections of Sugarcane and Related Grasses (WCSRG) is in Miami, FL. This WCSRG has 1002 accessions, presumably with valuable alleles for biomass, other important agronomic traits, and stress resistance. However, the WCSRG has not been fully exploited by breeders due to its lack of characterization and unmanageable population. In order to optimize the use of this genetic resource, we aim to 1) genotypically evaluate all the 1002 accessions to understand its genetic diversity and population structure and 2) form a core collection, which captures most of the genetic diversity in the WCSRG. We screened 36 microsatellite markers on 1002 genotypes and recorded 209 alleles. Genetic diversity of the WCSRG ranged from 0 to 0.5 with an average of 0.304. The population structure analysis and principal coordinate analysis revealed three clusters with all S. spontaneum in one cluster, S. officinarum and S. hybrids in the second cluster and mostly non-Saccharum spp. in the third cluster. A core collection of 300 accessions was identified which captured the maximum genetic diversity of the entire WCSRG which can be further exploited for sugarcane and energy cane breeding. Sugarcane and energy cane breeders can effectively utilize this core collection for cultivar improvement. Further, the core collection can provide resources for forming an association panel to evaluate the traits of agronomic and commercial importance.