Insights from the complete chloroplast genome into the evolution of Sesamum indicum L.
ABSTRACT: Sesame (Sesamum indicum L.) is one of the oldest oilseed crops. To investigate its evolutionary characteristics as part of the Sesame Genome Project, we sequenced, in addition to its nuclear genome, the complete chloroplast (cp) genome of S. indicum cv. Yuzhi 11 (white seeded) using Illumina and 454 sequencing. The cp genome of S. indicum was then compared with those of 18 other higher plants. The cp genome of cv. Yuzhi 11 (KC569603) is 153,338 bp long and contains a total of 114 unique genes. The number of chloroplast genes in sesame is the same as that in Nicotiana tabacum, Vitis vinifera and Platanus occidentalis. Variation in the lengths of the large single-copy (LSC) and inverted repeat (IR) regions was the main contributor to cp genome size variation between sesame and the 18 other higher plant species. The 77 functional chloroplast genes, except for ycf1 and ycf2, were highly conserved. The deletion of the ycf1 gene sequence in cp genomes may be due either to its transfer to the nuclear genome, as has occurred in sesame, or to direct deletion, as has occurred in Panax ginseng and Cucumis sativus. The sesame ycf2 gene is only 5,721 bp in length and has lost about 1,179 bp. A BLAST query with nucleotides 1-585 of ycf2 returned hits in the sesame draft genome. Five repeats (R10, R12, R13, R14 and R17) were unique to the sesame chloroplast genome. We also found that IR contraction/expansion in the cp genome alters its rate of evolution. Chloroplast genes and repeats display the signature of convergent evolution in sesame and other species. These findings provide a foundation for further investigation of cp genome evolution in Sesamum and other higher plants.
Project description:Sesamum indicum is an important oil-yielding crop species. The complete chloroplast (cp) genome of S. indicum (GenBank acc no. JN637766) is 153,324 bp in length, and has a pair of inverted repeat (IR) regions of 25,141 bp each. The lengths of the large single copy (LSC) and the small single copy (SSC) regions are 85,170 bp and 17,872 bp, respectively. Comparative cp DNA sequence analyses of S. indicum with other cp genomes reveal that the genome structure, gene order, gene and intron contents, AT contents, codon usage, and transcription units are similar to those of typical angiosperm cp genomes. Nucleotide diversity of the IR region between Sesamum and three other cp genomes is much lower than that of the LSC and SSC regions in both coding and noncoding regions. In summary, regional constraints strongly affect the sequence evolution of cp genomes, while functional constraints affect it only weakly. Five short inversions associated with short palindromic sequences that form stem-loop structures were observed in the chloroplast genome of S. indicum. Twenty-eight different simple sequence repeat (SSR) loci were detected in the chloroplast genome of S. indicum. Almost all of the SSR loci were composed of A or T, so this may also contribute to the A-T richness of the cp genome of S. indicum. Seven large repeated loci were also identified in the chloroplast genome of S. indicum, and these loci are useful for developing S. indicum-specific cp genome vectors. The complete cp DNA sequence of S. indicum reported in this paper is a prerequisite for modifying this important oilseed crop by cp genetic engineering techniques.
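The region lengths reported above can be checked against the stated genome size, because a quadripartite chloroplast genome consists of one LSC, one SSC and two IR copies. The following is a minimal Python sketch of that arithmetic; the function name `quadripartite_length` is a hypothetical helper introduced for illustration and is not taken from the study.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) as reported for S. indicum (JN637766) in the text above.
total = quadripartite_length(lsc=85_170, ssc=17_872, ir=25_141)
print(total)  # 153324, matching the reported genome length
```

The same check can be applied to any of the quadripartite genomes described in this listing by substituting their reported LSC, SSC and IR lengths.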
Project description:Fagopyrum dibotrys, which belongs to the family Polygonaceae, is one of the national key conserved wild plants of China and has important medicinal and economic value. Here, the complete chloroplast (cp) genome sequence of F. dibotrys is reported. The cp genome is 159,919 bp in size and has a typical quadripartite structure consisting of a pair of inverted repeat regions (30,738 bp) separated by a large single copy region (85,134 bp) and a small single copy region (13,309 bp). Sequencing analyses indicated that the cp genome encodes 131 genes, including 80 protein-coding genes, 28 tRNA genes and 4 rRNA genes. The genome structure, gene order and codon usage are typical of angiosperm cp genomes. We also identified 48 simple sequence repeat (SSR) loci, fewer of which are located in protein-coding sequences than in noncoding regions. Comparison of the F. dibotrys cp genome with other Polygonaceae cp genomes indicated that the inverted repeats (IRs) and coding regions were more conserved than the single copy and noncoding regions, and several variation hotspots were detected. Coding gene sequence divergence analyses indicated that five genes (ndhK, petL, rpoC2, ycf1, ycf2) were subject to positive selection. Phylogenetic analysis among 42 species based on cp genomes and 50 protein-coding genes indicated a close relationship between F. dibotrys and F. tataricum. In summary, the complete cp genome sequence of F. dibotrys reported in this study will provide useful plastid genomic resources for population genetics and pave the way for resolving phylogenetic relationships in the order Caryophyllales.
Project description:Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
Project description:The Sesame Genome Working Group (SGWG) has been formed to sequence and assemble the sesame (Sesamum indicum L.) genome. The status of this project and our planned analyses are described.
Project description:Zingiber montanum (Z. montanum) and Zingiber zerumbet (Z. zerumbet) are important medicinal and ornamental herbs in the genus Zingiber and family Zingiberaceae. Chloroplast-derived markers are useful for species identification and phylogenetic studies, but further development is warranted for these two Zingiber species. In this study, we report the complete chloroplast genomes of Z. montanum and Z. zerumbet, which had lengths of 164,464 bp and 163,589 bp, respectively. These genomes had typical quadripartite structures with a large single copy (LSC, 87,856-89,161 bp), a small single copy (SSC, 15,803-15,642 bp), and a pair of inverted repeats (IRa and IRb, 29,393-30,449 bp). We identified 111 unique genes in each chloroplast genome, including 79 protein-coding genes, 28 tRNAs and 4 rRNA genes. We analyzed the molecular structures, gene information, amino acid frequencies, codon usage patterns, RNA editing sites, simple sequence repeats (SSRs) and long repeats from the two chloroplast genomes. A comparison of the Z. montanum and Z. zerumbet chloroplast genomes detected 489 single-nucleotide polymorphisms (SNPs) and 172 insertions/deletions (indels). Thirteen highly divergent regions, including ycf1, rps19, rps18-rpl20, accD-psaI, psaC-ndhE, psbA-trnK-UUU, trnfM-CAU-rps14, trnE-UUC-trnT-UGU, ccsA-ndhD, psbC-trnS-UGA, start-psbA, petA-psbJ, and rbcL-accD, were identified and might be useful for future species identification and phylogeny in the genus Zingiber. Positive selection was observed for ATP synthase (atpA and atpB), RNA polymerase (rpoA), small subunit ribosomal protein (rps3) and other protein-coding genes (accD, clpP, ycf1, and ycf2) based on the Ka/Ks ratios. Additionally, chloroplast SNP-based phylogeny analyses found that Zingiber was a monophyletic sister branch to Kaempferia and that chloroplast SNPs could be used to identify Zingiber species. The genome resources in our study provide valuable information for the identification and phylogenetic analysis of the genus Zingiber and family Zingiberaceae.
Project description:The genus Angelica (Apiaceae) comprises valuable herbal medicines. In this study, we determined the complete chloroplast (CP) genome sequence of A. polymorpha and compared it with that of Ligusticum officinale (GenBank accession no. NC039760). The CP genomes of A. polymorpha and L. officinale were 148,430 and 147,127 bp in length, respectively, with 37.6% GC content. Both CP genomes harbored 113 unique functional genes, including 79 protein-coding, four rRNA, and 30 tRNA genes. Comparative analysis of the two CP genomes revealed conserved genome structure, gene content, and gene order. However, highly variable regions, sufficient to distinguish between A. polymorpha and L. officinale, were identified in the hypothetical chloroplast open reading frame 1 (ycf1) and ycf2 genic regions. Nucleotide diversity (Pi) analysis indicated that the ycf4-chloroplast envelope membrane protein (cemA) intergenic region was highly variable between the two species. Phylogenetic analysis revealed that A. polymorpha and L. officinale clustered together within the family Apiaceae. The ycf4-cemA intergenic region in A. polymorpha carried a 418 bp deletion compared with L. officinale. This region was used for the development of a novel indel marker, LYCE, which successfully discriminated between A. polymorpha and L. officinale accessions. Our results provide important taxonomic and phylogenetic information on herbal medicines and facilitate their authentication using the indel marker.
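Nucleotide diversity (Pi), as used above to locate variable regions, is the average proportion of differing sites between pairs of aligned sequences. The sketch below shows one way to compute it over a whole alignment and in sliding windows. It is a simplified illustration under stated assumptions: gap columns are excluded pair by pair, the window and step sizes are arbitrary, the function names are hypothetical, and the toy sequences are not real data. The abstract does not say which software or settings were used.

```python
from itertools import combinations


def pairwise_difference(a, b):
    """Proportion of differing sites, skipping columns where either has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def nucleotide_diversity(aligned):
    """Pi: mean pairwise difference per site over all sequence pairs."""
    if len(aligned) < 2 or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least two sequences of equal aligned length")
    pairs = list(combinations(aligned, 2))
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


def sliding_pi(aligned, window, step):
    """Pi in consecutive windows; returns (window_start, pi) tuples."""
    length = len(aligned[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in aligned]))
        for start in range(0, length - window + 1, step)
    ]


# Toy two-sequence alignment: identical first half, two differences in the second.
toy = ["AAAAAAAA", "AAAATTAA"]
print(sliding_pi(toy, window=4, step=4))  # [(0, 0.0), (4, 0.5)]
```

Windows with the highest Pi values correspond to candidate hypervariable regions of the kind reported here for the ycf4-cemA spacer.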
Project description:Actinidia arguta is the most basal species in a phylogenetically and economically important genus in the family Actinidiaceae. To better understand the molecular basis of the Actinidia arguta chloroplast (cp), we sequenced the complete cp genome from A. arguta using Illumina and PacBio RS II sequencing technologies. The cp genome from A. arguta was 157,611 bp in length and composed of a pair of 24,232 bp inverted repeats (IRs) separated by a 20,463 bp small single copy region (SSC) and an 88,684 bp large single copy region (LSC). Overall, the cp genome contained 113 unique genes. The cp genomes from A. arguta and three other Actinidia species from GenBank were subjected to a comparative analysis. Indel mutation events and high frequencies of base substitution were identified, and the accD and ycf2 genes showed a high degree of variation within Actinidia. Forty-seven simple sequence repeats (SSRs) and 155 repetitive structures were identified, further demonstrating the rapid evolution in Actinidia. The cp genome analysis and the identification of variable loci provide vital information for understanding the evolution and function of the chloroplast and for characterizing Actinidia population genetics.
Project description:BACKGROUND:Chloroplast (cp) genome information would facilitate the development and utilization of Taxodium resources. However, the cp genome characteristics of Taxodium were poorly understood. RESULTS:We determined the complete cp genome sequences of T. distichum, T. mucronatum, and T. ascendens. The cp genomes are 131,947 bp to 132,613 bp in length, encode 120 genes in the same order, and lack typical inverted repeat (IR) regions. The longest small IR, a 282 bp trnQ-containing IR, was involved in the formation of isomers. Comparative analysis of the three cp genomes showed that 91.57% of the indels resulted in periodic variation of tandem repeat (TR) motifs and that 72.46% of the single nucleotide polymorphisms (SNPs) were located close to TRs, suggesting a relationship between TRs and mutational dynamics. Eleven hypervariable regions were identified as candidates for DNA barcode development. Hypothetical cp open reading frame 1 (ycf1) was the only gene with an indel in its coding DNA sequence, and this indel is composed of a long TR. When the analysis was extended to cupressophytes, ycf1 genes were found to have undergone a universal insertion of TRs accompanied by extreme length expansion. Meanwhile, ycf1 is also located at rearrangement endpoints of cupressophyte cp genomes. All these characteristics highlight the important role of repeats in the evolution of cp genomes. CONCLUSIONS:This study adds new evidence for the role of repeats in the mechanisms of cp genome mutation and rearrangement. Moreover, the information on TRs and hypervariable regions provides reliable molecular resources for future research on infrageneric taxon identification, phylogenetic resolution, population structure and biodiversity in the genus Taxodium and in cupressophytes.
Project description:We report here the transcriptome sequencing data of control and infected sesame genotypes. Sesame is an emerging oilseed crop. The destructive soil-borne fungus Macrophomina phaseolina Tassi (Goid) causes charcoal rot of sesame, leading to high (>50%) yield loss. Most of the high-yielding sesame cultivars (Sesamum indicum) of India are susceptible to charcoal rot. Wild sesame, Sesamum mulayanum, shows a high degree of tolerance against many pathogens. We have earlier developed an interspecific hybrid between Indian cultivated sesame and S. mulayanum. The parents and the F6 recombinant constitute the three experimental genotypes in the present report. The seedlings were infected with M. phaseolina. The data of the infected and control (mock-inoculated) transcriptomes are presented. RNA-seq with Illumina NovaSeq 6000 technology generated 2.9 × 10^8 paired-end reads. We deposited the data in the NCBI Sequence Read Archive (SRA) under accession number PRJNA642699. The de novo assembly of clean reads generated 106,295 unigenes with an average length of 1,342 bp, covering 1.42 × 10^8 nucleotides. Screening the 106,295 unigenes with MISA and SAMtools software identified 26,880 simple sequence repeats (SSRs), 90,181 single nucleotide polymorphisms (SNPs), and 25,063 insertions/deletions (InDels). Apart from mono-base repeats, di-nucleotide repeats (42.51%) were the most abundant among the SSRs, followed by tri-nucleotide repeats (14.28%). Subsequently, we designed 22,494 pairs of primers based on perfect di- and tri-nucleotide SSRs. Transitions (Ts, 60%) were the most abundant substitution type among the SNPs, followed by transversions (Tv, 40%), with a Ts/Tv ratio of 1.48. The development of genic-SSR markers and SNP information will pave the way for molecular marker-assisted breeding of sesame for tolerance against charcoal rot.
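The Ts/Tv ratio reported above is the number of transitions (purine to purine or pyrimidine to pyrimidine changes) divided by the number of transversions. The sketch below shows the classification and the ratio on a toy SNP list; the function names and the input format (pairs of reference and alternate bases) are assumptions made for illustration and do not reflect the pipeline used in the study. Note that the rounded percentages alone (60% and 40%) would give 1.5, so the reported 1.48 presumably derives from the unrounded counts.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions; False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternate bases must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) base pairs."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ts / tv


# Toy SNP list (not real data): three transitions and two transversions.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(toy_snps))  # 1.5
```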
Project description:Boswellia sacra (Burseraceae), a keystone endemic species, is famous for the production of fragrant oleo-gum resin. However, its genetic make-up, especially the genomic information of its chloroplast, is still unknown. Here, we describe for the first time the chloroplast (cp) genome of B. sacra. The complete cp sequence revealed a circular genome of 160,543 bp with 37.61% GC content. The cp genome has a typical quadripartite structure with inverted repeats (IRs; 26,763 bp) separated by small single copy (SSC; 18,962 bp) and large single copy (LSC; 88,055 bp) regions. De novo assembly and annotation showed the presence of 114 unique genes with 83 protein-coding regions. The phylogenetic analysis revealed that the B. sacra cp genome is closely related to the cp genomes of Azadirachta indica and Citrus sinensis, while most of the syntenic differences were found in the non-coding regions. The pairwise distance among 76 shared genes of B. sacra and A. indica was highest for atpA, rpl2, rps12 and ycf1. The cp genome of B. sacra is a novel genomic resource that could be used in further studies to understand its diversity, taxonomy and phylogeny.