The Complete Chloroplast Genome of a Key Ancestor of Modern Roses, Rosa chinensis var. spontanea, and a Comparison with Congeneric Species.
ABSTRACT: Rosa chinensis var. spontanea, an endemic and endangered plant of China, is one of the key ancestors of modern roses and a source for famous traditional Chinese medicines against female diseases, such as irregular menses and dysmenorrhea. In this study, the complete chloroplast (cp) genome of R. chinensis var. spontanea was sequenced, analyzed, and compared to congeneric species. The cp genome of R. chinensis var. spontanea is a typical quadripartite circular molecule of 156,590 bp in length, including one large single copy (LSC) region of 85,910 bp and one small single copy (SSC) region of 18,762 bp, separated by two inverted repeat (IR) regions of 25,959 bp. The GC content of the whole genome is 37.2%, while that of LSC, SSC, and IR is 42.8%, 35.2% and 31.2%, respectively. The genome encodes 129 genes, including 84 protein-coding genes (PCGs), 37 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes. Seventeen genes in the IR regions were found to be duplicated. Thirty-three forward and five inverted repeats were detected in the cp genome of R. chinensis var. spontanea. The genome is rich in SSRs. In total, 85 SSRs were detected. A genome comparison revealed that IR contraction might be the reason for the relatively smaller cp genome size of R. chinensis var. spontanea compared to other congeneric species. Sequence analysis revealed that the LSC and SSC regions were more divergent than the IR regions within the genus Rosa and that a higher divergence occurred in non-coding regions than in coding regions. A phylogenetic analysis showed that the sampled species of the genus Rosa formed a monophyletic clade and that R. chinensis var. spontanea shared a more recent ancestor with R. lichiangensis of the section Synstylae than with R. odorata var. gigantea of the section Chinenses. This information will be useful for the conservation genetics of R. chinensis var. spontanea and for the phylogenetic study of the genus Rosa, and it might also facilitate the genetics and breeding of modern roses.
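The quadripartite layout described above implies a simple length identity: the genome size equals LSC + SSC + two IR copies. The sketch below is only an illustrative check of the reported R. chinensis var. spontanea figures; the function name `quadripartite_length` is a hypothetical helper, not part of the study.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length (bp) of a quadripartite cp genome.

    The two inverted repeats are identical in length, so the IR
    length is counted twice.
    """
    return lsc + ssc + 2 * ir


# Region lengths (bp) as reported in the abstract above.
rosa_total = quadripartite_length(lsc=85_910, ssc=18_762, ir=25_959)
print(rosa_total)  # 156590, the reported genome size
```

The same check can be applied to any abstract that reports a per-copy IR length; where an abstract gives the combined length of both IRs, the value would need to be halved first.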
Project description:Chloroplast (cp) genome sequences provide a valuable source for DNA barcoding. Molecular phylogenetic studies have concentrated on DNA sequencing of conserved gene loci. However, this approach is time consuming and more difficult to implement when gene organization differs among species. Here we report the complete re-sequencing of the cp genome of Capsicum pepper (Capsicum annuum var. glabriusculum) using the Illumina platform. The total length of the cp genome is 156,817 bp with a 37.7% overall GC content. A pair of inverted repeats (IRs) of 50,284 bp were separated by a small single copy (SSC; 18,948 bp) and a large single copy (LSC; 87,446 bp). The number of cp genes in C. annuum var. glabriusculum is the same as that in other Capsicum species. Variations in the lengths of the LSC, SSC and IR regions were the main contributors to the size variation in the cp genome of this species. A total of 125 simple sequence repeats (SSRs) and 48 insertion or deletion variants were found by sequence alignment of Capsicum cp genomes. These findings provide a foundation for further investigation of cp genome evolution in Capsicum and other higher plants.
Project description:Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var. deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16-taxon angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.
Project description:Quercus acutissima, an important endemic and ecological plant of the Quercus genus, is widely distributed throughout China. However, there have been few studies on its chloroplast genome. In this study, the complete chloroplast (cp) genome of Q. acutissima was sequenced, analyzed, and compared to four species in the Fagaceae family. The size of the Q. acutissima chloroplast genome is 161,124 bp, including one large single copy (LSC) region of 90,423 bp and one small single copy (SSC) region of 19,068 bp, separated by two inverted repeat (IR) regions of 51,632 bp. The GC content of the whole genome is 36.08%, while those of LSC, SSC, and IR are 34.62%, 30.84%, and 42.78%, respectively. The Q. acutissima chloroplast genome encodes 136 genes, including 88 protein-coding genes, four ribosomal RNA genes, and 40 transfer RNA genes. In the repeat structure analysis, 31 forward and 22 inverted long repeats and 65 simple-sequence repeat loci were detected in the Q. acutissima cp genome. The existence of abundant simple-sequence repeat loci in the genome suggests the potential for future population genetic work. The genome comparison revealed that the LSC region is more divergent than the SSC and IR regions, and there is higher divergence in noncoding regions than in coding regions. Phylogenetic relationships inferred from 25 species indicated that members of the Quercus genus do not form a clade and that Q. acutissima is closely related to Q. variabilis. This study identified the unique characteristics of the Q. acutissima cp genome, which will provide a theoretical basis for species identification and biological research.
Project description:This study reports the complete chloroplast (cp) DNA sequence of Eleutherococcus senticosus (GenBank: JN637765), an endangered endemic species. The genome is 156,768 bp in length, and contains a pair of inverted repeat (IR) regions of 25,930 bp each, a large single copy (LSC) region of 86,755 bp and a small single copy (SSC) region of 18,153 bp. The structural organization, gene and intron contents, gene order, AT content, codon usage, and transcription units of the E. senticosus chloroplast genome are similar to those of typical land plant cp DNA. We aligned and analyzed the sequences of 86 coding genes, 19 introns and 113 intergenic spacers (IGS) in three different taxonomic hierarchies: Eleutherococcus vs. Panax, Eleutherococcus vs. Daucus, and Eleutherococcus vs. Nicotiana. The distribution of indels, the number of polymorphic sites and nucleotide diversity indicate that positional constraint is more important than functional constraint for the evolution of cp genome sequences in Asterids. For example, the intron sequences in the LSC region exhibited base substitution rates 5-11 times higher than those in the IR regions, while the intron sequences in the SSC region evolved 7-14 times faster than those in the IR region. Furthermore, the Ka/Ks ratio of the gene coding sequences supports a stronger evolutionary constraint in the IR region than in the LSC or SSC regions. Therefore, our data suggest that selective sweeps by base collection mechanisms more frequently eliminate polymorphisms in the IR region than in other regions. Chloroplast genome regions that have high levels of base substitutions also show higher incidences of indels. Thirty-five simple sequence repeat (SSR) loci were identified in the Eleutherococcus chloroplast genome. Of these, 27 are homopolymers, while six are di-polymers and two are tri-polymers. In addition to the SSR loci, we also identified 18 medium size repeat units ranging from 22 to 79 bp, 11 of which are distributed in the IGS or intron regions. These medium size repeats may contribute to developing a cp genome-specific gene introduction vector because these regions may be used as specific recombination sites.
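The SSR classes mentioned above (homopolymers, di- and tri-nucleotide repeats) can be located with a simple regular-expression scan. The following is a minimal sketch, not the procedure used in the study: the minimum repeat counts in `MIN_REPEATS` and the function name `find_ssrs` are assumptions chosen for illustration, and only perfect (uninterrupted) repeats are reported.

```python
import re

# Assumed minimum repeat counts per motif length (hypothetical thresholds,
# not taken from the study): mono >= 10, di >= 5, tri >= 4.
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    thresholds = min_repeats or MIN_REPEATS
    seq = seq.upper()
    hits = []
    for motif_len, n in thresholds.items():
        # A motif of motif_len bases followed by at least n - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip di/tri motifs that are really homopolymers (e.g. "AA").
            if motif_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


# Toy sequence containing one mono-, one di- and one tri-nucleotide SSR.
toy = "GG" + "A" * 12 + "CCG" + "AT" * 6 + "GCG" + "TTC" * 4 + "GA"
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

Dedicated SSR tools apply their own thresholds and can also report compound or imperfect repeats, so counts from this sketch would not necessarily match published figures.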
Project description:The complete chloroplast (cp) genome of Lonicera japonica, a common ornamental and medicinal plant in North America and East Asia, was sequenced and analyzed. The L. japonica cp genome is 155,078 bp in length and contains a pair of inverted repeat regions (IRa and IRb) of 23,774 bp each, as well as large (LSC, 88,858 bp) and small (SSC, 18,672 bp) single-copy regions. A total of 129 genes were identified in the cp genome, 16 of which were duplicated within the IR regions. Relative to other plant cp genomes, the L. japonica cp genome had a unique rearrangement between trnI-CAU and trnN-GUU. In L. japonica cpDNA, rps19, rpl2, and rpl23 have moved from the IR region to the LSC region. The ycf1 pseudogene in the IR region is lost, and only one copy is located in the SSC region. Comparative cp DNA sequence analyses of L. japonica with other cp genomes reveal that the gene order, and the gene and intron contents, are slightly different. Introns in the ycf2 and rps18 genes are reported for the first time. Four genes (clpP, petB, petD, and rpl16) lost introns. However, its genome structure, GC content, and codon usage were similar to those of typical angiosperm cp genomes. All preferred synonymous codons were found to end with A/T. The AT-rich sequences were less abundant in the coding regions than in the non-coding ones. A phylogenetic analysis based on 71 protein-coding genes supported the idea that L. japonica is a sister of the Araliaceae species. This study identified unique characteristics of the L. japonica cp genome that contribute to our understanding of the cpDNA evolution. It offers valuable information for the phylogenetic and specific barcoding of this medicinal plant.
Project description:The herbal medicinal genus Aconitum L., belonging to the Ranunculaceae family, represents the earliest diverging lineage within the eudicots. It currently comprises two subgenera, A. subgenus Lycoctonum and A. subg. Aconitum. The complete chloroplast (cp) genome sequences were characterized in three species: A. angustius, A. finetianum, and A. sinomontanum in subg. Lycoctonum and compared to other Aconitum species to clarify their phylogenetic relationship and provide molecular information for utilization of Aconitum species particularly in Eastern Asia. The lengths of the chloroplast genome sequences were 156,109 bp in A. angustius, 155,625 bp in A. finetianum and 157,215 bp in A. sinomontanum, with each species possessing 126 genes, including 84 protein coding genes (PCGs). While genomic rearrangements were absent, structural variation was detected in the LSC/IR/SSC boundaries. Five pseudogenes were identified, among which Ψrps19 and Ψycf1 were in the LSC/IR/SSC boundaries, Ψrps16 and ΨinfA in the LSC region, and Ψycf15 in the IRb region. The nucleotide variability (Pi) of Aconitum was estimated to be 0.00549, with comparably higher variations in the LSC and SSC than the IR regions. Eight intergenic regions were revealed to be highly variable and a total of 58-62 simple sequence repeats (SSRs) were detected in all three species. More than 80% of SSRs were present in the LSC region. Altogether, 64.41% and 46.81% of SSRs are mononucleotides in subg. Lycoctonum and subg. Aconitum, respectively, while a higher percentage of di-, tri-, tetra-, and penta- SSRs were present in subg. Aconitum. Most species of subg. Aconitum in Eastern Asia were first used for phylogenetic analyses. The availability of the complete cp genome sequences of these species in subg. Lycoctonum will benefit future phylogenetic analyses and aid in germplasm utilization in Aconitum species.
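Nucleotide variability (Pi) of the kind quoted above is commonly defined as the average number of differences per site between pairs of aligned sequences. The sketch below illustrates that definition only; it is not the software or the windowing scheme used in the study, the function name `nucleotide_diversity` is hypothetical, and skipping gapped or ambiguous sites pair by pair is an assumption.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites across aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    for that pair (an assumption; tools differ in how they treat gaps).
    """
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must have equal aligned length")

    total = 0.0
    pairs = 0
    for a, b in combinations(aligned, 2):
        compared = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "ACGT" and y in "ACGT":
                compared += 1
                diffs += x != y
        if compared:
            total += diffs / compared
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy alignment of three sequences, eight sites each.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(toy_alignment)
print(round(pi, 5))
```

Applied in sliding windows along a whole-genome alignment, the same calculation would give the kind of per-region comparison (LSC and SSC versus IR) described in the abstract.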
Project description:Sesamum indicum is an important crop plant species for yielding oil. The complete chloroplast (cp) genome of S. indicum (GenBank acc no. JN637766) is 153,324 bp in length, and has a pair of inverted repeat (IR) regions consisting of 25,141 bp each. The lengths of the large single copy (LSC) and the small single copy (SSC) regions are 85,170 bp and 17,872 bp, respectively. Comparative cp DNA sequence analyses of S. indicum with other cp genomes reveal that the genome structure, gene order, gene and intron contents, AT contents, codon usage, and transcription units are similar to those of typical angiosperm cp genomes. Nucleotide diversity of the IR region between Sesamum and three other cp genomes is much lower than that of the LSC and SSC regions in both the coding region and noncoding region. In summary, regional constraints strongly affect the sequence evolution of the cp genomes, while functional constraints affect it only weakly. Five short inversions associated with short palindromic sequences that form stem-loop structures were observed in the chloroplast genome of S. indicum. Twenty-eight different simple sequence repeat loci have been detected in the chloroplast genome of S. indicum. Almost all of the SSR loci were composed of A or T, so this may also contribute to the A-T richness of the cp genome of S. indicum. Seven large repeated loci in the chloroplast genome of S. indicum were also identified, and these loci are useful for developing S. indicum-specific cp genome vectors. The complete cp DNA sequences of S. indicum reported in this paper are a prerequisite for modifying this important oilseed crop by cp genetic engineering techniques.
Project description:Bupleurum falcatum, which belongs to the family Apiaceae, has long been applied for curative treatments, especially as a liver tonic, in herbal medicine. The chloroplast (cp) genome has been an ideal model for evolutionary and comparative studies because of its highly conserved features and simple structure. The Apiaceae family is taxonomically close to the Araliaceae family and there have been numerous complete chloroplast genome sequences reported in the Araliaceae family, while little is known about the Apiaceae family. In this study, the complete sequence of the B. falcatum chloroplast genome was obtained. The full length of the cp genome is 155,989 nucleotides with a 37.66% overall guanine-cytosine (GC) content and shows a quadripartite structure composed of three nomenclatural regions: a large single-copy (LSC) region, a small single-copy (SSC) region, and a pair of inverted repeat (IR) regions. The LSC, SSC, and IR regions span 85,912 bp, 17,517 bp, and 26,280 bp, respectively. B. falcatum was shown to contain 111 unique genes (78 protein-coding genes, 29 tRNA genes, and four rRNA genes) on its chloroplast genome. Genic comparison found that B. falcatum has no pseudogenes and has two gene losses, accD in the LSC and ycf15 in the IRs. A total of 55 unique tandem repeat sequences were detected in the B. falcatum cp genome. This report is the first to describe the complete chloroplast genome sequence in B. falcatum and will open up further avenues of research to understand the evolutionary panorama and the chloroplast genome conformation in related plant species.
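An overall GC content such as the 37.66% quoted above is the share of G and C among the unambiguous bases of a sequence. A minimal sketch of that calculation follows; the function name `gc_content` is hypothetical, and excluding ambiguous symbols such as N from the denominator is an assumption rather than a documented choice of the study.

```python
def gc_content(seq):
    """GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Toy fragment: 4 of 10 unambiguous bases are G or C.
print(gc_content("ATGCATTAGCNN"))  # 40.0
```

Running the same function separately on the LSC, SSC and IR subsequences would yield per-region values of the kind several of these abstracts report.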
Project description:The complete chloroplast genome of Artemisia annua (Asteraceae), the primary source of artemisinin, was sequenced and analyzed. The A. annua cp genome is 150,995 bp, and harbors a pair of inverted repeat regions (IRa and IRb) of 24,850 bp each, which separate large (LSC, 82,988 bp) and small (SSC, 18,267 bp) single-copy regions. Our annotation revealed that the A. annua cp genome contains 113 genes and 18 duplicated genes. The gene order in the SSC region of A. annua is inverted; this fact is consistent with the sequences of chloroplast genomes from three other Artemisia species. Fifteen forward and seventeen inverted repeats were detected in the genome. The existence of rich SSR loci in the genome suggests opportunities for future population genetics work on this anti-malarial medicinal plant. In A. annua cpDNA, the rps19 gene was found in the LSC region rather than the IR region, and the rps19 pseudogene was absent in the IR region. Sequence divergence analysis of five Asteraceae species indicated that the most highly divergent regions were found in the intergenic spacers, and that the differences between A. annua and A. fukudo were very slight. A phylogenetic analysis revealed a sister relationship between A. annua and A. fukudo. This study identified the unique characteristics of the A. annua cp genome. These results offer valuable information for future research on Artemisia species identification and for the selective breeding of A. annua with high pharmaceutical efficacy.
Project description:Pigeonpea (Cajanus cajan (L.) Millspaugh), a diploid (2n = 22) legume crop with a genome size of 852 Mbp, serves as an important source of human dietary protein, especially in Southeast Asian and African regions. In this study, the draft chloroplast genomes of Cajanus cajan and Cajanus scarabaeoides (L.) Thouars were generated. Cajanus scarabaeoides is an important species of the Cajanus gene pool and has also been used by different groups for developing promising cytoplasmic male sterility (CMS) systems. A male sterile genotype harboring the C. scarabaeoides cytoplasm was used for sequencing the plastid genome. The cp genome of C. cajan is 152,242 bp long, having a quadripartite structure with LSC of 83,455 bp and SSC of 17,871 bp separated by IRs of 25,398 bp. Similarly, the cp genome of C. scarabaeoides is 152,201 bp long, having a quadripartite structure in which IRs of 25,402 bp separate an LSC of 83,423 bp and an SSC of 17,854 bp. The pigeonpea cp genome contains 116 unique genes, including 30 tRNA, 4 rRNA, 78 predicted protein coding genes and 5 pseudogenes. A 50 kb inversion was observed in the LSC region of the pigeonpea cp genome, consistent with other legumes. Comparison of the cp genome with those of other legumes revealed the contraction of IR boundaries due to the absence of the rps19 gene in the IR region. Chloroplast SSRs were mined and a total of 280 and 292 cpSSRs were identified in C. scarabaeoides and C. cajan, respectively. RNA editing was observed at 37 sites in both C. scarabaeoides and C. cajan, with maximum occurrence in the ndh genes. The pigeonpea cp genome sequence should provide informative molecular markers for genetic diversity analysis and aid plant systematics studies among major grain legumes.