Whole genome sequencing of enriched chloroplast DNA using the Illumina GAII platform.
ABSTRACT: BACKGROUND: Complete chloroplast genome sequences provide a valuable source of molecular markers for studies in molecular ecology and evolution of plants. To obtain complete genome sequences, recent studies have made use of the polymerase chain reaction to amplify overlapping fragments from conserved gene loci. However, this approach is time consuming and can be more difficult to implement where gene organisation differs among plants. An alternative approach is to first isolate chloroplasts and then use the capacity of high-throughput sequencing to obtain complete genome sequences. We report our findings from studies of the latter approach, which used a simple chloroplast isolation procedure, multiply-primed rolling circle amplification of chloroplast DNA, Illumina Genome Analyzer II sequencing, and de novo assembly of paired-end sequence reads. RESULTS: A modified rapid chloroplast isolation protocol was used to obtain plant DNA that was enriched for chloroplast DNA, but nevertheless contained nuclear and mitochondrial DNA. Multiply-primed rolling circle amplification of this mixed template produced sufficient quantities of chloroplast DNA, even when the amount of starting material was small, and improved the template quality for Illumina Genome Analyzer II (hereafter Illumina GAII) sequencing. We demonstrate, using independent samples of karaka (Corynocarpus laevigatus), that there is high fidelity in the sequence obtained from this template. Although less than 20% of our sequenced reads could be mapped to the chloroplast genome, it was relatively easy to assemble complete chloroplast genome sequences from the mixture of nuclear, mitochondrial and chloroplast reads. CONCLUSIONS: We report successful whole genome sequencing of chloroplast DNA from karaka, obtained efficiently and with high fidelity.
Project description:The discovery of novel viruses has often been accomplished by using hybridization-based methods that necessitate the availability of a previously characterized virus genome probe or knowledge of the viral nucleotide sequence to construct consensus or degenerate PCR primers. In their natural replication cycle, certain viruses employ a rolling-circle mechanism to propagate their circular genomes, and multiply primed rolling-circle amplification (RCA) with phi29 DNA polymerase has recently been applied in the amplification of circular plasmid vectors used in cloning. We employed an isothermal RCA protocol that uses random hexamer primers to amplify the complete genomes of papillomaviruses without the need for prior knowledge of their DNA sequences. We optimized this RCA technique with extracted human papillomavirus type 16 (HPV-16) DNA from W12 cells, using a real-time quantitative PCR assay to determine amplification efficiency, and obtained a 2.4 x 10^4-fold increase in HPV-16 DNA concentration. We were able to clone the complete HPV-16 genome from this multiply primed RCA product. The optimized protocol was subsequently applied to a bovine fibropapillomatous wart tissue sample. Whereas no papillomavirus DNA could be detected by restriction enzyme digestion of the original sample, multiply primed RCA enabled us to obtain a sufficient amount of papillomavirus DNA for restriction enzyme analysis, cloning, and subsequent sequencing of a novel variant of bovine papillomavirus type 1. The multiply primed RCA method allows the discovery of previously unknown papillomaviruses, and possibly also other circular DNA viruses, without a priori sequence information.
Project description:The data present the chloroplast genome sequences of five sunflower alloplasmic cytoplasmic male sterility (CMS) lines, obtained using the Illumina MiSeq, HiSeq and NextSeq platforms. The sunflower alloplasmic CMS lines have the same nuclear genome, from line HA89, but differ in their cytoplasmic genomes, which were inherited from annual (PET1, PET2 - H. petiolaris, ANN2 - H. annuus) and perennial (MAX1 - H. maximiliani) species of the genus Helianthus L. The chloroplast genomes were annotated. Also presented is a dataset of variable sites, namely single nucleotide polymorphisms (SNP), simple sequence repeats (SSR), and insertions and deletions (INDEL), in the chloroplast genomes of the sequenced alloplasmic lines. The raw reads are available in Figshare (https://doi.org/10.6084/m9.figshare.7520183). The complete chloroplast genome sequences for the sunflower alloplasmic lines are available in GenBank NCBI under the accessions MK341448.1-MK341452.1; the remaining data are provided with this article.
Project description:The data present the chloroplast genome sequences of seven wild perennial Helianthus species obtained by using the Illumina HiSeq and NextSeq platforms. Datasets not included in the primary publication are a source for further evolutionary studies. In particular, the annotated chloroplast genomes and datasets of single nucleotide polymorphisms (SNP), simple sequence repeats (SSR), insertion and deletion polymorphisms (INDEL) for H. tuberosus, H. salicifolius, H. pauciflorus, H. microcephalus, H. hirsutus, H. strumosus, and H. grosseserratus are presented. The raw reads are available in Figshare (https://doi.org/10.6084/m9.figshare.12600155). The complete chloroplast genome sequences for the seven perennial Helianthus species are available on GenBank NCBI under the accessions MT302562.1 - MT302568.1; the remaining data are provided in this article.
Project description:The linked read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using the Illumina short read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis-P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from those of the chloroplast, which simplified the de Bruijn graphs used at the various stages of assembly.
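The index-frequency idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the (barcode, sequence) tuple layout, and the threshold are hypothetical, and the example only shows the general principle of keeping reads whose index sequence is shared by many reads.

```python
from collections import Counter


def reads_from_frequent_barcodes(barcoded_reads, min_reads_per_barcode):
    """Keep reads whose barcode (index sequence) occurs at least
    min_reads_per_barcode times. Both the input layout, a list of
    (barcode, sequence) tuples, and the threshold are assumptions
    made for illustration."""
    # Count how many reads carry each barcode.
    counts = Counter(barcode for barcode, _ in barcoded_reads)
    # Barcodes that pass the frequency threshold.
    frequent = {b for b, n in counts.items() if n >= min_reads_per_barcode}
    # Preserve input order while filtering.
    return [(b, s) for b, s in barcoded_reads if b in frequent]


# Toy data: barcode "AAAC" tags four reads, the others tag one read each.
toy_reads = [
    ("AAAC", "ACGTACGT"),
    ("AAAC", "CGTACGTA"),
    ("GGTT", "TTTTGGGG"),
    ("AAAC", "GTACGTAC"),
    ("CCAT", "CATCATCA"),
    ("AAAC", "TACGTACG"),
]
selected = reads_from_frequent_barcodes(toy_reads, min_reads_per_barcode=3)
```

In practice the threshold would have to be chosen from the observed distribution of reads per index, which this sketch does not attempt.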
Project description:Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Until now, these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes, especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to de novo assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assembled these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps which are left due to coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb) and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
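As a rough illustration of how a k-mer frequency table can separate high-copy reads from the rest, the sketch below counts k-mers across all reads and keeps reads whose median k-mer count reaches a cutoff. This is a simplified stand-in, not the custom pipeline described above: the function names, the median criterion, and the cutoff are assumptions, and reverse complements are ignored.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only in this sketch)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_table(reads, k):
    """Count how often each k-mer occurs across all reads."""
    table = Counter()
    for read in reads:
        table.update(kmers(read, k))
    return table


def select_high_copy_reads(reads, k, min_median_count):
    """Keep reads whose median k-mer count is at least min_median_count.
    The median criterion and the cutoff are illustrative assumptions."""
    table = build_kmer_table(reads, k)
    selected = []
    for read in reads:
        counts = [table[km] for km in kmers(read, k)]
        if counts and median(counts) >= min_median_count:
            selected.append(read)
    return selected


# Toy data: one sequence present in five copies (mimicking a high-copy
# plastid fragment) and two single-copy sequences.
plastid_like = "ACGTTGCAAGGCTA"
reads = [plastid_like] * 5 + ["TTTTCCCCGGGGAA", "GATCGATCCTAGCT"]
kept = select_high_copy_reads(reads, k=5, min_median_count=3)
```

The abstract's point about optimizing k applies here too: a small k makes unrelated reads share k-mers by chance, while a large k makes counts sensitive to sequencing errors.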
Project description:Using a partially purified replication complex from tobacco chloroplasts, replication origins have been localized to minimal sequences of 82 bp (pKN8, positions 137 683-137 764) and 243 bp (pKN3, positions 130 513-130 755) for ori A and ori B, respectively. Analysis of in vitro replication products by two-dimensional agarose gel electrophoresis showed simple Y patterns for single ori sequence-containing clones, indicative of rolling circle replication. Double Y patterns were observed when a chloroplast DNA template containing both oris (pKN9) was tested. Dpn I analysis and control assays with Escherichia coli DNA polymerase provide a clear method to distinguish between true replication and DNA repair synthesis. These controls also support the reliability of this in vitro chloroplast DNA replication system. EM analysis of in vitro replicated products showed rolling circle replication intermediates for single ori clones (ori A or ori B), whereas D loops were observed for a clone (pKN9) containing both oris. The minimal ori regions contain sequences which are capable of forming stem-loop structures with relatively high free energy and other sequences which interact with specific protein(s) from the chloroplast replication fraction. Apparently the minimal ori sequences reported here contain all the necessary elements for support of chloroplast DNA replication in vitro.
Project description:Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL-psaI intergenic region, however, revealed a "hot-spot" of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses.
Project description:MOTIVATION: Complete organellar genome sequences (chloroplasts and mitochondria) provide valuable resources and information for studying plant molecular ecology and evolution. As high-throughput sequencing technology advances, it is becoming the norm to use a shotgun approach to obtain complete genome sequences. Assembling organellar sequences from whole genome shotgun reads is therefore unavoidable. However, associated techniques are often cumbersome, time-consuming, and difficult, because true organellar DNA is difficult to separate efficiently from nuclear copies, which have been transferred to the nucleus through the course of evolution. RESULTS: We report a new, rapid procedure for plant chloroplast and mitochondrial genome sequencing and assembly using the Roche/454 GS FLX platform. Plant cells can contain multiple copies of the organellar genomes, and there is a significant correlation between the depth of sequence reads in contigs and the number of copies of the genome. Without isolating organellar DNA from the mixture of nuclear and organellar DNA for sequencing, we retrospectively extracted assembled contigs of either chloroplast or mitochondrial sequences from the whole genome shotgun data. Moreover, the contig connection graph property of Newbler (a platform-specific sequence assembler) ensures an efficient final assembly. Using this procedure, we assembled both chloroplast and mitochondrial genomes of a resurrection plant, Boea hygrometrica, with high fidelity. We also present information and a minimal sequence dataset as a reference for the assembly of other plant organellar genomes.
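The relationship between read depth and genome copy number suggests a simple way to sort assembled contigs, sketched below. This is an illustration only: the function names, the input layout, and both depth cutoffs are hypothetical, and the ordering of depths (chloroplast highest, mitochondrial intermediate, nuclear lowest) is an assumption rather than something stated in the abstract.

```python
def mean_depth(total_read_bases, contig_length):
    """Average depth: bases of reads placed in the contig per contig base."""
    return total_read_bases / contig_length


def bin_contigs_by_depth(contigs, mito_min_depth, chloroplast_min_depth):
    """Sort contigs into three bins by mean read depth.

    contigs: list of (name, total_read_bases, contig_length) tuples.
    The two cutoffs are hypothetical; in practice they would be chosen
    from the observed depth distribution of the assembly.
    """
    bins = {"chloroplast": [], "mitochondrial": [], "nuclear": []}
    for name, total_read_bases, length in contigs:
        depth = mean_depth(total_read_bases, length)
        if depth >= chloroplast_min_depth:
            bins["chloroplast"].append(name)
        elif depth >= mito_min_depth:
            bins["mitochondrial"].append(name)
        else:
            bins["nuclear"].append(name)
    return bins


# Toy contigs with mean depths of 500x, 60x and 8x.
toy_contigs = [
    ("contig_1", 500_000, 1_000),
    ("contig_2", 60_000, 1_000),
    ("contig_3", 8_000, 1_000),
]
binned = bin_contigs_by_depth(toy_contigs, mito_min_depth=30,
                              chloroplast_min_depth=300)
```

Depth alone cannot resolve every contig, for example organelle-derived sequence that has been transferred to the nucleus, so a binning of this kind would normally be followed by a check against the assembler's contig connection graph or known organellar genes.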
Project description:Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var. deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16-taxon angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.
Project description:PREMISE OF THE STUDY: High-throughput sequencing of genomic DNA can recover complete chloroplast genome sequences, but the sequence data are usually dominated by sequences from nuclear/mitochondrial genomes. To overcome this deficiency, a simple enrichment method for chloroplast DNA from small amounts of plant tissue was tested for eight plant species including a gymnosperm and various angiosperms. METHODS: Chloroplasts were enriched using a high-salt isolation buffer without any step gradient procedures, and enriched chloroplast DNA was sequenced by multiplexed high-throughput sequencing. RESULTS: Using this simple method, significant enrichment of chloroplast DNA-derived reads was attained, allowing deep sequencing of chloroplast genomes. As an example, the chloroplast genome of the conifer Callitris sulcata was assembled, from which polymorphic microsatellite loci were isolated successfully. DISCUSSION: This chloroplast enrichment method from small amounts of plant tissue will be particularly useful for studies that use sequencers with relatively small throughput and that cannot use large amounts of tissue (e.g., for endangered species).