Development of microsatellite markers and assembly of the plastid genome in Cistanthe longiscapa (Montiaceae) based on low-coverage whole genome sequencing.
ABSTRACT: Cistanthe longiscapa is an endemic annual herb and a characteristic element of the Chilean Atacama Desert. Its principal threats are the destruction of its seed deposits by human activities and reduced germination rates due to the decreasing occurrence of precipitation events. To enable population genetic and phylogeographic analyses in this species, we performed paired-end shotgun sequencing (2x100 bp) of genomic DNA on the Illumina HiSeq platform and identified microsatellite (SSR) loci in the resulting sequences. From 29 million quality-filtered read pairs we obtained 549,174 contigs (average length 614 bp; N50 = 904 bp). Searching for SSRs revealed 10,336 loci with microsatellite motifs. Initially, we designed primers for 96 loci, which were tested for PCR amplification on three C. longiscapa individuals. Successfully amplifying loci were further tested on eight individuals to screen for length variation in the resulting amplicons, and exemplary alleles were sequenced to infer the basis of the observed length variation. Finally, we arrived at 26 validated SSR loci for population studies in C. longiscapa, which resulted in 146 bi-allelic SSR markers in our test sample of eight individuals. The genomic sequences were also used to assemble the plastid genome of C. longiscapa, which provides an additional set of maternally inherited genetic markers.
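The N50 reported for the contig set can be illustrated with a minimal sketch. It assumes the common definition of N50 (the length of the shortest contig in the set of longest contigs that together cover at least half of the total assembly length). The function name `n50` and the toy lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2            # half of the assembly length
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to at least half
        # of the assembly length defines the N50.
        if running >= half_total:
            return length


# Toy contig lengths, invented for illustration only.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

Because N50 weights long contigs more heavily than the arithmetic mean does, it can exceed the average contig length, as in the values reported above.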
Project description:High-throughput sequencing has been dramatically accelerating the discovery of microsatellite markers (also known as Simple Sequence Repeats). Both 454 and Illumina reads have been used directly in microsatellite discovery and primer design (the "Seq-to-SSR" approach). However, constraints of this approach include: 1) many microsatellite-containing reads do not have sufficient flanking sequences to allow primer design, and 2) difficulties in removing microsatellite loci residing in longer, repetitive regions. In the current study, we applied the novel "Seq-Assembly-SSR" approach to overcome these constraints in Anisogramma anomala. In our approach, Illumina reads were first assembled into a draft genome, and the latter was then used in microsatellite discovery. A. anomala is an obligate biotrophic ascomycete that causes eastern filbert blight disease of commercial European hazelnut. Little is known about its population structure or diversity. Approximately 26 M 146 bp Illumina reads were generated from a paired-end library of a fungal strain from Oregon. The reads were assembled into a draft genome of 333 Mb (excluding gaps), with contig N50 of 10,384 bp and scaffold N50 of 32,987 bp. A bioinformatics pipeline identified 46,677 microsatellite motifs at 44,247 loci, including 2,430 compound loci. Primers were successfully designed for 42,923 loci (97%). After removing 2,886 loci close to assembly gaps and 676 loci in repetitive regions, a genome-wide microsatellite database of 39,361 loci was generated for the fungus. In experimental screening of 236 loci using four geographically representative strains, 228 (96.6%) were successfully amplified and 214 (90.7%) produced single PCR products. Twenty-three (9.7%) were found to be perfect polymorphic loci. A small-scale population study using 11 polymorphic loci revealed considerable gene diversity. Clustering analysis grouped isolates of this fungus into two clades in accordance with their geographic origins. Thus, the "Seq-Assembly-SSR" approach has proven to be a successful one for microsatellite discovery.
Project description:The main challenge associated with genotyping based on conventional length polymorphisms is the cross-laboratory standardization of allele sizes. This step requires the inclusion of standards and manual sizing to avoid false results. Capillary electrophoresis (CE) approaches limit the information to the length polymorphism and do not allow the determination of a complete marker sequence. As an alternative, high-throughput sequencing (HTS) offers complete information regarding marker sequences and their flanking regions. In this work, we investigated the suitability of a semi-quantitative sequencing approach for microsatellite genotyping using Illumina paired-end technology. Twelve microsatellite loci that are well established for grapevine CE typing were analysed on 96 grapevine samples from six different countries. We redesigned primers to the length of the amplicon for short sequencing (~100 bp). The primer pair was flanked with a 10 bp overhang for the introduction of barcodes on both sides of the amplicon to enable high multiplexing. The highest data peaks were determined as simple sequence repeat (SSR) alleles and compared with the CE dataset based on 12 reference samples. The comparison showed that HTS SSR genotyping can successfully replace the CE system in further experiments. We believe that, with next-generation sequencing, genotyping can be improved in terms of its speed, accuracy, and price.
Project description:A total of 57.8 Mb of publicly available rice (Oryza sativa L.) DNA sequence was searched to determine the frequency and distribution of different simple sequence repeats (SSRs) in the genome. SSR loci were categorized into two groups based on the length of the repeat motif. Class I, or hypervariable markers, consisted of SSRs ≥20 bp, and Class II, or potentially variable markers, consisted of SSRs ≥12 bp and <20 bp. The occurrence of Class I SSRs in end-sequences of EcoRI- and HindIII-digested BAC clones was one SSR per 40 kb, whereas in continuous genomic sequence (represented by 27 fully sequenced BAC and PAC clones), the frequency was one SSR every 16 kb. Class II SSRs were estimated to occur every 3.7 kb in BAC ends and every 1.9 kb in fully sequenced BAC and PAC clones. GC-rich trinucleotide repeats (TNRs) were most abundant in protein-coding portions of ESTs and in fully sequenced BACs and PACs, whereas AT-rich TNRs showed no such preference, and di- and tetranucleotide repeats were most frequently found in noncoding, intergenic regions of the rice genome. Microsatellites with poly(AT)n repeats represented the most abundant and polymorphic class of SSRs but were frequently associated with the Micropon family of miniature inverted-repeat transposable elements (MITEs) and were difficult to amplify. A set of 200 Class I SSR markers was developed and integrated into the existing microsatellite map of rice, providing immediate links between the genetic, physical, and sequence-based maps. This contribution brings the number of microsatellite markers that have been rigorously evaluated for amplification, map position, and allelic diversity in Oryza spp. to a total of 500.
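The Class I / Class II length rule lends itself to a short sketch. The thresholds (at least 20 bp, and 12 to 19 bp) come from the abstract. Everything else is an assumption made for illustration: the regular expression, the restriction to 2-4 bp motifs, the exclusion of single-base runs, and the names `SSR_PATTERN`, `classify_ssr` and `find_ssrs`. It is not the search procedure used by the authors.

```python
import re

# Assumed pattern: a 2-4 bp motif followed by at least two more copies of itself.
SSR_PATTERN = re.compile(r"([ACGT]{2,4}?)\1{2,}")


def classify_ssr(total_length_bp):
    """Apply the length classes described in the abstract."""
    if total_length_bp >= 20:
        return "Class I"   # hypervariable
    if total_length_bp >= 12:
        return "Class II"  # potentially variable
    return None            # too short to fall in either class


def find_ssrs(sequence):
    """Return (motif, repeat_count, class) for each perfect repeat found."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip runs of a single base (illustrative choice)
        span = len(match.group(0))
        ssr_class = classify_ssr(span)
        if ssr_class is not None:
            hits.append((motif, span // len(motif), ssr_class))
    return hits


# Toy sequence, invented for illustration: (GA)10 and (CGG)4.
toy = "TT" + "GA" * 10 + "CC" + "CGG" * 4 + "TT"
for hit in find_ssrs(toy):
    print(hit)
```

In this sketch the class depends only on the total repeat length, so a 4-copy trinucleotide repeat (12 bp) and a 6-copy dinucleotide repeat (12 bp) both fall into Class II.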
Project description:Background: The adzuki bean weevil, Callosobruchus chinensis L., is one of the most destructive pests of stored legume seeds such as mungbean, cowpea, and adzuki bean, and usually causes considerable loss in the quantity and quality of seeds during transportation and storage. However, because genetic information on this pest is lacking, a series of genetic questions remain largely unanswered, including population genetic structure, kinship, and biotype abundance. Co-dominant microsatellite markers offer great resolving power for addressing these questions. Here, we report rapid microsatellite isolation from C. chinensis via high-throughput sequencing. Principal findings: In this study, 94,560,852 quality-filtered and trimmed reads were obtained for genome assembly using Illumina paired-end sequencing technology. De novo assembly generated a genome with a total length of 497,124,785 bp, comprising 403,113 high-quality contigs. More than 6,800 SSR loci were detected, a suite of 6,303 primer pairs was designed, and 500 of these were randomly selected for validation. Of these, 196 primer pairs (39.2%) produced reproducible amplicons that were polymorphic among 8 C. chinensis genotypes collected from different geographical regions. Twenty of the 196 polymorphic SSR markers were used to analyze the genetic diversity of 18 C. chinensis populations. The results showed that the twenty SSR loci were highly polymorphic among these populations. Conclusions: This study presents the first report of genome sequencing and de novo assembly for C. chinensis and demonstrates the feasibility of generating large-scale sequence information and isolating SSR loci by Illumina paired-end sequencing. Our results provide a valuable resource for C. chinensis research. These novel markers are valuable for future genetic mapping and trait association and for studies of genetic structure and kinship in C. chinensis.
Project description:Traditional methods for developing polymorphic microsatellite loci without reference sequences are time-consuming and labor-intensive, and the polymorphism of simple sequence repeat (SSR) loci developed from expressed sequence tag (EST) databases is generally poor. To address this issue, we developed new software (PSSRdt) and established an effective method for directly obtaining polymorphism details of SSR loci by analyzing diverse transcriptome data. The new method includes three steps: raw data processing, PSSRdt application, and loci extraction and verification. To test the practicality of the method, we obtained 1940 potential polymorphic SSRs from the transcript dataset combined with 44 pea aphid transcriptomes. Fifty-two SSR loci obtained by the new method were selected to validate their polymorphic characteristics by genotyping pea aphid individuals. The results showed that over 92% of SSR loci were polymorphic and 73.1% of loci were highly polymorphic. Our new software and method provide an innovative approach to microsatellite development based on RNA-seq data, and open a new path for the rapid mining of numerous polymorphic loci to add to the body of research on microsatellites.
Project description:Microsatellites are tracts of repetitive, short DNA motifs (usually 1 to 6 bp) that are abundant in eukaryotic genomes. They are robust molecular markers in many areas of study. Development of microsatellite markers usually involves three steps: (1) obtaining microsatellite-containing sequences, (2) primer design, and (3) screening microsatellite loci for polymorphism. The first and third steps require considerable resources. Next generation sequencing technologies have greatly alleviated the constraint of the first step. In this study, we leveraged the availability of genome assemblies of multiple individuals in many species and designed a comparative genomics approach to bioinformatically identify polymorphic loci. Our approach can eliminate or greatly reduce the need for experimental screening for polymorphism and ensure that the flanking regions do not have length differences that would confound interpretation of genotyping results using microsatellite markers. We applied this approach to Phytophthora sojae, a soybean pathogen, and identified 157 high-quality, informative microsatellite markers in this oomycete. Experimental validation of 20 loci supported the bioinformatics predictions. Our approach can be readily applied to other organisms for which the genomes of multiple individuals have been sequenced.
Project description:PREMISE OF THE STUDY:Sixteen novel, polymorphic, multiplexed microsatellite loci were developed for eastern white cedar (Thuja occidentalis) using simple sequence repeat (SSR)-enriched shotgun pyrosequencing. • METHODS AND RESULTS:Sixteen loci were tested on a panel of 24 individuals from different populations. The number of observed alleles ranged from four to 22. Four sets of multiplex PCR for the 16 loci were then carried out on 60 individuals of two populations from islands of FERLD Duparquet Forest, Canada. Mean number of alleles, observed heterozygosity, and expected heterozygosity were respectively 5.75, 0.594, and 0.574 for Island 58, and 5.50, 0.704, and 0.624 for Island 134. • CONCLUSIONS:Four sets of multiplex microsatellite loci can be used for future genetic studies, including investigations of genetic diversity and structure, and fragmentation and regeneration studies.
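The per-locus summary statistics named above (number of alleles, observed and expected heterozygosity) can be sketched as follows. This is a minimal illustration for a single diploid locus. It assumes the uncorrected expected heterozygosity, one minus the sum of squared allele frequencies, which may differ from the estimator used in the study. The function name `heterozygosity` and the toy genotypes are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Summarize one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, e.g. fragment sizes in bp.
    Returns (number_of_alleles, observed_het, expected_het).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("need at least one genotype")
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies are counted over all 2n gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    # Expected heterozygosity without small-sample correction (assumption).
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return len(counts), observed, expected


# Toy genotypes, invented for illustration only.
toy_locus = [(100, 100), (100, 104), (104, 104), (100, 104)]
print(heterozygosity(toy_locus))
```

Population-level values such as those reported per island would then be means of these per-locus statistics across all loci.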
Project description:Grasspea (Lathyrus sativus L., 2n = 14) has great agronomic potential because of its ability to survive under extreme conditions, such as drought and flood. However, this legume has been little investigated because of its sparse genomic resources and very slow breeding process. In this study, 570 million quality-filtered and trimmed cDNA sequence reads with a total length of over 82 billion bp were obtained using the Illumina NextSeq 500 platform. Approximately two million contigs and 142,053 transcripts were assembled from our RNA-Seq data, which resulted in 27,431 unigenes with an average length of 1,250 bp and a maximum length of 48,515 bp. The unigenes were of high quality. For example, the stay-green (SGR) gene of grasspea aligned with the SGR gene of pea with high similarity. From these unigenes, 3,204 EST-SSR primers were designed, 284 of which were randomly chosen for validation. Of these, 87 (30.6%) EST-SSR primers produced polymorphic amplicons among 43 grasspea accessions selected from different geographical locations. Meanwhile, 146,406 SNPs were screened and 50 SNP loci were randomly chosen for kompetitive allele-specific PCR (KASP) validation. Over 80% (42) of the SNP loci were successfully converted to KASP markers. Comparison of the dendrograms according to the SSR and KASP markers showed that the different marker systems are partially consistent with the dendrogram constructed in our study.
Project description:Test systems for molecular identification of micropropagated elite aspen (Populus tremula L.) genotypes were developed on the basis of microsatellite (SSR) loci. Out of 33 tested microsatellite loci, 14 were selected for their stable PCR amplification and substantial variability in elite aspen clones intended for establishing short-rotation forest plantations. All eight tested clones had different multilocus genotypes. Among 114 trees from three reference native stands located near the established plantations, 80 haplotypes were identified, while some repeated genotypes were attributed to natural clones that arose through sprouting. The selected set of SSR markers allowed reliable individual identification, with a low probability of identical aspen genotypes occurring (a minimum of 4.8 × 10⁻¹⁰ and 1 × 10⁻⁴ for unrelated and related individuals, respectively). Case studies demonstrating practical applications of the test system are described, including analysis of clonal structure and levels of genetic diversity in three natural aspen stands growing in the regions where plantations of elite clones were established.
Project description:We identified a large number of microsatellites from Galium trifidum, a plant species considered rare and endangered in Central and Western Europe. Using a combination of a total enriched genomic library and small-scale 454 pyrosequencing, we determined 9755 contigs with a length of 100 to 6192 bp. Within this dataset, we identified 153 SSR motifs in 144 contigs. Here, we tested 14 microsatellite loci in 2 populations of G. trifidum. The number of alleles and expected heterozygosity were 1-8 (mean 3.2) and 0.00-0.876 (0.549 on average), respectively. The markers described in this study will be useful for evaluating genetic diversity within and between populations, and gene flow between G. trifidum populations. These markers could also be applied to investigate biological aspects of G. trifidum, such as population dynamics and clonal structure, and to develop effective conservation programs for the Central European populations of this species.