Utility of EST-derived SSR in cultivated peanut (Arachis hypogaea L.) and Arachis wild species.
ABSTRACT: BACKGROUND: Lack of sufficient molecular markers hinders current genetic research in peanuts (Arachis hypogaea L.). It is necessary to develop more molecular markers for potential use in peanut genetic research. With the development of peanut EST projects, a vast amount of EST sequence data has been generated. These data offer an opportunity to identify SSRs in ESTs by data mining. RESULTS: In this study, we investigated 24,238 ESTs for the identification and development of SSR markers. In total, 881 SSRs were identified from 780 SSR-containing unique ESTs. On average, one SSR was found per 7.3 kb of EST sequence, with tri-nucleotide motifs (63.9%) being the most abundant, followed by di- (32.7%), tetra- (1.7%), hexa- (1.0%) and penta-nucleotide (0.7%) repeat types. The top six motifs included AG/TC (27.7%), AAG/TTC (17.4%), AAT/TTA (11.9%), ACC/TGG (7.72%), ACT/TGA (7.26%) and AT/TA (6.3%). Based on the 780 SSR-containing ESTs, a total of 290 primer pairs were successfully designed and used to validate amplification and assess polymorphism among 22 genotypes of cultivated peanut and 16 accessions of wild species. The results showed that 251 primer pairs yielded amplification products, of which 26 and 221 primer pairs exhibited polymorphism among the cultivated and wild species examined, respectively. Two to four alleles were found in cultivated peanuts, while 3-8 alleles were present in wild species. The apparent broad polymorphism was further confirmed by cloning and sequencing of amplified alleles. Sequence analysis of selected amplified alleles revealed that allelic diversity could be attributed mainly to differences in repeat type and length in the microsatellite regions. In addition, a few single base mutations were observed in the microsatellite flanking regions.
CONCLUSION: This study gives an insight into the frequency, type and distribution of peanut EST-SSRs and demonstrates successful development of EST-SSR markers in cultivated peanut. These EST-SSR markers could enrich the current resource of molecular markers for the peanut community and would be useful for qualitative and quantitative trait mapping, marker-assisted selection, and genetic diversity studies in cultivated peanut as well as related Arachis species. All of the 251 working primer pairs with names, motifs, repeat types, primer sequences, and alleles tested in cultivated and wild species are listed in Additional File 1.
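The data-mining step described in this abstract, scanning EST sequences for perfect di- to hexa-nucleotide repeats, can be illustrated with a minimal sketch. The minimum repeat counts in `MIN_REPEATS` and the function name `find_ssrs` are assumptions chosen for illustration; the abstract does not state which search tool or thresholds the authors used.

```python
import re

# Minimum number of repeats per motif length.
# Illustrative assumption only; not the thresholds used in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, repeat_count, start) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A unit of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((motif, len(match.group(0)) // unit_len, match.start()))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence containing one (AG)7 and one (AAG)5 repeat.
    example = "GGC" + "AG" * 7 + "TTCA" + "AAG" * 5 + "CC"
    for motif, count, start in find_ssrs(example):
        print(motif, count, start)
```

Dividing the total length of the screened sequence (in kb) by the number of hits gives a density figure of the kind quoted above ("one SSR per 7.3 kb").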
Project description:To increase the number of functional markers in a resource-poor crop such as cultivated peanut (Arachis hypogaea), the large number of expressed sequence tags (ESTs) available in public databases was used to develop novel EST-derived simple sequence repeat (SSR) markers. From 16424 unigenes, 2784 (16.95%) SSR-containing unigenes carrying 3373 SSR motifs were identified. Of these, 2027 (72.81%) sequences were annotated and 4124 gene ontology terms were assigned. Among the SSR motif classes, tri-nucleotide repeats (33.86%) were the most abundant, followed by di-nucleotide repeats (27.51%), while AG/CT (20.7%) and AAG/CTT (13.25%) were the most abundant repeat motifs. A total of 2456 novel EST-SSR primer pairs were designed, of which 366, from unigenes relevant to various stresses and other functions, were PCR validated using a set of 11 diverse peanut genotypes. Of these, 340 (92.62%) primer pairs yielded clear and scorable PCR products and 39 (10.66%) primer pairs exhibited polymorphisms. Overall, the number of alleles per marker ranged from 1 to 12 with an average of 3.77, and the PIC ranged from 0.028 to 0.375 with an average of 0.325. The identified EST-SSRs not only enrich the existing pool of molecular markers but should also facilitate targeted research on marker-trait association for various stresses, inter-specific studies and genetic diversity analysis in peanut.
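The polymorphism information content (PIC) reported above can be illustrated with a short sketch. It assumes the widely used definition of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j²; the abstract does not say which formula or software the authors used, and the function name `pic` is hypothetical.

```python
from itertools import combinations


def pic(allele_counts):
    """Polymorphism information content from allele counts or frequencies.

    Assumes the Botstein et al. (1980) definition; inputs are normalised
    so that they sum to 1 before the calculation.
    """
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive value")
    freqs = [c / total for c in allele_counts]
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


if __name__ == "__main__":
    print(round(pic([0.5, 0.5]), 3))   # two equally frequent alleles
    print(round(pic([10, 1]), 3))      # one rare allele
```

Under this definition a marker with two equally frequent alleles has PIC = 0.375, and a monomorphic marker has PIC = 0.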
Project description:BACKGROUND: The construction of genetic linkage maps for cultivated peanut (Arachis hypogaea L.) has been and continues to be an important research goal to facilitate quantitative trait locus (QTL) analysis and gene tagging for use in marker-assisted selection in breeding. Although a few maps have been developed, they were constructed using diploid or interspecific tetraploid populations. The most recently published intra-specific map was constructed from a cross of cultivated peanuts, in which only 135 simple sequence repeat (SSR) markers were sparsely distributed over 22 linkage groups. A more detailed linkage map with sufficient markers is needed for QTL identification and marker-assisted selection. The objective of this study was to construct a genetic linkage map of cultivated peanut using SSR markers derived primarily from peanut genomic sequences, expressed sequence tags (ESTs), and by "data mining" sequences released in GenBank. RESULTS: Three recombinant inbred line (RIL) populations were constructed from three crosses with one common female parental line, Yueyou 13, a high-yielding Spanish market type. The four parents were screened with 1044 primer pairs designed to amplify SSRs, and 901 primer pairs produced clear PCR products. Of the 901 primer pairs, 146, 124 and 64 primer pairs (markers) were polymorphic in these populations, respectively, and were used in genotyping these RIL populations. Individual linkage maps were constructed from each of the three populations, and a composite map based on 93 common loci was created using JoinMap. The composite linkage map consists of 22 composite linkage groups (LG) with 175 SSR markers (including 47 SSRs on the published AA genome maps), representing the 20 chromosomes of A. hypogaea. The total composite map length is 885.4 cM, with an average marker density of 5.8 cM.
Segregation distortion in the 3 populations affected 23.0%, 13.5% and 7.8% of the markers, respectively. These distorted loci tended to cluster on LG1, LG3, LG4 and LG5. Only 15 EST-SSR markers were mapped because of low polymorphism. Comparison revealed potential synteny, collinear order of some markers, and conservation of collinear linkage groups among the maps and with the AA genome, although conservation was not complete. CONCLUSION: A composite linkage map was constructed from three individual mapping populations with 175 SSR markers in 22 composite linkage groups. This composite genetic linkage map is among the first "true" tetraploid peanut maps produced. The map also includes 47 SSRs that have been used in the published AA genome maps and could be used in comparative mapping studies. The primers described in this study are PCR-based markers, which are easy to share for genetic mapping in peanuts. All 1044 primer pairs are provided as additional files, and the three RIL populations will be made available to the public upon request for quantitative trait locus (QTL) analysis and linkage map improvement.
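Segregation distortion of the kind reported above is commonly assessed per marker with a chi-square goodness-of-fit test against the 1:1 ratio of parental genotypes expected in a RIL population. The sketch below assumes that test and a 5% significance level (critical value 3.841 for one degree of freedom); the abstract does not state the test or threshold actually used, and the function names are hypothetical.

```python
# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRIT_05_DF1 = 3.841


def chi2_one_to_one(n_a, n_b):
    """Chi-square statistic for observed genotype counts against a 1:1 ratio."""
    n = n_a + n_b
    if n == 0:
        raise ValueError("no genotyped lines for this marker")
    expected = n / 2
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected


def is_distorted(n_a, n_b, critical=CHI2_CRIT_05_DF1):
    """True if the marker deviates from 1:1 at the assumed significance level."""
    return chi2_one_to_one(n_a, n_b) > critical


if __name__ == "__main__":
    # Hypothetical counts of lines carrying each parental allele.
    for counts in [(50, 50), (55, 45), (70, 30)]:
        print(counts, chi2_one_to_one(*counts), is_distorted(*counts))
```

The fraction of markers for which `is_distorted` returns true corresponds to the per-population percentages quoted above, under the stated assumptions.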
Project description:Peanut is vulnerable to a range of foliar diseases such as spotted wilt caused by Tomato spotted wilt virus (TSWV), early (Cercospora arachidicola) and late (Cercosporidium personatum) leaf spots, southern stem rot (Sclerotium rolfsii), and sclerotinia blight (Sclerotinia minor). In this study, we report the generation of 17,376 peanut expressed sequence tags (ESTs) from leaf tissues of a peanut cultivar (Tifrunner, resistant to TSWV and leaf spots) and a breeding line (GT-C20, susceptible to TSWV and leaf spots). After trimming vector sequence and discarding low-quality sequences, a total of 14,432 high-quality ESTs were selected for further analysis and deposition in GenBank. Sequence clustering resulted in 6,888 unique ESTs composed of 1,703 tentative consensus (TC) sequences and 5,185 singletons. A large number of ESTs (5,717) representing genes of unknown function were also identified. Among the unique sequences, 856 EST-SSRs were identified. A total of 290 new EST-based SSR markers were developed and examined for amplification and polymorphism in cultivated peanut and wild species. Resequencing of selected amplified alleles revealed that allelic diversity could be attributed mainly to differences in repeat type and length in the SSR regions. A few indel mutations and substitutions were also observed in the regions flanking the microsatellites. Some defense-related transcripts were identified as well, such as putative oxalate oxidase (EU024476) and NBS-LRR domains. The EST data in this study provide a new source of information for gene discovery and development of SSR markers in cultivated peanut. A total of 16,931 ESTs have been deposited in the NCBI GenBank database with accession numbers ES751523 to ES768453.
Project description:BACKGROUND: Genetic marker resources in sweetpotato (Ipomoea batatas) are currently limited, which hinders genetic research in this species. It is necessary to develop more molecular markers for potential use in sweetpotato genetic research. With newly developed next-generation sequencing technology, large amounts of transcribed sequence from sweetpotato have been generated and are available for identifying SSR markers by data mining. RESULTS: In this study, we investigated 181,615 ESTs for the identification and development of SSR markers. In total, 8,294 SSRs were identified from 7,163 SSR-containing unique ESTs. On average, one SSR was found per 7.1 kb of EST sequence, with tri-nucleotide motifs (42.9%) being the most abundant, followed by di- (41.2%), tetra- (9.2%), penta- (3.7%) and hexa-nucleotide (3.1%) repeat types. The top five motifs included AG/CT (26.9%), AAG/CTT (13.5%), AT/TA (10.6%), CCG/CGG (5.8%) and AAT/ATT (4.5%). After removing possible duplicates of published sweetpotato EST-SSRs, a total of 7,958 non-redundant SSR motifs were identified. Based on these SSR-containing sequences, 1,060 pairs of high-quality SSR primers were designed and used to validate amplification and assess polymorphism between the two parents of one mapping population (E Shu 3 Hao and Guang 2k-30) and among eight accessions of cultivated sweetpotato. The results showed that 816 primer pairs yielded reproducible and strong amplification products, of which 195 (23.9%) and 342 (41.9%) primer pairs exhibited polymorphism between E Shu 3 Hao and Guang 2k-30 and among the 8 cultivated sweetpotatoes, respectively. CONCLUSION: This study gives an insight into the frequency, type and distribution of sweetpotato EST-SSRs and demonstrates successful development of EST-SSR markers in cultivated sweetpotato.
These EST-SSR markers could enrich the current resource of molecular markers for the sweetpotato community and would be useful for qualitative and quantitative trait mapping, marker-assisted selection, evolution and genetic diversity studies in cultivated sweetpotato and related Ipomoea species.
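Motif classes such as AG/CT or AAG/CTT in the abstracts above group a repeat unit with its reverse complement, since the same repeat read from the opposite strand (or from a shifted start position) yields a different string. A minimal sketch of one common way to collapse motifs into such classes follows; the choice of the alphabetically smallest rotation as the representative and the name `canonical_motif` are assumptions made for illustration, not the convention of any particular study.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Reverse complement of a DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]


def rotations(motif):
    """All cyclic shifts of a motif, e.g. AAG -> AAG, AGA, GAA."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def canonical_motif(motif):
    """Alphabetically smallest rotation of the motif or its reverse complement."""
    motif = motif.upper()
    return min(rotations(motif) + rotations(reverse_complement(motif)))


if __name__ == "__main__":
    # Hypothetical list of repeat units as they might appear in raw search output.
    observed = ["CT", "GA", "AG", "CTT", "TTC", "CGG", "AT"]
    print(Counter(canonical_motif(m) for m in observed))
```

Counting canonical motifs across all detected SSRs and dividing by the total gives per-class percentages comparable to those listed in the abstracts.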
Project description:BACKGROUND: The castor bean (Ricinus communis L.), a monotypic species in the spurge family (Euphorbiaceae, 2n = 20), is an important non-edible oilseed crop widely cultivated in tropical, sub-tropical and temperate countries for its high economic value. Because of the high level of ricinoleic acid (over 85%) in its seed oil, castor bean seed derivatives are often used in aviation oil, lubricants, nylon, dyes, inks, soaps, adhesives and biodiesel. Owing to the lack of efficient molecular markers, little is known about population genetic diversity and the genetic relationships among castor bean germplasm. Efficient and robust molecular markers are increasingly needed for breeding and improving castor bean varieties. The advent of modern genomics has produced large amounts of publicly available DNA sequence data. In particular, expressed sequence tags (ESTs) provide valuable resources for developing gene-associated SSR markers. RESULTS: In total, 18,928 publicly available non-redundant castor bean EST sequences, representing approximately 17.03 Mb, were evaluated, and 7,732 SSR sites in 5,122 ESTs were identified by data mining. Castor bean exhibited a considerably high frequency of EST-SSRs. We developed and characterized 118 polymorphic EST-SSR markers from 379 primer pairs flanking repeats by screening 24 castor bean samples collected from different countries. A total of 350 alleles were identified from the 118 polymorphic SSR loci, ranging from 2 to 6 per locus (A) with an average of 2.97. The EST-SSR markers developed displayed moderate gene diversity (He), with an average of 0.41. Genetic relationships among the 24 germplasm accessions were investigated using the genotypes of the 350 alleles, showing a geographic pattern of genotypes across genetic diversity centers of castor bean. CONCLUSION: Castor bean EST sequences exhibited a considerably high frequency of SSR sites and were rich resources for developing EST-SSR markers.
These EST-SSR markers would be particularly useful for both genetic mapping and population structure analysis, facilitating breeding and crop improvement of castor bean.
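The gene diversity (He) and alleles-per-locus figures in this abstract can be illustrated with a minimal sketch. It assumes the basic definition He = 1 - Σ p_i², where p_i are allele frequencies at a locus, without any small-sample correction; the abstract does not say which estimator or software was used, and the function names are hypothetical.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Map each allele to its frequency in a list of observed allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no allele calls supplied")
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(alleles):
    """Gene diversity He = 1 - sum(p_i^2), with no small-sample correction."""
    return 1.0 - sum(p * p for p in allele_frequencies(alleles).values())


if __name__ == "__main__":
    # Hypothetical allele calls (fragment sizes in bp) at one SSR locus.
    calls = [180, 180, 180, 184, 184, 188]
    print(round(expected_heterozygosity(calls), 3))
    # Mean alleles per locus from the totals quoted in the abstract.
    print(round(350 / 118, 2))
```

The second print statement reproduces the reported average of 2.97 alleles per locus from the 350 alleles and 118 loci given in the text.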
Project description:BACKGROUND: Genomic research of cultivated peanut has lagged behind that of other crop species because of the paucity of polymorphic DNA markers found in this crop. It is necessary to identify additional DNA markers for further genetic research in peanut. RESULTS: Microsatellite markers in cultivated peanut were developed using the SSR enrichment procedure. The results showed that the GA/CT repeat was the most frequently dispersed microsatellite in peanut. Primer pairs were designed for 56 different microsatellites, 19 of which showed polymorphism among the genotypes studied. The average number of alleles per locus was 4.25, and up to 14 alleles were found at one locus. This suggests that microsatellite DNA markers produce a higher level of DNA polymorphism than other DNA markers in cultivated peanut. CONCLUSIONS: It is desirable to isolate and characterize more DNA markers in cultivated peanut for more productive genomic studies, such as genetic mapping, marker-assisted selection, and gene discovery. The development of microsatellite markers holds promise for such studies.
Project description:Large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed in peanut (Arachis hypogaea L.) to obtain more informative genetic markers. A total of 10,102 potential non-redundant EST sequences, including 3,445 contigs and 6,657 singletons, were generated from cDNA libraries of the gynophore, roots, leaves and seedlings. A total of 3,187 primer pairs were designed on flanking regions of SSRs, some of which allowed one and two base mismatches. Among the 3,187 markers generated, 2,540 (80%) were trinucleotide repeats, 302 (9%) were dinucleotide repeats, and 345 (11%) were tetranucleotide repeats. Pre-polymorphic analyses of 24 Arachis accessions were performed using 10% polyacrylamide gels. A total of 1,571 EST-SSR markers showing clear polymorphisms were selected for further polymorphic analysis with a Fluoro-fragment Analyzer. The 16 Arachis accessions examined included cultivated peanut varieties as well as diploid species with the A or B genome. Altogether 1,281 (81.5%) of the 1,571 markers were polymorphic among the 16 accessions, and 366 (23.3%) were polymorphic among the 12 cultivated varieties. Diversity analysis was performed and the genotypes of all 16 Arachis accessions showed similarity coefficients ranging from 0.37 to 0.97. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11032-011-9604-8) contains supplementary material, which is available to authorized users.
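Pairwise similarity coefficients such as those reported above (0.37 to 0.97) are typically computed from the alleles or bands each accession carries across all scored markers. The sketch below uses the Jaccard coefficient as an assumption; the abstract does not state which coefficient the authors used, and the accession names and allele labels are hypothetical.

```python
from itertools import combinations


def jaccard_similarity(alleles_a, alleles_b):
    """Shared alleles divided by all alleles observed in either accession."""
    a, b = set(alleles_a), set(alleles_b)
    union = a | b
    if not union:
        raise ValueError("both accessions have no scored alleles")
    return len(a & b) / len(union)


def pairwise_similarities(profiles):
    """Similarity for every pair of accessions in a {name: alleles} mapping."""
    return {
        (x, y): jaccard_similarity(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical profiles: each allele is labelled "marker:size".
    profiles = {
        "acc1": {"m1:180", "m2:210", "m3:150"},
        "acc2": {"m1:180", "m2:210", "m3:154"},
        "acc3": {"m1:184", "m2:214", "m3:154"},
    }
    for pair, value in pairwise_similarities(profiles).items():
        print(pair, round(value, 2))
```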
Project description:White clover (Trifolium repens L.) is an allotetraploid species (2n = 4X = 32) that is widely distributed in temperate regions and cultivated as a forage legume. In this study, we developed expressed sequence tag (EST)-derived simple sequence repeat (SSR) markers, constructed linkage maps, and performed comparative mapping with other legume species. A total of 7982 ESTs that could be assembled into 5400 contigs and 2582 singletons were generated. Using the EST sequences obtained, 1973 primer pairs to amplify EST-derived SSR markers were designed and used, together with previously published SSR markers, for linkage analysis of 188 F1 progeny generated by a cross between two Japanese plants, '273-7' and 'T17-349'. An integrated linkage map was constructed by combining parental-specific maps; it consisted of 1743 SSR loci on 16 homeologous linkage groups with a total length of 2511 cM. The primer sequences of the developed EST-SSR markers and their map positions are available at http://clovergarden.jp/. Linkage disequilibrium (LD) was observed on 9 of 16 linkage groups of a parental-specific map. The genome structures were compared among white clover, red clover (T. pratense L.), Medicago truncatula, and Lotus japonicus. Macrosynteny was observed across the four legume species. Surprisingly, the comparative genome structure between white clover and M. truncatula had a higher degree of conservation than that of the two clover species.
Project description:Zingiber officinale is a model spice herb, well known for its medicinal value. It is primarily a vegetatively propagated commercial crop. However, considerable diversity in its morphology, fiber content and chemoprofiles has been reported. The present study explores the utility of EST-derived markers for studying genetic diversity in different accessions of Z. officinale and their cross-transferability within the Zingiberaceae family. A total of 38,115 EST sequences were assembled to generate 7,850 contigs and 10,762 singletons. SSRs were searched in the unigenes, and 515 SSR-containing ESTs were identified with a frequency of 1 SSR per 25.21 kb of the genome. These ESTs were also annotated using BLAST2GO. Primers were designed for 349 EST-SSRs, and 25 primer pairs were randomly picked for EST-SSR analysis. Of these, 16 primer pairs could be optimized for amplification in different accessions of Z. officinale as well as in other species belonging to Zingiberaceae. The markers GES454, GES466, GES480 and GES486 exhibited 100% cross-transferability among different members of Zingiberaceae.
Project description:BACKGROUND: Epimedium sagittatum (Sieb. Et Zucc.) Maxim, a traditional Chinese medicinal plant species, has been used extensively as a genuine medicinal material. Certain Epimedium species are endangered due to commercial overexploitation, whereas sustainable use, conservation genetics, systematics, and marker-assisted selection (MAS) in Epimedium remain little studied owing to the lack of molecular markers. Here, we report a set of expressed sequence tags (ESTs) and simple sequence repeats (SSRs) identified in these ESTs for E. sagittatum. RESULTS: cDNAs of E. sagittatum are sequenced using 454 GS-FLX pyrosequencing technology. The raw reads are cleaned and assembled into a total of 76,459 consensus sequences comprising 17,231 contigs and 59,228 singlets. About 38.5% (29,466) of the consensus sequences significantly match the non-redundant protein database (E-value < 1e-10), 22,295 of which are further annotated using Gene Ontology (GO) terms. A total of 2,810 EST-SSRs are identified from the Epimedium EST dataset. Trinucleotide SSRs are the dominant repeat type (55.2%), followed by dinucleotide (30.4%), tetranucleotide (7.3%), hexanucleotide (4.9%), and pentanucleotide (2.2%) SSRs. The dominant repeat motif is AAG/CTT (23.6%), followed by AG/CT (19.3%), ACC/GGT (11.1%), AT/AT (7.5%), and AAC/GTT (5.9%). Thirty-two SSR-ESTs are randomly selected and primer pairs are synthesized for testing the transferability across 52 Epimedium species. Eighteen primer pairs (85.7%) could be successfully transferred to Epimedium species, and sixteen of those show high genetic diversity, with an observed heterozygosity (Ho) of 0.35, an expected heterozygosity (He) of 0.65, and a high number of alleles per locus (11.9). CONCLUSION: A large EST dataset with a total of 76,459 consensus sequences is generated, aiming to provide sequence information for deciphering secondary metabolism, especially the flavonoid pathway, in Epimedium.
A total of 2,810 EST-SSRs are identified from the EST dataset, and approximately 1,580 EST-SSR markers are transferable. E. sagittatum EST-SSR transferability to the major Epimedium germplasm is up to 85.7%. Therefore, this EST dataset and these EST-SSRs will be a powerful resource for further studies such as taxonomy, molecular breeding, genetics, genomics, and secondary metabolism in Epimedium species.