Mining and characterization of EST-SSR markers for Zingiber officinale Roscoe with transferability to other species of Zingiberaceae.
ABSTRACT: Zingiber officinale is a model spice herb, well known for its medicinal value. It is primarily a vegetatively propagated commercial crop. However, considerable diversity in its morphology, fiber content and chemoprofiles has been reported. The present study explores the utility of EST-derived markers in studying genetic diversity in different accessions of Z. officinale and their cross-transferability within the Zingiberaceae family. A total of 38,115 EST sequences were assembled to generate 7850 contigs and 10,762 singletons. SSRs were searched in the unigenes and 515 SSR-containing ESTs were identified with a frequency of 1 SSR per 25.21 kb of the genome. These ESTs were also annotated using BLAST2GO. Primers were designed for 349 EST-SSRs and 25 primer pairs were randomly picked for the EST-SSR study. Out of these, 16 primer pairs could be optimized for amplification in different accessions of Z. officinale as well as other species belonging to Zingiberaceae. GES454, GES466, GES480 and GES486 markers were found to exhibit 100% cross-transferability among different members of Zingiberaceae.
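The SSR search summarized above can be illustrated with a minimal Python sketch. It is not the tool or the parameter set used in the study: the minimum repeat counts in `MIN_REPEATS` and the function names `find_ssrs` and `ssr_density_kb` are assumptions chosen for illustration, and only perfect (uninterrupted) repeats are detected.

```python
import re

# Hypothetical minimum repeat counts per motif length (an assumption,
# not the thresholds used in the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_n in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least (min_n - 1) exact copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (size, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(1)) // size))
    return sorted(hits)


def ssr_density_kb(n_ssrs, total_bp):
    """Average spacing between SSRs, expressed as kb of sequence per SSR."""
    return total_bp / 1000 / n_ssrs


if __name__ == "__main__":
    demo = "GGC" + "AG" * 7 + "TTC" + "AAG" * 5 + "CCA"
    for start, motif, n in find_ssrs(demo):
        print(start, motif, n)
```

A density such as "1 SSR per 25.21 kb" corresponds to the total length of the searched sequence divided by the number of SSRs found, which is what `ssr_density_kb` computes.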
Project description:BACKGROUND: Epimedium sagittatum (Sieb. et Zucc.) Maxim, a traditional Chinese medicinal plant species, has been used extensively as a genuine medicinal material. Certain Epimedium species are endangered due to commercial overexploitation, while sustainable application studies, conservation genetics, systematics, and marker-assisted selection (MAS) of Epimedium are less studied due to the lack of molecular markers. Here, we report a set of expressed sequence tags (ESTs) and simple sequence repeats (SSRs) identified in these ESTs for E. sagittatum. RESULTS: cDNAs of E. sagittatum are sequenced using 454 GS-FLX pyrosequencing technology. The raw reads are cleaned and assembled into a total of 76,459 consensus sequences comprising 17,231 contigs and 59,228 singlets. About 38.5% (29,466) of the consensus sequences significantly match the non-redundant protein database (E-value < 1e-10), 22,295 of which are further annotated using Gene Ontology (GO) terms. A total of 2,810 EST-SSRs are identified from the Epimedium EST dataset. Trinucleotide SSRs are the dominant repeat type (55.2%), followed by dinucleotide (30.4%), tetranucleotide (7.3%), hexanucleotide (4.9%), and pentanucleotide (2.2%) SSRs. The dominant repeat motif is AAG/CTT (23.6%), followed by AG/CT (19.3%), ACC/GGT (11.1%), AT/AT (7.5%), and AAC/GTT (5.9%). Thirty-two SSR-ESTs are randomly selected and primer pairs are synthesized for testing the transferability across 52 Epimedium species. Eighteen primer pairs (85.7%) could be successfully transferred to Epimedium species, and sixteen of those show high genetic diversity, with an observed heterozygosity (Ho) of 0.35, an expected heterozygosity (He) of 0.65, and a high number of alleles per locus (11.9). CONCLUSION: A large EST dataset with a total of 76,459 consensus sequences is generated, aiming to provide sequence information for deciphering secondary metabolism, especially the flavonoid pathway, in Epimedium.
A total of 2,810 EST-SSRs are identified from the EST dataset and approximately 1580 EST-SSR markers are transferable. E. sagittatum EST-SSR transferability to the major Epimedium germplasm is up to 85.7%. Therefore, this EST dataset and EST-SSRs will be a powerful resource for further studies such as taxonomy, molecular breeding, genetics, genomics, and secondary metabolism in Epimedium species.
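Motif classes such as AAG/CTT or AG/CT pool a repeat unit with its rotations and its reverse complement, since these describe the same microsatellite read in a different frame or on the other strand. The following Python sketch shows one way such classes could be derived; the function names are hypothetical and the abstract does not state how the authors grouped motifs.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(motif):
    """Reverse complement of a DNA motif."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def min_rotation(motif):
    """Alphabetically smallest cyclic rotation of a motif."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def motif_class(motif):
    """Label such as 'AAG/CTT' covering both strands and all rotations."""
    motif = motif.upper()
    forward = min_rotation(motif)
    reverse = min_rotation(reverse_complement(motif))
    return "/".join(sorted((forward, reverse)))


if __name__ == "__main__":
    for unit in ("TTC", "GA", "TA", "CAC"):
        print(unit, "->", motif_class(unit))
```

With this grouping, a repeat unit observed as TTC, GAA, or AGA is counted under the single class AAG/CTT when motif frequencies are tallied.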
Project description:Simple sequence repeats (SSRs) or microsatellite markers derived from expressed sequence tags (ESTs) are routinely used for molecular assisted-selection breeding, comparative genomic analysis, and genetic diversity studies. In this study, we investigated 54,546 ESTs for the identification and development of SSR markers in Pogostemon cablin (Patchouli). In total, 1219 SSRs were identified from 1144 SSR-containing ESTs. Trinucleotides (80.8%) were the most abundant SSRs, followed by di- (10.8%), mono- (7.1%), and hexa-nucleotides (1.3%). The top six motifs were CCG/CGG (15.3%), AAG/CTT (15.0%), ACC/GGT (13.5%), AGG/CCT (12.4%), ATC/ATG (9.9%), and AG/CT (9.8%). On the basis of these SSR-containing ESTs, a total of 192 primer pairs were randomly designed and used for polymorphism analysis in 38 accessions collected from different geographical regions of Guangdong, China. Of the SSR markers, 45 were polymorphic and had allele variations from two to four. Furthermore, a transferability analysis of these primer pairs revealed a 10-40% cross-species transferability in 10 related species. This report is the first comprehensive study on the development and analysis of a large set of SSR markers in P. cablin. These markers have the potential to be used in quantitative trait loci mapping, genetic diversity studies, and the fingerprinting of cultivars of P. cablin.
Project description:Expressed sequence tags (ESTs) are an important resource for gene discovery, the study of gene expression and its regulation, molecular marker development, and comparative genomics. We procured 10,000 ESTs and analyzed 267 EST-SSR markers through a computational approach. The average density was one SSR/10.45 kb or 6.4% frequency, wherein trinucleotide repeats (66.74%) were the most abundant, followed by di- (26.10%), tetra- (4.67%), penta- (1.5%), and hexanucleotide (1.2%) repeats. Functional annotation was performed, and 63 newly developed EST-SSRs were then used for cross-transferability, genetic diversity, and bulk segregation analysis (BSA). Out of 63 EST-SSRs, 42 markers were identified owing to their expansion genetics across 20 different plants; these amplified 519 alleles at 180 loci with an average of 2.88 alleles/locus, and the polymorphic information content (PIC) ranged from 0.51 to 0.93 with an average of 0.83. The cross-transferability ranged from 25% for wheat to 97.22% for Schlerostachya, with an average of 55.86%, and genetic relationships were established based on diversification among them. Moreover, 10 EST-SSRs were recognized as important markers between bulks of pooled DNA of sugarcane cultivars through BSA. This study highlights the utility of the markers for transferability and genetic diversity studies in grass species and for distinguishing sugarcane bulks.
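Polymorphic information content (PIC) values like those reported above are computed per locus from allele frequencies. The sketch below uses the commonly cited formula PIC = 1 - Σp_i² - ΣΣ 2p_i²p_j² (summing over allele pairs i < j); whether the study used this form or the simpler 1 - Σp_i² is not stated in the abstract, so the choice here is an assumption, as are the function names.

```python
from itertools import combinations


def allele_frequencies(counts):
    """Convert raw allele counts at one locus into frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def pic(counts):
    """PIC for one locus: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = allele_frequencies(counts)
    homozygosity = sum(x * x for x in p)
    cross_term = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1 - homozygosity - cross_term


def expected_heterozygosity(counts):
    """Gene diversity (He) for one locus: 1 - sum(p_i^2)."""
    p = allele_frequencies(counts)
    return 1 - sum(x * x for x in p)


if __name__ == "__main__":
    # Hypothetical allele counts at a single locus, for illustration only.
    print(round(pic([10, 10]), 3), round(expected_heterozygosity([10, 10]), 3))
```

PIC is always at most the expected heterozygosity for the same locus, and both approach 1 as the number of equally frequent alleles grows.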
Project description:Microsatellites, or simple sequence repeats (SSRs), are among the most widely distributed molecular markers and have been widely used to assess genetic diversity and for genetic mapping of important traits in plants. However, the understanding of microsatellite characteristics in Arachis species and the currently available amount of high-quality SSR markers remain limited. In this study, we identified 16,435 genome survey sequence SSRs (GSS-SSRs) and 40,199 expressed sequence tag SSRs (EST-SSRs) in Arachis hypogaea and its wild relative species using the publicly available sequence data. The GSS-SSRs had a density of 159.9-239.8 SSRs/Mb for wild Arachis and 1,015.8 SSRs/Mb for cultivated Arachis, whereas the EST-SSRs had a density of 173.5-384.4 SSRs/Mb and 250.9 SSRs/Mb for wild and cultivated Arachis, respectively. The trinucleotide SSRs were predominant across Arachis species, except that dinucleotides were the most frequent in A. hypogaea GSSs. From Arachis GSS-SSR and EST-SSR sequences, we developed 2,589 novel SSR markers that showed high polymorphism in six diverse A. hypogaea accessions. A genetic linkage map that contained 540 novel SSR loci and 105 anchor SSR loci was constructed using an F6 recombinant inbred line population. A subset of 82 randomly selected SSR markers was used to screen 39 wild and 22 cultivated Arachis accessions, which revealed a high transferability of the novel SSRs across Arachis species. Our results provide informative clues for investigating microsatellite patterns across A. hypogaea and its wild relative species and could facilitate germplasm evaluation and gene mapping in Arachis species.
Project description:BACKGROUND: Expressed Sequence Tags (ESTs) are a source of simple sequence repeats (SSRs) that can be used to develop molecular markers for genetic studies. The availability of ESTs for Quercus robur and Quercus petraea provided a unique opportunity to develop microsatellite markers to accelerate research aimed at studying adaptation of these long-lived species to their environment. As a first step toward the construction of a SSR-based linkage map of oak for quantitative trait locus (QTL) mapping, we describe the mining and survey of EST-SSRs as well as a fast and cost-effective approach (bin mapping) to assign these markers to an approximate map position. We also compared the level of polymorphism between genomic and EST-derived SSRs and addressed the transferability of EST-SSRs in Castanea sativa (chestnut). RESULTS: A catalogue of 103,000 Sanger ESTs was assembled into 28,024 unigenes from which 18.6% presented one or more SSR motifs. More than 42% of these SSRs corresponded to trinucleotides. Primer pairs were designed for 748 putative unigenes. Overall 37.7% (283) were found to amplify a single polymorphic locus in a reference full-sib pedigree of Quercus robur. The usefulness of these loci for establishing a genetic map was assessed using a bin mapping approach. Bin maps were constructed for the male and female parental tree for which framework linkage maps based on AFLP markers were available. The bin set consisted of 14 highly informative offspring selected based on the number and position of crossover sites. The female and male maps comprised 44 and 37 bins, with an average bin length of 16.5 cM and 20.99 cM, respectively. A total of 256 EST-SSRs were assigned to bins and their map position was further validated by linkage mapping.
EST-SSRs were found to be less polymorphic than genomic SSRs, but their transferability rate to chestnut, a phylogenetically related species to oak, was higher. CONCLUSION: We have generated a bin map for oak comprising 256 EST-SSRs. This resource constitutes a first step toward the establishment of a gene-based map for this genus that will facilitate the dissection of QTLs affecting complex traits of ecological importance.
Project description:BACKGROUND: Over recent years, a growing effort has been made to develop microsatellite markers for the genomic analysis of the common bean (Phaseolus vulgaris) to broaden the knowledge of the molecular genetic basis of this species. The availability of large sets of expressed sequence tags (ESTs) in public databases has given rise to an expedient approach for the identification of SSRs (Simple Sequence Repeats), specifically EST-derived SSRs. In the present work, a battery of new microsatellite markers was obtained from a search of the Phaseolus vulgaris EST database. The diversity, degree of transferability and polymorphism of these markers were tested. RESULTS: From 9,583 valid ESTs, 4,764 had microsatellite motifs, from which 377 were used to design primers, and 302 (80.11%) showed good amplification quality. To analyze transferability, a group of 167 SSRs were tested, and the results showed that they were 82% transferable across at least one species. The highest amplification rates were observed between the species from the Phaseolus (63.7%), Vigna (25.9%), Glycine (19.8%), Medicago (10.2%), Dipterix (6%) and Arachis (1.8%) genera. The average PIC (Polymorphism Information Content) varied from 0.53 for genomic SSRs to 0.47 for EST-SSRs, and the average number of alleles per locus was 4 and 3, respectively. Among the 315 newly tested SSRs in the BJ (BAT93 X Jalo EEP558) population, 24% (76) were polymorphic. The integration of these segregant loci into a framework map composed of 123 previously obtained SSR markers yielded a total of 199 segregant loci, of which 182 (91.5%) were mapped to 14 linkage groups, resulting in a map length of 1,157 cM. CONCLUSIONS: A total of 302 newly developed EST-SSR markers, showing good amplification quality, are available for the genetic analysis of Phaseolus vulgaris. These markers showed satisfactory rates of transferability, especially between species that have great economic and genomic values. 
Their diversity was comparable to genomic SSRs, and they were incorporated in the common bean reference genetic map, which constitutes an important contribution to and advance in Phaseolus vulgaris genomic research.
Project description:Coffee breeding and improvement efforts can be greatly facilitated by the availability of a large repository of simple sequence repeat (SSR) based microsatellite markers, which provide efficiency and high resolution in genetic analyses. This study aimed to improve SSR availability in coffee by developing new genic and genomic SSR markers using in-silico bioinformatics and a streptavidin-biotin based enrichment approach, respectively. The expressed sequence tag (EST) based genic microsatellite markers (EST-SSRs) were developed using the publicly available dataset of 13,175 unigene ESTs, which showed a distribution of 1 SSR/3.4 kb of coffee transcriptome. Genomic SSRs, on the other hand, were developed from an SSR-enriched small-insert partial genomic library of robusta coffee. In total, 69 new SSRs (44 EST-SSRs and 25 genomic SSRs) were developed and validated as suitable genetic markers. Diversity analysis of selected coffee genotypes revealed these to be highly informative in terms of allelic diversity and PIC values, and eighteen of these markers (≈ 27%) could be mapped on a robusta linkage map. Notably, the markers described here also revealed a very high cross-species transferability. In addition to the validated markers, we have also designed primer pairs for 270 putative EST-SSRs, which are expected to provide another ca. 200 useful genetic markers considering the high success rate (88%) of marker conversion of similar pairs tested/validated in this study.
Project description:Expressed sequence tags (ESTs) are a potential source for the development of genic microsatellite markers, gene discovery, comparative genomics, and other genomic studies. In the present study, 7630 ESTs from NCBI were examined for SSR identification and characterization. A total of 263 SSRs were identified with an average density of one SSR/4.2 kb (3.4% frequency). Analysis revealed that trinucleotide repeats (47.52%) were the most abundant, followed by tetranucleotide (19.77%), dinucleotide (19.01%), pentanucleotide (9.12%), and hexanucleotide repeats (4.56%). Functional annotation was done through homology search and gene ontology, and 35 EST-SSRs were selected. Primer pairs were designed for evaluation of cross-transferability and polymorphism among 11 plants belonging to five different families. A total of 402 alleles were generated at 155 loci with an average of 2.6 alleles/locus, and the polymorphic information content (PIC) ranged from 0.15 to 0.92 with an average of 0.75. The cross-transferability ranged from 34.84% to 98.06% in different plants, with an average of 67.86%. Thus, the validation study of the 35 annotated EST-SSR markers, which correspond to particular metabolic activities, revealed polymorphism and evolutionary relationships among different families of angiosperms.
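Cross-transferability percentages of the kind reported above can be derived from a table of amplification calls (marker amplified or not in each species). The Python sketch below shows one plausible calculation; the abstract does not define exactly how its percentages were obtained, so the per-species fraction of amplifying markers used here is an assumption, and the species labels and calls are made up for illustration.

```python
def transferability(amplified):
    """Per-species and mean transferability (%) from amplification calls.

    `amplified` maps a species name to a list of booleans, one per marker,
    where True means the primer pair produced an amplicon in that species.
    """
    rates = {
        species: 100.0 * sum(calls) / len(calls)
        for species, calls in amplified.items()
    }
    mean_rate = sum(rates.values()) / len(rates)
    return rates, mean_rate


if __name__ == "__main__":
    # Hypothetical calls for four markers in two species (not data from the study).
    calls = {
        "species_a": [True, True, False, True],
        "species_b": [True, False, False, False],
    }
    per_species, average = transferability(calls)
    print(per_species, average)
```

The mean here is an unweighted average over species, which matches the usual way a single "average transferability" figure is quoted when every species is tested with the same marker set.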
Project description:Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed sequence tags (ESTs) are a source of SSRs that can be used to develop markers to facilitate plant breeding and for more basic research across genera and higher plant orders. Leaf and meristem tissue from 'Heritage' red raspberry (Rubus idaeus) and 'Bristol' black raspberry (R. occidentalis) were utilized for RNA extraction. After conversion to cDNA and library construction, ESTs were sequenced, quality verified, assembled and scanned for SSRs. Primers flanking the SSRs were designed and a subset tested for amplification, polymorphism and transferability across species. ESTs containing SSRs were functionally annotated using the GenBank non-redundant (nr) database and further classified using the gene ontology database. To accelerate development of EST-SSRs in the genus Rubus (Rosaceae), 1149 and 2358 cDNA sequences were generated from red raspberry and black raspberry, respectively. The cDNA sequences were screened using rigorous filtering criteria which resulted in the identification of 121 and 257 SSR loci for red and black raspberry, respectively. Primers were designed from the surrounding sequences resulting in 131 and 288 primer pairs, respectively, as some sequences contained more than one SSR locus. Sequence analysis revealed that the SSR-containing genes span a diversity of functions and share more sequence identity with strawberry genes than with other Rosaceous species. This resource of Rubus-specific, gene-derived markers will facilitate the construction of linkage maps composed of transferable markers for studying and manipulating important traits in this economically important genus.
Project description:Simple Sequence Repeats (SSRs) are widely used in population genetic studies but their classical development is costly and time-consuming. The ever-increasing available DNA datasets generated by high-throughput techniques offer an inexpensive alternative for SSR discovery. Expressed Sequence Tags (ESTs) have been widely used as an SSR source for plants of economic relevance but their application to non-model species is still modest. Here, we explored the use of publicly available ESTs (GenBank at the National Center for Biotechnology Information, NCBI) for SSR development in non-model plants, focusing on genera listed by the International Union for the Conservation of Nature (IUCN). We also searched two model genera with fully annotated genomes for EST-SSRs, Arabidopsis and Oryza, and used them as controls for genome distribution analyses. Overall, we downloaded 16 031 555 sequences for 258 plant genera, which were mined for SSRs and their primers with the help of QDD1. Genome distribution analyses in Oryza and Arabidopsis were done by blasting the SSR-containing sequences against the Oryza sativa and Arabidopsis thaliana reference genomes implemented in the Basic Local Alignment Search Tool (BLAST) of the NCBI website. Finally, we performed an empirical test to determine the performance of our EST-SSRs in a few individuals from four species of two eudicot genera, Trifolium and Centaurea. We explored a total of 14 498 726 EST sequences from the dbEST database (NCBI) in 257 plant genera from the IUCN Red List. We identified a very large number (17 102) of ready-to-test EST-SSRs in most plant genera (193) at no cost. Overall, dinucleotide and trinucleotide repeats were the prevalent types but the abundance of the various types of repeat differed between taxonomic groups. Control genomes revealed that trinucleotide repeats were mostly located in coding regions while dinucleotide repeats were largely associated with untranslated regions.
Our results from the empirical test revealed considerable amplification success and transferability between congenerics. The present work represents the first large-scale study developing SSRs by utilizing publicly accessible EST databases in threatened plants. Here we provide a very large number of ready-to-test EST-SSRs (17 102) for 193 genera. The cross-species transferability suggests that the number of possible target species would be large. Since trinucleotide repeats are abundant and mainly linked to exons, they might be useful in evolutionary and conservation studies. Altogether, our study strongly supports the use of EST databases as an extremely affordable and fast alternative for SSR development in threatened plants.