Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L.).
ABSTRACT: Microsatellites or simple sequence repeats (SSRs) are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Although an ordered draft sequence of the hexaploid bread wheat genome has been announced, no systematic analysis of SSRs in wheat has been reported so far. In the present study, we identified 364,347 SSRs among 10,603,760 sequences of the Chinese Spring wheat (CSW) genome, at a density of 36.68 SSRs/Mb. In total, we detected 488 motif types ranging from di- to hexanucleotides. Dinucleotide repeats dominated, accounting for approximately 42.52% of all SSRs, while tri-, tetra-, penta- and hexanucleotide repeats accounted for 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent motifs among di- to hexanucleotide repeats, respectively. Among the 21 chromosomes of CSW, SSR density was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats were almost identical on every chromosome and matched those of the whole genome. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, covering the entire genome at a density of 29.73 markers per Mb. All SSR markers were validated by reverse electronic PCR (re-PCR): 70,564 (23.9%) were monomorphic and 224,703 (76.1%) were polymorphic. For validation, 45 monomorphic markers were selected at random; 24 (53.3%) amplified a single locus, 8 (17.8%) amplified multiple identical loci, and 13 (28.9%) did not amplify any fragment from CSW genomic DNA.
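The abstract does not state the parameters used to call SSRs. As a minimal sketch of how perfect di- to hexanucleotide SSRs can be located and turned into a density, the Python below scans a sequence for tandem copies of each motif length. The minimum repeat counts in `MIN_REPEATS` and all function names are hypothetical, not the study's settings.

```python
# Hypothetical minimum repeat-unit counts per motif length (2-6 bp);
# the wheat study's actual thresholds are not stated in the abstract.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples for perfect SSRs (0-based starts)."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (motif, repeats) covering the most bases at position i
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            if len(motif) < k or "N" in motif or is_reducible(motif):
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= min_repeats[k] and (best is None or n * k > len(best[0]) * best[1]):
                best = (motif, n)
        if best:
            hits.append((i, best[0], best[1]))
            i += len(best[0]) * best[1]  # skip past the whole repeat tract
        else:
            i += 1
    return hits


def density_per_mb(n_ssrs, total_bp):
    """SSRs per megabase of analysed sequence."""
    return n_ssrs / (total_bp / 1_000_000)


print(find_perfect_ssrs("CC" + "AG" * 8 + "CC"))  # [(2, 'AG', 8)]
```

As an arithmetic check on the reported figures, 364,347 SSRs at 36.68 SSRs/Mb implies roughly 9,933 Mb of analysed sequence, and 295,267 markers over that length gives the reported 29.73 markers per Mb.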
A dendrogram was then generated from the 24 monomorphic SSR markers across 20 wheat cultivars and three diploid ancestral species of wheat, showing that monomorphic SSR markers are a promising source for increasing the number of genetic markers available for the wheat genome. The results of this study will be useful for investigating genetic diversity and evolution among wheat and related species, and will also facilitate comparative genomic studies and marker-assisted selection (MAS) in plants.
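The abstract does not name the distance measure or clustering algorithm behind the dendrogram. A common choice for band-presence data is Jaccard distance with UPGMA clustering; the sketch below assumes that choice, and the profiles and names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - Jaccard similarity of two 0/1 band-presence vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - shared / either


def upgma(profiles):
    """Cluster {name: 0/1 vector} by UPGMA and return a nested-tuple tree."""
    clusters = {name: (name, 1) for name in profiles}  # id -> (subtree, size)
    dist = {frozenset(p): jaccard_distance(profiles[p[0]], profiles[p[1]])
            for p in combinations(profiles, 2)}
    step = 0
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        a, b = tuple(pair)
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        del dist[pair]
        new_id = f"_cluster{step}"
        step += 1
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            # UPGMA: size-weighted mean of the merged clusters' distances to c.
            dist[frozenset((new_id, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        subtree = tuple(sorted((tree_a, tree_b), key=str))  # deterministic child order
        clusters[new_id] = (subtree, n_a + n_b)
    return next(iter(clusters.values()))[0]


profiles = {  # hypothetical band scores for four accessions at four markers
    "A": [1, 1, 0, 0],
    "B": [1, 1, 0, 1],
    "C": [0, 0, 1, 1],
    "D": [0, 1, 1, 1],
}
print(upgma(profiles))  # (('A', 'B'), ('C', 'D'))
```

Dice similarity is an equally common alternative for SSR band data; swapping it in only changes `jaccard_distance`.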
Project description: The abundance and inherent potential for variation of simple sequence repeats (SSRs), or microsatellites, make them a valuable source of genetic markers in eukaryotes. We describe the organization and abundance of SSRs in the fungus Fusarium graminearum, the causal agent of Fusarium head blight (head scab) of wheat. We identified 1705 SSRs with various repeat motifs in the sequence database of F. graminearum. Mononucleotide repeats (62%) were the most abundant, followed by di- (20%) and trinucleotide repeats (14%); tetra-, penta- and hexanucleotide repeats together accounted for only 4% of SSRs. The estimated frequency of Class I SSRs (perfect repeats ≥20 nucleotides) was one SSR per 124.5 kb, whereas that of Class II SSRs (perfect repeats >10 and <20 nucleotides) was one SSR per 25.6 kb. SSRs will be a powerful tool for taxonomic, phylogenetic, genome mapping and population genetic studies, because SSR-based markers show high levels of allelic variation, codominant inheritance and ease of analysis.
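The Class I / Class II split is a simple rule on total tract length. A minimal Python sketch, assuming SSR calls are available as (motif, repeat count) pairs; the function names and example calls are hypothetical.

```python
from collections import Counter


def ssr_class(motif, repeats):
    """Classify a perfect SSR by total tract length, using the boundaries in the text."""
    length = len(motif) * repeats
    if length >= 20:
        return "Class I"
    if length > 10:
        return "Class II"
    return None  # too short to be counted in either class


def kb_per_ssr(n_ssrs, genome_bp):
    """Average spacing, expressed as 'one SSR per X kb'."""
    return genome_bp / 1000 / n_ssrs


ssrs = [("AG", 10), ("AAG", 4), ("A", 25), ("AG", 5)]  # hypothetical calls
print(Counter(ssr_class(m, n) for m, n in ssrs))
```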
Project description: Because of its popularity as an ornamental plant in East Asia, mei (Prunus mume Sieb. et Zucc.) has received increasing attention in genetic and genomic research with the recent shotgun sequencing of its genome. Here, we performed a genome-wide characterization of simple sequence repeats (SSRs) in the mei genome and detected a total of 188,149 SSRs, occurring at a frequency of 794 SSRs/Mb. Mononucleotide repeats were the most common type of SSR in genomic regions, followed by di- and tetranucleotide repeats. Most SSRs in coding sequences (CDS) were composed of tri- or hexanucleotide repeat motifs, whereas mononucleotide repeats were the most common in intergenic regions. A genome-wide comparison of SSR patterns among the mei, strawberry (Fragaria vesca) and apple (Malus×domestica) genomes showed that mei had the highest SSR density, slightly higher than that of strawberry (608 SSRs/Mb) and almost twice that of apple (398 SSRs/Mb). Mononucleotide repeats were the dominant SSR motifs in all three Rosaceae species. Using 144 SSR markers, we constructed a 670 cM linkage map of mei comprising eight linkage groups (LGs), with an average marker distance of 5 cM. Seventy-one scaffolds covering about 27.9% of the assembled mei genome were anchored to the genetic map, which allowed macro-colinearity between the mei genome and the Prunus T×E reference map to be identified. This framework map provides a first step toward high-resolution genetic mapping and marker-assisted selection in this ornamental species.
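The summary figures in this abstract can be checked with simple arithmetic. The sketch assumes all 144 markers were placed on the eight linkage groups and that the average distance counts intervals between adjacent markers (m - 1 per group); the abstract does not say how it was computed. Dividing by the marker count instead gives 670/144 ≈ 4.65 cM, which also rounds to about 5 cM. Function names are hypothetical.

```python
def mean_marker_interval(total_cm, n_markers, n_groups):
    """Average spacing when each linkage group with m markers contributes m - 1 intervals."""
    return total_cm / (n_markers - n_groups)


def density_ratio(ssr_per_mb_a, ssr_per_mb_b):
    """How many times denser genome A's SSRs are than genome B's."""
    return ssr_per_mb_a / ssr_per_mb_b


print(round(mean_marker_interval(670, 144, 8), 2))  # 4.93
# mei vs. strawberry and mei vs. apple
print(round(density_ratio(794, 608), 2), round(density_ratio(794, 398), 2))  # 1.31 1.99
```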
Project description: BACKGROUND: Earlier comparative maps between the genomes of rice (Oryza sativa L.), barley (Hordeum vulgare L.) and wheat (Triticum aestivum L.) were linkage maps based on cDNA-RFLP markers. The low number of polymorphic RFLP markers has limited the development of dense genetic maps in wheat and the number of anchor points available in comparative maps. Higher-density comparative maps using PCR-based anchor markers are necessary to better estimate the conservation of colinearity among cereal genomes. The purposes of this study were to characterize, in wheat, barley and rice, the proportion of transcribed DNA sequences containing simple sequence repeats (SSRs or microsatellites) by SSR length and motif, and to determine the in silico rice genome locations of primer sets developed for wheat and barley expressed sequence tags (ESTs). RESULTS: The proportions of SSR types (di-, tri-, tetra- and pentanucleotide repeats) and motifs varied with SSR length within and among the three species, with trinucleotide SSRs being the most frequent. The distributions of genomic microsatellites (gSSRs), EST-derived microsatellites (EST-SSRs) and transcribed regions along the contiguous sequence of rice chromosome 1 were highly correlated. More than 13,000 primer pairs were developed for use by the cereal research community as potential markers in wheat, barley and rice. CONCLUSION: Trinucleotide SSRs were the most common type in each species; however, the relative proportions of SSR types and motifs differed among rice, wheat and barley. Genomic microsatellites were located primarily in gene-rich regions of the rice genome. Microsatellite markers derived from non-redundant EST-SSRs are an economical and efficient alternative to RFLPs for comparative mapping in cereals.
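Placing a primer pair on a genome in silico can be illustrated with a minimal exact-match "electronic PCR": find the forward primer, then the reverse complement of the reverse primer downstream, within a size limit. This sketch assumes perfect primer matches; the study's actual tool and mismatch settings are not stated, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, fwd, rev, max_size=1000):
    """Return (start, size) of exact-match amplicons on the given strand.

    For each forward-primer hit, only the nearest downstream reverse-primer
    site is considered.
    """
    template, fwd = template.upper(), fwd.upper()
    rev_site = revcomp(rev)  # the reverse primer anneals to the opposite strand
    hits = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        if end != -1:
            size = end + len(rev_site) - start
            if size <= max_size:
                hits.append((start, size))
        start = template.find(fwd, start + 1)
    return hits


def in_silico_pcr_both_strands(template, fwd, rev, max_size=1000):
    """Search the template and its reverse complement."""
    return {"+": in_silico_pcr(template, fwd, rev, max_size),
            "-": in_silico_pcr(revcomp(template), fwd, rev, max_size)}


t = "TTTT" + "ACGTAC" + "AGAGAGAG" + "GGCCAA" + "TTTT"  # hypothetical template
print(in_silico_pcr(t, "ACGTAC", "TTGGCC"))  # [(4, 20)]
```

Production tools additionally tolerate mismatches and gaps and index the genome rather than scanning it with `str.find`.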
Project description: The biotrophic parasitic fungus Puccinia striiformis f. sp. tritici (Pst) causes stripe rust, a devastating disease of wheat that endangers global food security. Because the Pst population is highly dynamic, it is difficult to develop wheat cultivars with durable and highly effective resistance. Simple sequence repeats (SSRs) are widely used as molecular markers in genetic studies to determine population structure in many organisms; however, only a small number of SSR markers have been developed for Pst. In this study, a total of 4,792 SSR loci were identified from the whole-genome sequences of six isolates from different regions of the world, a density of one SSR per 22.95 kb. The majority of the SSRs were di- and trinucleotide repeats. A database containing 1,113 SSR markers was established. In silico comparison showed that previously reported SSR markers were located mainly in exons, whereas the SSR markers in the database were mostly in intergenic regions. Furthermore, 105 SSR markers were confirmed in silico as polymorphic because their positions and nucleotide variation matched INDELs identified among the six isolates. When 104 in silico polymorphic SSR markers were used to genotype 21 Pst isolates, 84 produced the target bands, 82 of which were polymorphic and revealed the genetic relationships among the isolates. The results show that whole-genome re-sequencing of multiple isolates provides an ideal resource for developing SSR markers, and that the newly developed markers are useful for genetic and population studies of the wheat stripe rust fungus.
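A minimal sketch of the positional check: an SSR is flagged as a candidate polymorphic marker when its span overlaps an INDEL called among the re-sequenced isolates. The overlap rule, the half-open coordinates and the `Interval` type are assumptions; the abstract only says that positions and variation matched.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a, b):
    """True if two half-open intervals on the same sequence share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def flag_polymorphic(ssrs, indels):
    """Keep SSRs whose span overlaps at least one INDEL called among isolates."""
    by_chrom = defaultdict(list)  # only compare against INDELs on the same sequence
    for v in indels:
        by_chrom[v.chrom].append(v)
    return [s for s in ssrs if any(overlaps(s, v) for v in by_chrom[s.chrom])]


ssrs = [Interval("c1", 100, 120), Interval("c1", 500, 530), Interval("c2", 100, 120)]
indels = [Interval("c1", 110, 112), Interval("c2", 300, 301)]  # hypothetical calls
print(flag_polymorphic(ssrs, indels))
```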
Project description: Expressed sequence tags (ESTs) are an important resource for gene discovery, studies of gene expression and its regulation, molecular marker development, and comparative genomics. We obtained 10,000 ESTs and computationally analyzed 267 EST-SSRs. The average density was one SSR per 10.45 kb (6.4% frequency); trinucleotide repeats (66.74%) were the most abundant, followed by di- (26.10%), tetra- (4.67%), penta- (1.5%) and hexanucleotide (1.2%) repeats. After functional annotation, 63 newly developed EST-SSRs were used for cross-transferability, genetic diversity and bulked segregant analysis (BSA). Of the 63 EST-SSRs, 42 amplified across 20 different plant species, yielding 519 alleles at 180 loci with an average of 2.88 alleles per locus; the polymorphic information content (PIC) ranged from 0.51 to 0.93, with an average of 0.83. Cross-transferability ranged from 25% in wheat to 97.22% in Sclerostachya, with an average of 55.86%, and genetic relationships among the species were established from their genetic differentiation. Moreover, 10 EST-SSRs distinguished bulks of pooled DNA from sugarcane cultivars in BSA. This study highlights the usefulness of these markers for cross-species transferability, genetic diversity analysis in grass species, and discrimination of sugarcane bulks.
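The abstract reports PIC values but not the formula used. Two common definitions are the expected heterozygosity, 1 - sum(p_i^2), and Botstein et al.'s PIC, which further subtracts the sum over allele pairs i < j of 2 * p_i^2 * p_j^2. A minimal sketch computing either from the allele calls at one locus; function names and example data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Relative frequency of each distinct allele (e.g. fragment size) at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def pic(freqs, botstein=True):
    """Polymorphic information content from allele frequencies.

    botstein=False returns the simpler expected heterozygosity 1 - sum(p^2).
    """
    het = 1.0 - sum(p * p for p in freqs)
    if not botstein:
        return het
    return het - sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))


freqs = allele_frequencies(["150", "150", "152", "154"])  # hypothetical sizes (bp)
print(pic(freqs), pic(freqs, botstein=False))
```

The reported average of 2.88 alleles per locus is simply 519 alleles divided by 180 loci.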
Project description: Simple sequence repeats (SSRs) can be derived from a complete genome sequence. These markers are important for gene mapping as well as marker-assisted selection (MAS). To develop SSRs for cotton gene mapping, we selected the complete genome sequence of Gossypium raimondii, which consists of 4447 non-redundant scaffolds. In the 775.2 Mb of sequence examined, a total of 136,345 microsatellites were identified, a density of one SSR per 5.69 kb, leading to the development of 112,177 primer pairs. The distribution of SSRs in the genome was non-random. Among motifs ranging from 1 to 6 bp, pentanucleotide repeats were the most abundant (30.5%), followed by tetranucleotide (18.2%) and dinucleotide repeats (16.9%). Among all 457 motif types identified, the most frequent was poly-AT/TA, which accounted for 79.8% of all dinucleotide SSRs, followed by AAAT/TTTA, which accounted for 51.5% of all tetranucleotide SSRs. Further, 18,834 microsatellites were detected in protein-coding genes, and 46.0% of the 40,976 genes of G. raimondii contained SSRs. The genome-based SSRs developed in the present study will lay the groundwork for developing large numbers of SSR markers for genetic mapping, gene discovery, genetic diversity analysis and MAS breeding in cotton.
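Reporting motifs as AT/TA or AAAT/TTTA implies that equivalent motifs (rotations and reverse complements) are pooled into one class before computing shares such as "79.8% of all dinucleotide SSRs". A minimal sketch of one such normalization; it is a common convention, not necessarily the one used in the study, and the example calls are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_motif(motif):
    """Lexicographically smallest rotation of the motif or of its reverse complement.

    Groups e.g. AG, GA, CT and TC into one class, reported as 'AG'.
    """
    motif = motif.upper()
    variants = [s[i:] + s[:i] for s in (motif, revcomp(motif)) for i in range(len(s))]
    return min(variants)


calls = ["AT", "TA", "AT", "AG", "CT", "TTTA", "AAAT", "ATTT"]  # hypothetical
classes = Counter(canonical_motif(m) for m in calls)
dinuc_total = sum(n for m, n in classes.items() if len(m) == 2)
print(classes["AT"] / dinuc_total)  # share of the AT class among dinucleotide SSRs
```

The density figure follows the same kind of arithmetic: 775.2 Mb divided by 136,345 SSRs is about 5.69 kb per SSR.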
Project description: Coconut (Cocos nucifera L.) is an important economic crop in tropical countries. However, the lack of a complete reference genome and the limited number of usable DNA markers hinder genomic studies and molecular breeding of coconut. Here, we present the results of simple sequence repeat (SSR) mining from a high-throughput genotyping-by-sequencing (GBS) study of a collection of 38 coconut accessions. A total of 22,748 SSRs with di-, tri-, tetra-, penta- and hexanucleotide motifs repeated five or more times were identified; 2451 of these were defined as polymorphic loci based on locus clustering in the 38 accessions, and 315 loci were suitable for SSR marker development. One hundred loci were selected, and primer pairs for each SSR locus were designed and validated in 40 coconut accessions. Analysis of the 74 polymorphic markers revealed 2 to 9 alleles per locus, with an average of 3.01 alleles. Assessment of genetic diversity and genetic relationships among the 40 coconut varieties through population structure analysis, principal coordinate analysis (PCoA) and phylogenetic tree analysis with the 74 polymorphic SSR markers revealed three main groups of coconuts in Thailand. The SSR loci and markers developed in this study will be useful for studies of coconut diversity and for molecular breeding. The SSR mining approach used here could be applied to other plant species with complex genomes, regardless of whether a reference genome is available.
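Per-locus allele counts and their average are straightforward tallies over the genotype table. A minimal sketch, assuming each call stores two allele sizes (or `None` when missing); the data layout and locus names are hypothetical.

```python
def alleles_per_locus(genotypes):
    """Count distinct alleles per locus.

    `genotypes` maps a locus name to one call per accession; a call is a tuple of
    allele sizes (e.g. (152, 158)) or None when the call is missing.
    """
    return {
        locus: len({allele for call in calls if call is not None for allele in call})
        for locus, calls in genotypes.items()
    }


def summarize(genotypes):
    """Return the polymorphic loci (>= 2 alleles) and their mean allele count."""
    counts = alleles_per_locus(genotypes)
    polymorphic = {locus: n for locus, n in counts.items() if n >= 2}
    mean = sum(polymorphic.values()) / len(polymorphic) if polymorphic else 0.0
    return polymorphic, mean


example = {  # hypothetical loci and allele sizes (bp)
    "locus1": [(150, 150), (150, 154), (158, 158), None],
    "locus2": [(201, 201), (201, 201), (201, 201), (201, 201)],
    "locus3": [(97, 99), (99, 99), (97, 97), (99, 101)],
}
print(summarize(example))
```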
Project description: In this study, we surveyed the distribution and frequency of microsatellites, or simple sequence repeats (SSRs), in the genome of Spodoptera littoralis multiple nucleopolyhedrovirus (SpliMNPV) isolate AN-1956. Of the 55 microsatellite motifs identified in silico in the SpliMNPV-AN1956 genome (comprising mono-, di-, tri- and hexanucleotide repeats), 39 were located within coding regions (cSSRs) and 16 within intergenic or noncoding regions. Of the 39 motifs in coding regions, 21 were located in annotated functional genes and 18 in genes of unknown function (hypothetical proteins). Trinucleotide repeats (80%) were the most abundant, followed by dinucleotide (13%), mononucleotide (5%) and hexanucleotide (2%) repeats. The 39 motifs located within coding regions were further validated in vitro by PCR, and the 21 motifs located within known functional genes (15 genes) were characterized by nucleotide sequencing. A comparison of the 21 sequenced cSSRs with published sequences is presented. Finally, the SSR markers developed for the 39 motifs were mapped onto the SpliMNPV-AN1956 genome. In conclusion, the SpliMNPV-specific SSR markers developed in this study could be a useful tool for identifying isolates and analyzing genetic diversity and viral evolutionary status.
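The coding versus noncoding split comes from comparing SSR coordinates with gene annotations. A minimal sketch, assuming non-overlapping gene intervals on a single coordinate system and treating any overlap with a gene as "coding"; the gene names and coordinates are hypothetical.

```python
from bisect import bisect_right


def locate_ssrs(ssrs, genes):
    """Label each SSR with the overlapping gene, or 'noncoding'.

    ssrs:  list of (start, end) half-open coordinates on the genome.
    genes: list of (start, end, name) coding regions, assumed non-overlapping.
    """
    genes = sorted(genes)
    starts = [g[0] for g in genes]
    labels = []
    for s_start, s_end in ssrs:
        i = bisect_right(starts, s_end - 1) - 1  # last gene starting before the SSR ends
        hit = i >= 0 and genes[i][1] > s_start   # does that gene reach into the SSR?
        labels.append(genes[i][2] if hit else "noncoding")
    return labels


genes = [(100, 400, "geneA"), (600, 900, "geneB")]  # hypothetical annotation
ssrs = [(150, 165), (450, 470), (890, 910), (50, 60)]
print(locate_ssrs(ssrs, genes))
```

Because the genes are sorted and non-overlapping, only the last gene starting before the SSR's end can overlap it, so one binary search per SSR suffices.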
Project description: We have characterized the simple sequence repeat (SSR) markers of eggplant (Solanum melongena) using a recent high-quality sequence of its whole genome. We found nearly 133,000 perfect SSRs, a density of 125.5 SSRs/Mbp, and about 178,400 imperfect SSRs. Of the perfect SSRs, 15.6% were complex, with two stretches of repeats separated by an intervening block of <100 nt. Di- and trinucleotide SSRs accounted for 43% and 37% of the total, respectively. The SSRs were classified according to their number of repeats and overall length, and were assigned to their linkage groups. We found 2,449 of the perfect SSRs in 2,086 genes, an overall density of 18.5 SSRs/Mbp across the gene space; 3,524 imperfect SSRs were present in 2,924 genes at a density of 26.7 SSRs/Mbp. Putative functions were assigned via ontology to genes containing at least one SSR. Using these data, we developed the "Eggplant Microsatellite DataBase" (EgMiDB), which allows SSR markers to be identified by their location on the genome, type of repeat (perfect vs. imperfect), motif type, sequence, repeat number and genomic/gene context. It also suggests forward and reverse primers. We used in silico PCR to validate these SSR markers, with two CDS sets and three assembled transcriptomes from diverse eggplant accessions as templates.
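The "complex" SSR rule, repeat stretches separated by an intervening block of <100 nt, can be applied by merging neighboring stretches. A minimal sketch on one sequence; coordinates and names are hypothetical, and whether the study allowed more than two stretches per complex SSR is not stated (this sketch allows chains).

```python
def group_compound(ssrs, max_gap=100):
    """Group perfect SSR stretches separated by fewer than `max_gap` nt.

    ssrs: list of (start, end) half-open coordinates on one sequence.
    Returns a list of groups; a group with more than one stretch is complex.
    """
    groups = []
    for start, end in sorted(ssrs):
        # Gap = length of the intervening block since the previous stretch ended.
        if groups and start - groups[-1][-1][1] < max_gap:
            groups[-1].append((start, end))
        else:
            groups.append([(start, end)])
    return groups


stretches = [(0, 20), (50, 70), (300, 320), (500, 520), (610, 630)]  # hypothetical
groups = group_compound(stretches)
compound = [g for g in groups if len(g) > 1]
print(len(compound), "of", len(groups), "SSRs are complex")  # 2 of 3 SSRs are complex
```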
Project description: BACKGROUND: Microsatellites or simple sequence repeats (SSRs) in expressed sequence tags (ESTs) are useful resources for genome analysis because of their abundance, functionality and polymorphism. The advent of commercial second-generation sequencing machines has led to new strategies for developing EST-SSR markers, necessitating a bioinformatic framework that can keep pace with the increasing quality and quantity of sequence data produced. We describe an open scheme for analyzing ESTs and developing EST-SSR markers from reads collected by Sanger sequencing and pyrosequencing of sugi (Cryptomeria japonica). RESULTS: We collected 141,097 sequence reads by Sanger sequencing and 1,333,444 by pyrosequencing. After trimming contaminant and low-quality sequences, 118,319 Sanger and 1,201,150 pyrosequencing reads were passed to the MIRA assembler, generating 81,284 contigs that were analyzed for SSRs. In total, 4,059 SSRs were found in 3,694 (4.54%) contigs, an SSR frequency lower than that in seven other plant species with gene indices (5.4-21.9%). The average GC content of the SSR-containing contigs was 41.55%, compared with 40.23% for all contigs. Tri-SSRs were the most common SSRs; the most common motif was AT, found in 655 (46.3%) di-SSRs, followed by AAG, found in 342 (25.9%) tri-SSRs. Most (72.8%) tri-SSRs were in coding regions, whereas 55.6% of the di-SSRs were in non-coding regions; the AT motif was most abundant in 3' untranslated regions. Gene ontology (GO) annotations showed that six GO terms were significantly overrepresented among SSR-containing contigs. Forty-four EST-SSR markers were developed from 192 primer pairs using two pipelines: read2Marker and the newly developed CMiB, which combines several open tools.
Markers from both pipelines showed no difference in PCR success rate or polymorphism, but PCR success and polymorphism were significantly affected by the expected PCR product size and the number of SSR repeats, respectively. EST-SSR markers exhibited less polymorphism than genomic SSRs. CONCLUSIONS: We have created a new open pipeline for developing EST-SSR markers and applied it in a comprehensive analysis of EST-SSRs and EST-SSR markers in C. japonica. The results will be useful in genomic analyses of conifers and other non-model species.
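The contig-level summaries in the RESULTS (share of contigs containing SSRs, mean GC content) are simple ratios. A minimal sketch; whether the reported GC content is a per-contig mean or a value pooled over all bases is not stated, so the unweighted mean here is an assumption, as is ignoring ambiguous bases. Function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 0.0 if acgt == 0 else (seq.count("G") + seq.count("C")) / acgt


def mean_gc(contigs):
    """Unweighted mean of per-contig GC fractions."""
    return sum(gc_content(c) for c in contigs) / len(contigs)


def percent_with_ssr(n_with_ssr, n_contigs):
    """Percentage of contigs that contain at least one SSR."""
    return 100 * n_with_ssr / n_contigs


print(round(percent_with_ssr(3694, 81284), 2))  # 4.54
```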