Large-scale isolation of microsatellites from Chinese Mitten Crab Eriocheir sinensis via a Solexa Genomic Survey.
ABSTRACT: Microsatellites are simple sequence repeats with a high degree of polymorphism in the genome; they are used as DNA markers in many molecular genetic studies. Using traditional methods such as the magnetic beads enrichment method, only a few microsatellite markers have been isolated from the Chinese mitten crab Eriocheir sinensis, as the crab genome sequence information is unavailable. Here, we have identified a large number of microsatellites from the Chinese mitten crab by taking advantage of Solexa genomic surveying. A total of 141,737 simple sequence repeat (SSR) motifs were identified via analysis of 883 Mb of the crab genomic DNA information, including mono-, di-, tri-, tetra-, penta- and hexa-nucleotide repeat motifs. The number of di-nucleotide repeat motifs was 82,979, making this the most abundant type of repeat motif (58.54%); the second most abundant were the tri-nucleotide repeats (42,657, 30.11%). Among di-nucleotide repeats, the most frequent repeats were AC motifs, accounting for 67.55% of the total number. AGG motifs were the most frequent (59.32%) of the tri-nucleotide motifs. A total of 15,125 microsatellite loci had a flanking sequence suitable for designing polymerase chain reaction (PCR) primers. To verify the identified SSRs, a subset of 100 primer pairs was randomly selected for PCR. Eighty-two primer sets (82%) produced strong PCR products matching expected sizes, and 78% were polymorphic. In an analysis of 30 wild individuals from the Yangtze River with 20 primer sets, the number of alleles per locus ranged from 2 to 14 and the mean allelic richness was 7.4. No linkage disequilibrium was found between any pair of loci, indicating that the markers were independent. The Hardy-Weinberg equilibrium test showed significant deviation in four of the 20 microsatellite loci after sequential Bonferroni corrections. This method is cost- and time-effective in comparison to traditional approaches for the isolation of microsatellites.
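The motif survey described in the abstract above (scanning sequence data for mono- to hexa-nucleotide repeats) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function name find_ssrs and the minimum repeat counts in MIN_REPEATS are hypothetical values chosen only for illustration, and a motif is reported as it first appears in the read (so AC and CA are not merged into one class).

```python
import re

# Assumed minimum number of tandem copies per motif length.
# These thresholds are illustrative only; the abstract does not state them.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect tandem repeat in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ACAC"),
            # so a di-nucleotide run is not also counted as a tetra-nucleotide run.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


# Toy read containing one (AC)8 and one (AGG)5 repeat.
read = "GGT" + "AC" * 8 + "TTC" + "AGG" * 5 + "C"
print(find_ssrs(read))
```

Flanking sequence for primer design would then be taken from the bases on either side of each reported start position; that step is not shown here.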
Project description:BACKGROUND: Microsatellites, a special class of repetitive DNA sequence, have become one of the most popular genetic markers for population/conservation genetic studies. However, their application to endangered species has been impeded by high development costs, a lack of available sequences, and technical difficulties. The water deer Hydropotes inermis is the sole existing endangered species of the subfamily Capreolinae. Although population genetics studies are urgently required for conservation management, no species-specific microsatellite marker has been reported. METHODS: We adopted next-generation sequencing (NGS) to identify microsatellite markers in Korean water deer and overcome these impediments to marker development. We performed genotyping to determine the efficiency of this method as applied to population genetics. RESULTS: We obtained 98 Mbp of nucleotide information from 260,467 sequence reads. A total of 20,101 di-/tri-nucleotide repeat motifs were identified; di-repeats were 5.9-fold more common than tri-repeats. [CA](n) and [AAC](n)/[AAT](n) repeats were the most frequent di- and tri-repeats, respectively. Of the 17,206 di-repeats, 12,471 microsatellite primer pairs were derived. PCR amplification of 400 primer pairs yielded 106 amplicons and 79 polymorphic markers from 20 individual Korean water deer. The 79 new microsatellites had 2 to 11 alleles per locus (H(e): 0.050-0.880; H(o): 0.000-1.000), while known microsatellite markers transferred from cattle to Chinese water deer had 4 to 6 alleles per locus (H(e): 0.279-0.714; H(o): 0.300-0.400). CONCLUSIONS: Polymorphic microsatellite markers from Korean water deer were successfully identified using NGS without any prior sequence information and deposited into the public database. Thus, the methods described herein represent a rapid and low-cost way to investigate the population genetics of endangered/non-model species.
Project description:Canine taeniids are among the major tapeworms with remarkable medical and economic significance. Reliable diagnosis and differentiation of dog taeniids using simple and sensitive tools are of paramount importance for establishing an efficient surveillance system. Microsatellites, as abundant unique tandem repeats of short DNA motifs, are useful genetic markers for molecular epidemiological studies. The purpose of the present study was to find a primer pair for rapid differentiation of major tapeworms of dogs, Taenia hydatigena, T. multiceps, T. ovis and Echinococcus granulosus, by screening existing nucleotide data. All the mitochondrial genome records as well as non-coding ITS1 sequences of Taeniidae species were downloaded from the NCBI Nucleotide database. For prediction and analysis of potential STR/SSR loci in ITS1 as well as mitochondrial regions, we used ChloroMitoSSRDB 2.0 and GMATo v1.2 software. Different tapeworm species were categorized according to the motif sequence, type and size of each microsatellite locus. Three primer sets were designed and tested for differentiating taeniid species and evaluated in a conventional PCR system. Four taeniid species were successfully differentiated using a primer pair in a simple conventional PCR system. We predicted 2-19 and 1-4 microsatellite loci in ITS1 and the mitochondrial genome, respectively. In ITS1, 41 Di and 21 Tri motifs were found in the taeniids, while the majority of the motifs in the mitochondrial genome were Tetra (89) and Tri (70). The number and diversity of microsatellite loci are higher in the nuclear ITS1 region than in the mostly coding mitochondrial genome.
Project description:BACKGROUND: Gene-based (genic) microsatellites are a useful tool for plant genetics and simple sequence repeat loci can often be found in coding regions of the genome. While EST sequencing can be used to discover genic microsatellites, direct screening of cDNA libraries for repeat motifs can save on overall sequencing costs. The objective of this research was to screen a large cDNA library from an Andean common bean genotype for six di-nucleotide and tri-nucleotide repeat motifs through a filter hybridization approach and to develop microsatellite markers from positive clones. RESULTS: Robotics were used for high-throughput colony picking and to create a high-density filter of 18,432 double-spotted cDNA clones, which was followed by hybridization with repeat-motif-containing probes based on GA, CA, AAT, CAG, CAA and ACG repeats. A total of 1203 positive clones were identified by their addresses and sequenced from the 5' ends and, if required, from the 3' ends to confirm repeat motif and length. Out of 886 high-quality sequences, 497 had complete microsatellite loci that were not truncated by the sequencing reaction, and of these, tri-nucleotide repeats were more common than di-nucleotide repeats. Different motifs were found at different frequencies in the 5' and 3' ends of the cDNAs. In a microsatellite development program, primers were designed for 248 SSR loci, which were tested on a panel of 18 common bean genotypes to determine their potential as genetic markers; average polymorphism information content was higher for di-nucleotide repeat markers (0.3544) than for tri-nucleotide repeat markers (0.1536). CONCLUSION: The present study provides a set of validated gene-based markers for common bean that are derived from G19833, an Andean landrace that is an important source of disease and abiotic stress tolerance and which has been used for physical map development and as a mapping parent.
Gene-based markers appear to be very efficient at separating divergent wild and cultivated accessions as well as Andean and Mesoamerican gene pools and therefore will be useful for diversity analyses and for comparative and transcript mapping in common bean.
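The polymorphism information content (PIC) values quoted in the abstract above, and the expected heterozygosity values reported in several of these abstracts, are computed from allele frequencies at a locus. The following Python sketch shows the standard formulas, He = 1 - sum(p_i^2) and PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. The function names and the toy genotypes are hypothetical; none of the studies listed here describes its own software in these terms.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as (allele, allele) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    ps = list(freqs.values())
    homozygosity = sum(p * p for p in ps)
    cross_term = sum(2 * p * p * q * q for p, q in combinations(ps, 2))
    return 1 - homozygosity - cross_term


# Toy locus: allele sizes in bp for four individuals (made-up data).
genotypes = [(100, 100), (100, 104), (104, 104), (100, 104)]
freqs = allele_frequencies(genotypes)
print(expected_heterozygosity(freqs), observed_heterozygosity(genotypes), pic(freqs))
```

For a locus with two equally frequent alleles, as in the toy data, He is 0.5 and PIC is 0.375; PIC is always at most He because of the extra cross term.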
Project description:BACKGROUND: Earlier comparative maps between the genomes of rice (Oryza sativa L.), barley (Hordeum vulgare L.) and wheat (Triticum aestivum L.) were linkage maps based on cDNA-RFLP markers. The low number of polymorphic RFLP markers has limited the development of dense genetic maps in wheat and the number of available anchor points in comparative maps. Higher density comparative maps using PCR-based anchor markers are necessary to better estimate the conservation of colinearity among cereal genomes. The purposes of this study were to characterize the proportion of transcribed DNA sequences containing simple sequence repeats (SSR or microsatellites) by length and motif for wheat, barley and rice and to determine in-silico rice genome locations for primer sets developed for wheat and barley Expressed Sequence Tags. RESULTS: The proportions of SSR types (di-, tri-, tetra-, and penta-nucleotide repeats) and motifs varied with the length of the SSRs within and among the three species, with trinucleotide SSRs being the most frequent. Distributions of genomic microsatellites (gSSRs), EST-derived microsatellites (EST-SSRs), and transcribed regions in the contiguous sequence of rice chromosome 1 were highly correlated. More than 13,000 primer pairs were developed for use by the cereal research community as potential markers in wheat, barley and rice. CONCLUSION: Trinucleotide SSRs were the most common type in each of the species; however, the relative proportions of SSR types and motifs differed among rice, wheat, and barley. Genomic microsatellites were found to be primarily located in gene-rich regions of the rice genome. Microsatellite markers derived from the use of non-redundant EST-SSRs are an economic and efficient alternative to RFLP for comparative mapping in cereals.
Project description:Pseudotaxus chienii (Taxaceae) is an endangered conifer species endemic to China. However, a lack of suitable molecular markers hinders genomic and genetic studies of this species. Here, we characterized and developed microsatellite markers from a newly sequenced P. chienii transcriptome. A total of 21,835 microsatellite loci were detected from 161,131 non-redundant unigene sequences, and the frequency of SSRs was 13.55%, with an average of one SSR locus per 9.18 kb. Mono-nucleotide, di-nucleotide, and tri-nucleotide were the dominant repeat types, accounting for 50.06, 13.49, and 29.39% of the total SSRs, respectively. In terms of location, the coding regions (CDS) contained few microsatellites, and these mainly consisted of tri-nucleotides. There were significant differences in microsatellite length among genic regions and motif types. Functional annotation showed that the unigenes containing microsatellites had a wide range of biological functions, most of which were related to basic metabolism, and a few might be involved in the regulation of gene expression and the response to environmental stress. In addition, 375 primer pairs were randomly selected and synthesized for the amplification and validation of microsatellite markers. Seventy-seven primer pairs amplified successfully and 40 primer pairs were found to be polymorphic. Finally, 20 pairs of primers with high polymorphism were selected to assess the genetic diversity in four P. chienii populations. In addition, the newly developed microsatellite markers exhibited high transferability (70%) in Amentotaxus argotaenia. Our study could enable further genetic diversity analysis and functional gene mining in Taxaceae.
Project description:In this study, we undertook a survey to analyze the distribution and frequency of microsatellites or Simple Sequence Repeats (SSRs) in Spodoptera littoralis multiple nucleopolyhedrovirus (SpliMNPV) genome (isolate AN-1956). Out of the 55 microsatellite motifs, identified in the SpliMNPV-AN1956 genome using in silico analysis (inclusive of mono-, di-, tri- and hexa-nucleotide repeats), 39 were found to be distributed within coding regions (cSSRs), whereas 16 were observed to lie within intergenic or noncoding regions. Among the 39 motifs located in coding regions, 21 were located in annotated functional genes whilst 18 were identified in unknown functional genes (hypothetical proteins). Among the identified motifs, trinucleotide (80%) repeats were found to be the most abundant followed by dinucleotide (13%), mononucleotide (5%) and hexanucleotide (2%) repeats. The 39 motifs located within coding regions were further validated in vitro by using PCR analysis, while the 21 motifs located within known functional genes (15 genes) were characterized using nucleotide sequencing. A comparison of the sequence analysis data of the 21 sequenced cSSRs with the published sequences is presented. Finally, the developed SSR markers of the 39 motifs were further mapped/localized onto the SpliMNPV-AN1956 genome. In conclusion, the SSR markers specific to SpliMNPV, developed in this study, could be a useful tool for the identification of isolates and analysis of genetic diversity and viral evolutionary status.
Project description:PREMISE OF THE STUDY:Polymorphic microRNA (miRNA)-based microsatellite markers were developed to investigate the genetic diversity and population structure of Nelumbo nucifera (Nelumbonaceae). METHODS AND RESULTS:A total of 485 miRNA-based microsatellites were found from the genomic DNA sequences of N. nucifera. After several rounds of screening, 21 primer pairs flanking di-, tri-, or pentanucleotide repeats were identified that revealed high levels of genetic diversity in four populations with two to five alleles per locus. The observed and expected heterozygosity per locus ranged from 0.000 to 1.000 and from 0.000 to 0.803, respectively. CONCLUSIONS:The polymorphic microsatellite markers will be useful for studying the genetic diversity and population structure of N. nucifera.
Project description:Using transcriptome data to mine microsatellites and develop markers has become increasingly prevalent. However, characterization of the possible functions of microsatellites remains relatively rare. In this study, we explored microsatellites in the transcriptome of the brown alga Sargassum thunbergii, characterized their frequencies, distribution, function and evolution, and developed primers to validate these microsatellites. Our results showed that tri-nucleotide repeats were the most abundant, followed by di- and mono-nucleotide repeats. The length of microsatellites was significantly affected by the repeat motif size. The density of microsatellites in the CDS region was significantly lower than that in the UTR region. The annotation of the transcripts containing microsatellites showed that 573 transcripts have GO terms and can be categorized into 42 groups. Pathway enrichment showed that microsatellites were significantly overrepresented in the genes involved in pathways such as Ubiquitin mediated proteolysis, RNA degradation, Spliceosome, etc. Primers flanking 961 microsatellite loci were designed, and of the 30 primer pairs randomly selected for testing, 23 proved effective. These findings provide new insight into the function and evolution of microsatellites in the transcriptome, and the identified microsatellite loci within annotated genes will be useful for developing functional markers in S. thunbergii.
Project description:BACKGROUND: Microsatellites are ubiquitous in genomes of various organisms. With the realization that they play roles in developmental and physiological processes, rather than exist as 'junk' DNA, microsatellites are receiving increasing attention. Next-generation sequencing allows acquisition of large-scale microsatellite information, and is especially useful for plants without reference genome sequences. RESULTS: In this study, enriched DNA libraries of tree peony, a well-known ornamental woody shrub, were used for high-throughput microsatellite development by 454 GS-FLX Titanium pyrosequencing. We obtained 675,221 reads with an average length of 356 bp. The total size of examined sequences was 240,672,018 bp, from which 237,134 SSRs were identified. Of these sequences, 164,043 contained SSRs, with 27% featuring more than one SSR. Interestingly, a high proportion of SSRs (43%) were present in compound formation. SSRs with repeat motifs of 1-4 bp (mono-, di-, tri-, and tetra-nucleotide repeats) accounted for 99.8% of SSRs. Di-nucleotide repeats were the most abundant. As in most plants, the predominant motif in tree peony was (A/T)n, with (G/C)n less common. The lengths of SSRs were classified into 11 groups. The shortest SSRs (10 bp) represented 1% of the total number, whereas SSRs 21-30 and 101-110 bp long accounted for 26% and 29%, respectively, of all SSRs. Many sequences (42,111) were mapped to CDS (coding domain sequence) regions using Arabidopsis as a reference. GO annotation analysis predicted that CDSs with SSRs performed various functions associated with cellular components, molecular functions, and biological processes. Of 100 validated primer pairs, 24 were selected for polymorphism analysis among 23 genotypes; cluster analysis of the resulting data grouped genotypes according to known relationships, confirming the usefulness of the developed SSR markers. 
CONCLUSIONS: The results of our large-scale SSR marker development using tree peony are valuable for investigating plant genomic structural evolution and elucidating phenotypic variation in this species during its evolution and artificial selection. The newly identified SSRs should be useful for genetic linkage map construction, QTL mapping, gene location and cloning, and molecular marker-assisted breeding. In addition, the genome-wide marker resources generated in this study should aid genomic studies of tree peony and related species.
Project description:The availability of large expressed sequence tag (EST) and whole genome databases of oil palm enabled the development of a database of microsatellite markers. For this purpose, an EST database consisting of 40,979 EST sequences spanning 27 Mb and chromosome-wise whole genome databases were downloaded. A total of 3,950 primer pairs were identified and developed from EST sequences. The tri- and tetra-nucleotide repeat motifs were most prevalent (each 24.75%), followed by di-nucleotide repeat motifs. Whole genome-wide analysis found a total of 245,654 SSR repeats across the 16 chromosomes of oil palm, of which 38,717 were compound microsatellite repeats. A web application, OpSatdb, the first microsatellite database of oil palm, was developed using PHP and a MySQL database ( https://ssr.icar.gov.in/index.php ). It is a simple and systematic web-based search engine for searching SSRs based on repeat motif type, repeat type, and primer details. High synteny was observed between oil palm and rice genomes. The mapping of ESTs having SSRs by Blast2GO resulted in the identification of 19.2% sequences with gene ontology (GO) annotations. A randomly chosen set of ten genic SSRs and five genomic SSRs was used for validation and genetic diversity analysis on 100 genotypes belonging to the world oil palm genetic resources. The grouping pattern was observed to be broadly in accordance with the geographical origin of the genotypes. The identified genic and genome-wide SSRs can be useful for various genomic applications in oil palm, such as genetic diversity analysis, linkage map construction, mapping of QTLs, marker-assisted selection, and comparative population studies.