Project description: The availability of well-assembled genome sequences and reduced sequencing costs have enabled the resequencing of many additional accessions in several crops, thus facilitating the rapid discovery and development of simple sequence repeat (SSR) markers. Although the genome sequence of the inbred spinach line Sp75 is available, previous efforts have yielded only a limited number of useful SSR markers. Identification of additional polymorphic SSR markers will support genetics and breeding research in spinach. This study aimed to use the available genomic resources to mine and catalog a large number of polymorphic SSR markers. A search of the six chromosome sequences of the Sp75 reference genome using GMATA identified a total of 42,155 SSR loci with repeat motifs of two to six nucleotides. Whole-genome sequences (30x) of 21 additional accessions were aligned against the chromosome sequences of the reference genome and genotyped in silico using the HipSTR program by comparing repeat-number variation at the SSR loci among the accessions. The SSR genotype data generated by HipSTR were filtered to remove monomorphic loci and loci with a high proportion of missing data, yielding a final set of 5,986 polymorphic SSR loci. The polymorphic SSR loci were present at a density of 12.9 SSRs/Mb and were physically mapped. Of 36 SSR loci randomly selected for validation, two failed to amplify, while the remaining 34 were all polymorphic in a set of 48 spinach accessions from 34 countries. Genetic diversity analysis of the 48 spinach accessions using the SSR allele score data revealed three main population groups. This strategy of mining and developing polymorphic SSR markers through comparative analysis of the genome sequences of multiple accessions and computational genotyping of candidate SSR loci eliminates the need for laborious experimental screening. Our approach increased the efficiency of discovering a large set of novel polymorphic SSR markers, as demonstrated in this report.
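
The filtering step described above (dropping monomorphic loci and loci with too many missing calls) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the input layout (a dictionary of per-accession repeat counts), the function name `filter_polymorphic`, and the 20% missing-data threshold are all assumptions made for the example, and it does not parse real HipSTR output.

```python
from typing import Dict, List, Optional

# Hypothetical input: locus ID -> one repeat-count call per accession
# (None marks a missing call). This layout is an assumption for illustration.
Genotypes = Dict[str, List[Optional[int]]]


def filter_polymorphic(genotypes: Genotypes, max_missing: float = 0.2) -> List[str]:
    """Return loci that pass the missing-data threshold and carry more than one allele."""
    kept = []
    for locus, calls in genotypes.items():
        if not calls:
            continue  # no accessions genotyped at this locus
        observed = [c for c in calls if c is not None]
        missing_rate = (len(calls) - len(observed)) / len(calls)
        if missing_rate > max_missing:
            continue  # too many accessions without a call
        if len(set(observed)) < 2:
            continue  # monomorphic: every observed call has the same repeat count
        kept.append(locus)
    return kept


calls = {
    "ssr_001": [10, 10, 12, 11, None, 10],   # polymorphic, 1 of 6 calls missing
    "ssr_002": [8, 8, 8, 8, 8, 8],           # monomorphic
    "ssr_003": [15, None, None, None, 17, None],  # mostly missing
}
print(filter_polymorphic(calls))
```

In this toy input only `ssr_001` survives both filters; the threshold would need to be tuned to the actual data set.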

Project description: Background: Maize (Zea mays ssp. mays L.), a major crop that provides staple food for several million people as well as animal feed and bioenergy, is widely cultivated around the world. Simple sequence repeats (SSRs) are widely used as molecular markers in maize genetics and breeding, but only about two thousand SSR primer pairs have been published to date, which falls short of the growing needs of geneticists and breeders. Furthermore, a growing number of studies have revealed that SSRs also play a vital role in functional regulation and evolution. Fortunately, advances in sequencing technology and bioinformatics software provide the basis for the characterization and development of SSRs in maize. Results: In this study, MISA was applied to identify a total of 179,681 SSRs in the maize reference genome B73, with an average distance of 11.46 kbp between SSRs. Their distribution across different genomic regions was non-random, with density decreasing in the order UTR, promoter, intron, intergenic and CDS. Of these, 82,694 (46.02%) SSRs with unique flanking sequences were selected and used to analyze polymorphism in next-generation sequencing data from 345 maize inbred lines together with data from the B73 reference genome. Length information was obtained in ten or more genomes for 58,946 SSRs, accounting for 71.28% of the SSRs with unique flanking sequences, while 55,621 SSRs were polymorphic, with an average polymorphism information content (PIC) value of 0.498. For experimental validation, 250 SSR primer pairs from different genomic regions covering all maize chromosomes were randomly chosen; they had an average PIC value of 0.63 in 11 elite maize inbred lines. Conclusions: Our work provided insight into the non-random distribution patterns and compositions of SSRs in different regions of the maize genome, and also developed more polymorphic SSR markers using next-generation sequencing reads. The genome-wide polymorphic SSR markers could be useful for genetic analysis and marker-assisted selection in breeding practice, and developing molecular markers from next-generation sequencing reads proved to be highly efficient.
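
For readers unfamiliar with the PIC values quoted above, the sketch below computes the polymorphism information content of one locus from observed allele calls, using the commonly cited formula PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2. The abstract does not state which formula or software the authors used, so this is an assumption; the function name `pic` and the toy allele lists are illustrative only.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable


def pic(alleles: Iterable[int]) -> float:
    """Polymorphism information content for one locus from observed allele calls.

    Assumed formula: PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2,
    where p_i is the frequency of allele i.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no allele calls supplied")
    freqs = [n / total for n in counts.values()]
    sum_squares = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1 - sum_squares - cross_term


# Two equally frequent alleles (repeat lengths 10 and 12): PIC = 0.375
print(pic([10, 10, 12, 12]))
# A monomorphic locus carries no information: PIC = 0.0
print(pic([8, 8, 8]))
```

Averaging this value over all loci would give a summary figure comparable in kind to the averages reported in the abstract.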

Project description: Background: Cucumber, Cucumis sativus L., is an important vegetable crop worldwide. Until very recently, cucumber genetic and genomic resources, especially molecular markers, have been very limited, impeding the progress of cucumber breeding efforts. Microsatellites are short tandemly repeated DNA sequences, which are frequently favored as genetic markers due to their high level of polymorphism and codominant inheritance. Data from previously characterized genomes have shown that these repeats vary in frequency, motif sequence, and genomic location across taxa. During the last year, the genomes of two cucumber genotypes were sequenced: the Chinese fresh market type inbred line '9930' and the North American pickling type inbred line 'Gy14'. These sequences provide a powerful tool for developing markers on a large scale. In this study, we surveyed and characterized the distribution and frequency of perfect microsatellites in 203 Mbp of assembled Gy14 DNA sequence, representing 55% of its nuclear genome, and in cucumber EST sequences. Similar analyses were performed on genomic and EST data from seven other plant species, and the results were compared with those of cucumber. Results: A total of 112,073 perfect repeats were detected in the Gy14 cucumber genome sequence, accounting for 0.9% of the assembled Gy14 genome, with an overall density of 551.9 SSRs/Mbp. While tetranucleotides were the most frequent microsatellites in genomic DNA sequence, dinucleotide repeats, which had more repeat units than any other SSR type, had the highest cumulative sequence length. Coding regions (ESTs) of the cucumber genome had fewer microsatellites than its genomic sequence, with trinucleotides predominating in EST sequences. AAG was the most frequent repeat in cucumber ESTs. Overall, AT-rich motifs prevailed in both genomic and EST data. Compared to the other species examined, cucumber genomic sequence had the highest density of SSRs (although comparable to the density of poplar, grapevine and rice), and was richest in AT dinucleotides. Using an electronic PCR strategy, we investigated the polymorphism between 9930 and Gy14 at 1,006 SSR loci, and found an unexpectedly high degree of polymorphism (48.3%) between the two genotypes. The level of polymorphism seems to be positively associated with the number of repeat units in the microsatellite. The in silico PCR results were validated empirically in 660 of the 1,006 SSR loci. In addition, primer sequences for more than 83,000 newly discovered cucumber microsatellites, and their exact positions in the Gy14 genome assembly, were made publicly available. Conclusions: The cucumber genome is rich in microsatellites; AT and AAG are the most abundant repeat motifs in genomic and EST sequences of cucumber, respectively. Considering all the species investigated, some commonalities were noted, especially within the monocot and dicot groups, although the distribution of motifs and the frequency of certain repeats were characteristic of the species examined. The large number of SSR markers developed from this study should be a significant contribution to the cucurbit research community.
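
As a rough illustration of what a survey of perfect microsatellites involves, the sketch below scans a DNA string for uninterrupted tandem repeats of 2 to 6 nt motifs and reports a density per Mbp. It is not the tool used in the study: the minimum repeat counts in `MIN_REPEATS`, the motif-length range, and the function names are assumptions chosen for the example.

```python
import re
from typing import List, Tuple

# Assumed minimum number of repeat units per motif length (not from the abstract).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    return motif not in (motif + motif)[1:-1]


def find_perfect_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One k-mer followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'ATAT' repeats, counted as 'AT'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def density_per_mbp(n_ssrs: int, n_bases: int) -> float:
    """SSR count normalised to one million bases of sequence."""
    return n_ssrs / (n_bases / 1e6)


example = "GGC" + "AT" * 8 + "CCGT" + "AAG" * 5 + "TTC"
print(find_perfect_ssrs(example))
```

On the toy sequence this reports one AT repeat with 8 units and one AAG repeat with 5 units. Real surveys also have to decide how to handle compound or interrupted repeats and reverse-complement motifs, which this sketch ignores.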

Dataset Information

Genome-wide development of simple sequence repeats markers and genetic diversity analysis of chayote
