Project description:New sources of genetic diversity must be incorporated into plant breeding programs if they are to continue increasing grain yield and quality, and tolerance to abiotic and biotic stresses. Germplasm collections provide a source of genetic and phenotypic diversity, but characterization of these resources is required to increase their utility for breeding programs. We used a barley SNP iSelect platform with 7,842 SNPs to genotype 2,417 barley accessions sampled from the USDA National Small Grains Collection of 33,176 accessions. Most of the accessions in this core collection are categorized as landraces or cultivars/breeding lines and were obtained from more than 100 countries. Both STRUCTURE and principal component analysis identified five major subpopulations within the core collection, mainly differentiated by geographical origin and spike row number (an inflorescence architecture trait). Different patterns of linkage disequilibrium (LD) were found across the barley genome and many regions of high LD contained traits involved in domestication and breeding selection. The genotype data were used to define 'mini-core' sets of accessions capturing the majority of the allelic diversity present in the core collection. These 'mini-core' sets can be used for evaluating traits that are difficult or expensive to score. Genome-wide association studies (GWAS) of 'hull cover', 'spike row number', and 'heading date' demonstrate the utility of the core collection for locating genetic factors determining important phenotypes. The GWAS results were referenced to a new barley consensus map containing 5,665 SNPs. Our results demonstrate that GWAS and high-density SNP genotyping are effective tools for plant breeders interested in accessing genetic diversity in large germplasm collections.
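The linkage disequilibrium patterns mentioned above are commonly summarized with the pairwise r² statistic. The following is a minimal sketch of that calculation, assuming each SNP is coded as a 0/1 allele per accession (reasonable for a largely inbred crop such as barley). The abstract does not state which LD statistic was used, and the function name `ld_r2` is hypothetical.

```python
def ld_r2(a, b):
    """Squared correlation (r^2) between two SNPs coded as 0/1 alleles.

    a and b are equal-length sequences, one entry per accession.
    Assumption: alleles behave like phased haplotypes (inbred lines).
    """
    if len(a) != len(b) or not a:
        raise ValueError("allele vectors must be non-empty and of equal length")
    n = len(a)
    pa = sum(a) / n                                # allele frequency at SNP A
    pb = sum(b) / n                                # allele frequency at SNP B
    pab = sum(x * y for x, y in zip(a, b)) / n     # frequency of the 1-1 haplotype
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    d = pab - pa * pb                              # disequilibrium coefficient D
    return d * d / denom


# Toy data (not from the study): two SNPs scored in four accessions.
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # complete LD
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # no LD
```

Values near 1 indicate that two SNPs are almost always inherited together, which is the pattern expected in regions under domestication or breeding selection.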
Project description:Germplasm collections are a crucial resource to conserve natural genetic diversity and provide a source of novel traits essential for sustained crop improvement. Optimal collection, preservation and utilization of these materials depends upon knowledge of the genetic variation present within the collection. Here we use the high-throughput genotyping-by-sequencing (GBS) technology to characterize the United States National Plant Germplasm System (NPGS) collection of cucumber (Cucumis sativus L.). The GBS data, derived from 1234 cucumber accessions, provided more than 23 K high-quality single-nucleotide polymorphisms (SNPs) that are well distributed at high density in the genome (~1 SNP/10.6 kb). The SNP markers were used to characterize genetic diversity, population structure, phylogenetic relationships, linkage disequilibrium, and population differentiation of the NPGS cucumber collection. These results, providing detailed genetic analysis of the U.S. cucumber collection, complement NPGS descriptive information regarding geographic origin and phenotypic characterization. We also identified genome regions significantly associated with 13 horticulturally important traits through genome-wide association studies (GWAS). Finally, we developed a molecularly informed, publicly accessible core collection of 395 accessions that represents at least 96% of the genetic variation present in the NPGS. Collectively, the information obtained from the GBS data enabled deep insight into the diversity present and genetic relationships among accessions within the collection, and will provide a valuable resource for genetic analyses, gene discovery, crop improvement, and germplasm preservation.
Project description:Sweetpotato (Ipomoea batatas) is the sixth most important food crop and plays a critical role in maintaining food security worldwide. Support for sweetpotato improvement research in breeding and genetics programs, and maintenance of sweetpotato germplasm collections is essential for preserving food security for future generations. Germplasm collections seek to preserve phenotypic and genotypic diversity through accession characterization. However, due to its genetic complexity, high heterogeneity, polyploid genome, phenotypic plasticity, and high flower production variability, sweetpotato genetic characterization is challenging. Here, we characterize the genetic diversity and population structure of 604 accessions from the sweetpotato germplasm collection maintained by the United States Department of Agriculture (USDA), Agricultural Research Service (ARS), Plant Genetic Resources Conservation Unit (PGRCU) in Griffin, Georgia, United States. Using the genotyping-by-sequencing platform (GBSpoly) and bioinformatic pipelines (ngsComposer and GBSapp), a total of 102,870 polymorphic SNPs with hexaploid dosage calls were identified from the 604 accessions. Discriminant analysis of principal components (DAPC) and Bayesian clustering identified six unique genetic groupings across seven broad geographic regions. Genetic diversity analyses using the hexaploid data set revealed ample genetic diversity among the analyzed collection in concordance with previous analyses. Following population structure and diversity analyses, breeder germplasm subsets of 24, 48, 96, and 384 accessions were established using K-means clustering with manual selection to maintain phenotypic and genotypic diversity. 
The genetic characterization of the PGRCU sweetpotato germplasm collection and breeder germplasm subsets developed in this study provide the foundation for future association studies and serve as precursors toward phenotyping studies aimed at linking genotype with phenotype.
Project description:Objective: Chamaecrista fasciculata is a widespread annual legume across Eastern North America, with potential as a restoration planting, biofuel crop, and genetic model for non-papilionoid legumes. As a non-papilionoid, C. fasciculata belongs to the caesalpinioid group, in which nodulation likely arose independently of the nodulation in papilionoid and mimosoid legumes. Thus, C. fasciculata is an attractive model system for legume evolution. In this study, we describe population structure and genetic diversity among 32 USDA germplasm accessions of C. fasciculata using 317 AFLP markers developed from 12 primer pairs, to assess which geographic areas hold the most genetic variation. Results: We found that the C. fasciculata germplasm collection falls into four clusters with admixture among them. After correcting for outliers, our analysis shows two primary groups across Eastern and Central North America. To better understand the population biology of this species, further sampling across its full North American range is needed, as well as the development of a larger set of markers providing denser coverage of the genome. Further sampling will help clarify geographical relationships in this widespread temperate species.
Project description:Phenotypic evaluation and efficient utilization of germplasm collections can be time-intensive, laborious, and expensive. However, with the plummeting costs of next-generation sequencing and the addition of genomic selection to the plant breeder's toolbox, we can now tap the genetic diversity within large germplasm collections more efficiently. In this study, we evaluated the potential of genomic prediction for enhancing selection of accessions from the USDA Pea Germplasm Collection, using a set of 482 pea (Pisum sativum L.) accessions genotyped with 30,600 single nucleotide polymorphism (SNP) markers and phenotyped for seed yield and yield-related components. Genomic prediction models and several factors affecting predictive ability were evaluated in a series of cross-validation schemes across complex traits. Different genomic prediction models gave similar results, with predictive ability across traits ranging from 0.23 to 0.60 and no model working best across all traits. Increasing the training population size improved the predictive ability for most traits, including seed yield. Predictive abilities increased and reached a plateau with increasing number of markers, presumably due to extensive linkage disequilibrium in the pea genome. Accounting for population structure effects did not significantly boost predictive ability, but we observed a slight improvement for seed yield. By applying the best genomic prediction model (e.g., RR-BLUP), we then examined the distribution of genotyped but nonphenotyped accessions and the reliability of genomic estimated breeding values (GEBV). The distribution of GEBV suggested that none of the nonphenotyped accessions were expected to perform outside the range of the phenotyped accessions. Desirable breeding values with higher reliability can be used to identify and screen favorable germplasm accessions.
Expanding the training set and incorporating additional orthogonal information (e.g., transcriptomics, metabolomics, physiological traits, etc.) into the genomic prediction framework can enhance prediction accuracy.
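The cross-validation idea described in this abstract can be illustrated with a small sketch. It fits ridge regression on marker genotypes and reports predictive ability as the Pearson correlation between cross-validated predictions and observed phenotypes. Several things here are assumptions rather than the authors' method: the data are synthetic, the ridge penalty `lam` is fixed by hand (RR-BLUP normally estimates the equivalent shrinkage from variance components), and all function names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ridge_fit(X, y, lam):
    """Return (intercept, marker effects) from ridge regression on centered data."""
    n, m = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations with the penalty on the diagonal: (X'X + lam * I) beta = X'y
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(m)] for j in range(m)]
    b = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    beta = solve(A, b)
    intercept = y_mean - sum(bj * mj for bj, mj in zip(beta, col_means))
    return intercept, beta


def cv_predictive_ability(X, y, lam, k=5, seed=0):
    """k-fold cross-validated correlation between predicted and observed values."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    preds = [0.0] * len(y)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in fold:
            preds[i] = b0 + sum(bj * xj for bj, xj in zip(beta, X[i]))
    return pearson(preds, y)


# Synthetic example (NOT the pea data): 120 accessions, 15 markers coded 0/1/2.
rng = random.Random(1)
n_acc, n_markers = 120, 15
X = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n_acc)]
true_effects = [rng.gauss(0, 1) for _ in range(n_markers)]
y = [sum(e * g for e, g in zip(true_effects, row)) + rng.gauss(0, 2) for row in X]

ability = cv_predictive_ability(X, y, lam=5.0)
print(f"cross-validated predictive ability: {ability:.2f}")
```

Once a model of this kind is trained on the phenotyped accessions, the same intercept and marker effects can be applied to genotyped but nonphenotyped accessions to obtain their estimated breeding values.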
Project description:Background: Conservation of genetic diversity is an essential prerequisite for developing new cultivars with desirable agronomic traits. Although a large number of germplasm collections have been established worldwide, many of them face major difficulties due to large size and a lack of adequate information about population structure and genetic diversity. A core collection with a minimum number of accessions and maximum genetic diversity of pepper species and their wild relatives will facilitate easy access to genetic material as well as the use of hidden genetic diversity in Capsicum. Results: To explore genetic diversity and population structure, we investigated patterns of molecular diversity using 48 transcriptome-based single nucleotide polymorphisms (SNPs) in a large germplasm collection comprising 3,821 accessions. Among the 11 species examined, Capsicum annuum showed the highest genetic diversity (HE = 0.44, I = 0.69), whereas the wild species C. galapagoense showed the lowest genetic diversity (HE = 0.06, I = 0.07). The Capsicum germplasm collection was divided into 10 clusters (cluster 1 to 10) based on population structure analysis, and five groups (group A to E) based on phylogenetic analysis. Capsicum accessions from the five distinct groups in an unrooted phylogenetic tree showed taxonomic distinctness and reflected their geographic origins. Most of the accessions from European countries are distributed in the A and B groups, whereas the accessions from Asian countries are mainly distributed in the C and D groups. Five different sampling strategies with diverse genetic clustering methods were used to select the optimal method for constructing the core collection. Using a number of allelic variations based on 48 SNP markers and 32 different phenotypic/morphological traits, a core collection 'CC240' with a total of 240 accessions (5.2 %) was selected from within the entire Capsicum germplasm.
Compared to the other core collections, CC240 displayed higher genetic diversity (I = 0.95) and genetic evenness (J' = 0.80), and represented a wider range of phenotypic variation (MD = 9.45 %, CR = 98.40 %). Conclusions: A total of 240 accessions were selected from 3,821 Capsicum accessions based on 48 transcriptome-based SNP markers with genome-wide distribution and 32 traits using a systematic approach. This core collection will be a primary resource for pepper breeders and researchers for further genetic association and functional analyses.
Project description:The Mediterranean sesame core collection contains agro-morphologically superior sesame accessions from geographically diverse regions in four continents. In the present investigation, the genetic diversity and population structure of this collection was analyzed with 5292 high-quality SNPs discovered by double-digest restriction site associated DNA (ddRAD) sequencing, a cost-effective and flexible next-generation sequencing method. The genetic distance between pairs of accessions varied from 0.023 to 0.524. The gene diversity was higher in accessions from Asia than from America, Africa, and Europe. The highest genetic differentiation was observed between accessions collected from America and Europe. Structure analysis showed the presence of three subpopulations among the sesame accessions, and only six accessions were placed in an admixture group. Phylogenetic tree and principal coordinate analysis clustered the accessions based on their countries of origin. However, no clear division was evident among the sesame accessions with regard to their continental locations. This result was supported by an AMOVA analysis, which revealed a genetic variation among continental groups of 5.53% of the total variation. The large number of SNPs clearly indicated that the Mediterranean sesame core collection is a highly diverse genetic resource. The collection can be exploited by breeders to select appropriate accessions that will provide high genetic gain in sesame improvement programs. The high-quality SNP data generated here should also be used in genome-wide association studies to explore qualitative trait loci and SNPs related to economically and agronomically important traits in sesame.
Project description:Olive is one of the most important fruit and woody oil tree crops, cultivated in many parts of the world. Olive oil is a critical component of the Mediterranean diet due to its importance in heart health. Olives are believed to have been brought to the United States from the Mediterranean countries in the 18th century. Despite the increase in demand and production areas, only a few selected olive varieties are grown in most traditional or new growing regions in the US. By understanding the genetic background, new sources of genetic diversity can be incorporated into olive breeding programs to develop regionally adapted varieties for the US market. This study aimed to explore the genetic diversity and population structure of 90 olive accessions from the USDA repository along with six popular varieties using genotyping-by-sequencing (GBS)-generated SNP markers. After quality filtering, 54,075 SNP markers were retained for the genetic diversity analysis. The average gene diversity (GD) and polymorphic information content (PIC) values of the SNPs were 0.244 and 0.206, respectively, indicating moderate genetic diversity in the US olive germplasm evaluated in this study. The structure analysis showed that the USDA collection was distributed across seven subpopulations; 63% of the accessions were grouped into an identifiable subpopulation. The phylogenetic and principal coordinate analysis (PCoA) showed that the subpopulations did not align with the geographical origins or climatic zones. An analysis of molecular variance revealed that most of the genetic variation was found within populations. These findings provide critical information for future olive breeding programs to select genetically distant parents and facilitate future gene identification using genome-wide association studies (GWAS) or marker-assisted selection (MAS) to develop varieties suited to production in the US.
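The two per-marker statistics reported above, gene diversity (GD) and polymorphic information content (PIC), can be computed from allele frequencies with the standard formulas. A minimal sketch follows; the function names are hypothetical and the example frequencies are illustrative, not taken from the olive data.

```python
def gene_diversity(freqs):
    """Gene diversity (expected heterozygosity): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content:
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2.
    """
    homozygosity = sum(p * p for p in freqs)
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross


# A biallelic SNP with equal allele frequencies gives the maximum values.
print(gene_diversity([0.5, 0.5]))  # 0.5
print(pic([0.5, 0.5]))             # 0.375

# A SNP with a rarer minor allele (illustrative frequencies only).
print(round(gene_diversity([0.85, 0.15]), 3))
print(round(pic([0.85, 0.15]), 3))
```

Because a biallelic SNP cannot exceed GD = 0.5 and PIC = 0.375, the reported averages of 0.244 and 0.206 sit roughly in the middle of the possible range, consistent with the description of moderate diversity.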
Project description:Background: The economic importance of grapevine has driven significant efforts in genomics to accelerate the exploitation of Vitis resources for development of new cultivars. However, although a large number of clonally propagated accessions are maintained in grape germplasm collections worldwide, their use for crop improvement is limited by the scarcity of information on genetic diversity, population structure and proper phenotypic assessment. The identification of a representative and manageable subset of accessions would facilitate access to the diversity available in large collections. A genome-wide germplasm characterization using molecular markers can offer reliable tools for adjusting the quality and representativeness of such core samples. Results: We investigated patterns of molecular diversity at 22 common microsatellite loci and 384 single nucleotide polymorphisms (SNPs) in 2273 accessions of domesticated grapevine V. vinifera ssp. sativa, its wild relative V. vinifera ssp. sylvestris, interspecific hybrid cultivars and rootstocks. Despite the large number of putative duplicates and extensive clonal relationships among the accessions, we observed a high level of genetic variation. In the total germplasm collection the average genetic diversity, as quantified by the expected heterozygosity, was higher for SSR loci (0.81) than for SNPs (0.34). The analysis of the genetic structure in the grape germplasm collection revealed several levels of stratification. The primary division was between accessions of V. vinifera and non-vinifera, followed by the distinction between wild and domesticated grapevine. Intra-specific subgroups were detected within cultivated grapevine representing different eco-geographic groups.
The comparison of a phenological core collection and genetic core collections showed that the latter retained more genetic diversity, while maintaining a similar phenotypic variability. Conclusions: The comprehensive molecular characterization of our grape germplasm collection contributes to the knowledge about levels and distribution of genetic diversity in the existing resources of Vitis and provides insights into genetic subdivision within the European germplasm. Genotypic and phenotypic information compared in this study may efficiently guide further exploration of this diversity for facilitating its practical use.
Project description:Sweetpotato (Ipomoea batatas) plays a critical role in food security and is the most important root crop worldwide following potatoes and cassava. In the United States (US), it is valued at over $700 million USD. There are two sweetpotato germplasm collections (Plant Genetic Resources Conservation Unit and US Vegetable Laboratory) maintained by the USDA, ARS for sweetpotato crop improvement. To date, no genome-wide assessment of genetic diversity within these collections has been reported in the published literature. In our study, population structure and genetic diversity of 417 USDA sweetpotato accessions originating from 8 broad geographical regions (Africa, Australia, Caribbean, Central America, Far East, North America, Pacific Islands, and South America) were determined using single nucleotide polymorphisms (SNPs) identified with a genotyping-by-sequencing (GBS) protocol, GBSpoly, optimized for highly heterozygous and polyploid species. Population structure using Bayesian clustering analyses (STRUCTURE) with 32,784 segregating SNPs grouped the accessions into four genetic groups and indicated a high degree of mixed ancestry. A neighbor-joining cladogram and principal components analysis based on a pairwise genetic distance matrix of the accessions supported the population structure analysis. Pairwise FST values between broad geographical regions based on the origin of accessions ranged from 0.017 (Far East - Pacific Islands) to 0.110 (Australia - South America) and supported the clustering of accessions based on genetic distance. The markers developed for use with this collection of accessions provide an important genomic resource for the sweetpotato community, and contribute to our understanding of the genetic diversity present within the US sweetpotato collection and the species.
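To make the pairwise FST values above more concrete, here is a minimal sketch of one simple estimator, Nei's FST = (HT - HS) / HT, computed from allele frequencies in two groups and pooled across loci. This is an illustration only: the abstract does not say which estimator was used, the sketch assumes biallelic loci, equal group weights, and diploid-style expected heterozygosity (sweetpotato is hexaploid), and the function name `nei_fst` is hypothetical.

```python
def nei_fst(freqs_a, freqs_b):
    """Nei-type FST between two groups from per-locus allele frequencies.

    freqs_a[i] and freqs_b[i] are the frequencies of the same allele at
    locus i in groups A and B. Heterozygosities are summed across loci
    before taking the ratio.
    """
    if len(freqs_a) != len(freqs_b) or not freqs_a:
        raise ValueError("frequency lists must be non-empty and of equal length")
    hs_total = 0.0  # mean within-group expected heterozygosity, summed over loci
    ht_total = 0.0  # expected heterozygosity of the pooled groups, summed over loci
    for pa, pb in zip(freqs_a, freqs_b):
        hs_total += (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
        p_bar = (pa + pb) / 2
        ht_total += 2 * p_bar * (1 - p_bar)
    if ht_total == 0:
        return 0.0  # no variation at any locus
    return (ht_total - hs_total) / ht_total


# Illustrative frequencies only (not from the sweetpotato data).
print(round(nei_fst([0.2], [0.8]), 2))            # strongly differentiated locus
print(round(nei_fst([0.2, 0.5], [0.8, 0.5]), 2))  # diluted by an undifferentiated locus
print(nei_fst([0.5, 0.3], [0.5, 0.3]))            # identical groups
```

On this scale, the reported range of 0.017 to 0.110 corresponds to small allele frequency differences between regions, in line with the high degree of mixed ancestry noted in the abstract.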