Genomic prediction applied to high-biomass sorghum for bioenergy production.
ABSTRACT: The increasing cost of energy and finite oil and gas reserves have created a need to develop alternative fuels from renewable sources. Due to its abiotic stress tolerance and annual cultivation, high-biomass sorghum (Sorghum bicolor L. Moench) shows potential as a bioenergy crop. Genomic selection is a useful tool for accelerating genetic gains and could restructure plant breeding programs by enabling early selection and reducing breeding cycle duration. This work aimed at predicting breeding values via genomic selection models for 200 sorghum genotypes comprising landrace accessions and breeding lines from biomass and saccharine groups. These genotypes were divided into two sub-panels, according to breeding purpose. We evaluated the following phenotypic biomass traits: days to flowering, plant height, fresh and dry matter yield, and fiber, cellulose, hemicellulose, and lignin proportions. Genotyping by sequencing yielded more than 258,000 single-nucleotide polymorphism markers, which revealed population structure between sub-panels. We then fitted and compared genomic selection models BayesA, BayesB, BayesCπ, BayesLasso, Bayes Ridge Regression and random regression best linear unbiased predictor. The resulting predictive abilities varied little between the different models, but substantially between traits. Different scenarios of prediction showed the potential of using genomic selection results between sub-panels and years, although the genotype by environment interaction negatively affected accuracies. Functional enrichment analyses performed with the marker-predicted effects suggested several interesting associations, with potential for revealing biological processes relevant to the studied quantitative traits. This work shows that genomic selection can be successfully applied in biomass sorghum breeding programs.
Project description:We compare genomic selection methods that use correlated traits to help predict biomass yield in sorghum, and find that trait-assisted genomic selection performs best. Genomic selection (GS) is usually performed on a single trait, but correlated traits can also help predict a focal trait through indirect or multi-trait GS. In this study, we use a pre-breeding population of biomass sorghum to compare strategies that use correlated traits to improve prediction of biomass yield, the focal trait. Correlated traits include moisture, plant height measured at monthly intervals between planting and harvest, and the area under the growth progress curve. In addition to single- and multi-trait direct and indirect GS, we test a new strategy called trait-assisted GS, in which correlated traits are used along with marker data in the validation population to predict a focal trait. Single-trait GS for biomass yield had a prediction accuracy of 0.40. Indirect GS performed best using area under the growth progress curve to predict biomass yield, with a prediction accuracy of 0.37, and did not differ from indirect multi-trait GS that also used moisture information. Multi-trait GS and single-trait GS yielded similar results, indicating that correlated traits did not improve prediction of biomass yield in a standard GS scenario. However, trait-assisted GS increased prediction accuracy by up to [Formula: see text] when using plant height in both the training and validation populations to help predict yield in the validation population. Coincidence between selected genotypes in phenotypic and genomic selection was also highest in trait-assisted GS. Overall, these results suggest that trait-assisted GS can be an efficient strategy when correlated traits are obtained earlier or more inexpensively than a focal trait.
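The abstract above reports coincidence between genotypes selected by phenotypic and by genomic selection but does not give the formula. A minimal sketch, assuming coincidence is the share of top-ranked genotypes common to both rankings; the function names, genotype labels, and values are hypothetical and are not data from the study:

```python
def top_fraction(values, fraction):
    """Return the set of genotype IDs in the top `fraction` of `values`."""
    n_selected = max(1, round(len(values) * fraction))
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[:n_selected])


def coincidence(phenotypes, predictions, fraction=0.2):
    """Share of phenotypically selected genotypes also selected on predictions."""
    by_phenotype = top_fraction(phenotypes, fraction)
    by_prediction = top_fraction(predictions, fraction)
    return len(by_phenotype & by_prediction) / len(by_phenotype)


# Hypothetical observed yields and genomic predictions for ten genotypes.
phenotypes = {"G1": 12.1, "G2": 9.4, "G3": 15.0, "G4": 11.2, "G5": 8.7,
              "G6": 13.3, "G7": 10.0, "G8": 14.2, "G9": 7.9, "G10": 12.8}
predictions = {"G1": 11.5, "G2": 10.1, "G3": 14.1, "G4": 12.9, "G5": 9.0,
               "G6": 12.2, "G7": 10.4, "G8": 12.5, "G9": 8.8, "G10": 13.0}

# Selecting the best 30% under each criterion; only G3 is in both top sets.
share = coincidence(phenotypes, predictions, fraction=0.3)
print(f"coincidence at 30% selected: {share:.2f}")
```

A higher share means the genomic ranking would have led to largely the same selections as the phenotypic ranking at that selection intensity.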
Project description:Sorghum (Sorghum bicolor L.) is a major food cereal for millions of people worldwide. The sorghum genome, like other species, accumulates deleterious mutations, likely impacting its fitness. The lack of recombination, drift, and the coupling with favorable loci impede the removal of deleterious mutations from the genome by selection. To study how deleterious variants impact phenotypes, we identified putative deleterious mutations among ~5.5 M segregating variants of 229 diverse biomass sorghum lines. We provide the whole-genome estimate of the deleterious burden in sorghum, showing that ~33% of nonsynonymous substitutions are putatively deleterious. The pattern of mutation burden varies appreciably among racial groups. Across racial groups, the mutation burden correlated negatively with biomass, plant height, specific leaf area (SLA), and tissue starch content (TSC), suggesting that deleterious burden decreases trait fitness. Putatively deleterious variants explain roughly one-half of the genetic variance. However, there is only moderate improvement in total heritable variance explained for biomass (7.6%) and plant height (average of 3.1% across all stages). There is no advantage in total heritable variance for SLA and TSC. The contribution of putatively deleterious variants to phenotypic diversity therefore appears to be dependent on the genetic architecture of traits. Overall, these results suggest that incorporating putatively deleterious variants into genomic models slightly improves prediction accuracy because of extensive linkage. Knowledge of deleterious variants could be leveraged for sorghum breeding through genome editing and/or conventional breeding that focuses on the selection of progeny with fewer deleterious alleles.
Project description:The efficient use of sorghum as a renewable energy source requires high biomass yields and reduced agricultural inputs. Hybridization of <i>Sorghum bicolor</i> with wild <i>Sorghum halepense</i> can help meet both requirements, generating high-yielding and environmentally friendly perennial sorghum cultivars. Selection efficiency, however, needs to be improved to exploit the genetic potential of the derived recombinant lines and remove weedy and other wild traits. In this work, we present the results from a Genome-Wide Association Study conducted on a diversity panel made up of <i>S. bicolor</i> and an advanced population derived from <i>S. bicolor</i> × <i>S. halepense</i> multi-parent crosses. The objective was to identify genetic loci controlling biomass yield and biomass-relevant traits for breeding purposes. Plants were phenotyped during four consecutive years for dry biomass yield, dry mass fraction of fresh material, plant height and plant maturity. A genotyping-by-sequencing approach was implemented to obtain 92,383 high quality SNP markers used in this work. Significant marker-trait associations were uncovered across eight of the ten sorghum chromosomes, with two main hotspots near the end of chromosomes 7 and 9, in proximity of dwarfing genes <i>Dw1</i> and <i>Dw3</i>. No significant marker was found on chromosomes 2 and 4. A large number of significant marker loci associated with biomass yield and biomass-relevant traits showed minor effects on respective plant characteristics, with the exception of seven loci on chromosomes 3, 8, and 9 that explained 5.2-7.8% of phenotypic variability in dry mass yield, dry mass fraction of fresh material, and maturity, and a major effect (<i>R</i> <sup>2</sup> = 16.2%) locus on chromosome 1 for dry mass fraction of fresh material which co-localized with a zinc-finger homeodomain protein possibly involved in the expression of the <i>D</i> (Dry stalk) locus.
These markers and marker haplotypes identified in this work are expected to boost marker-assisted selection in sorghum breeding.
Project description:Switchgrass (Panicum virgatum L.) is a perennial grass undergoing development as a biofuel feedstock. One of the most important factors hindering breeding efforts in this species is the need for accurate measurement of biomass yield on a per-hectare basis. Genomic selection on simple-to-measure traits that approximate biomass yield has the potential to significantly speed up the breeding cycle. Recent advances in switchgrass genomic and phenotypic resources are now making it possible to evaluate the potential of genomic selection of such traits. We leveraged these resources to study the ability of three widely-used genomic selection models to predict phenotypic values of morphological and biomass quality traits in an association panel consisting of predominantly northern adapted upland germplasm. High prediction accuracies were obtained for most of the traits, with standability having the highest ten-fold cross validation prediction accuracy (0.52). Moreover, the morphological traits generally had higher prediction accuracies than the biomass quality traits. Nevertheless, our results suggest that the quality of current genomic and phenotypic resources available for switchgrass is sufficiently high for genomic selection to significantly impact breeding efforts for biomass yield.
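The ten-fold cross-validation prediction accuracy mentioned above is commonly computed as the Pearson correlation between observed values and predictions made for held-out lines. The following is a minimal sketch under stated assumptions: ridge regression on marker genotypes stands in for the genomic selection models (the abstract does not name them), the data are simulated, and every function name and parameter is hypothetical. It uses only the Python standard library and a handful of markers, so it illustrates the procedure rather than a realistic analysis.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Return (intercept, marker effects) from ridge regression on centered data."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations with the ridge penalty added to the diagonal.
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    beta = solve(A, b)
    intercept = y_mean - sum(m * e for m, e in zip(col_means, beta))
    return intercept, beta


def cv_accuracy(X, y, k=10, lam=1.0, seed=0):
    """Correlation between observed values and k-fold held-out predictions."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    predicted = [0.0] * len(y)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in fold:
            predicted[i] = b0 + sum(g * e for g, e in zip(X[i], beta))
    return pearson(predicted, y)


# Simulated example: 100 lines, 8 markers coded 0/1/2, assumed marker effects.
rng = random.Random(1)
true_effects = [1.0, -0.8, 0.6, 0.5, -0.4, 0.3, 0.0, 0.0]
X = [[rng.choice([0, 1, 2]) for _ in true_effects] for _ in range(100)]
y = [sum(g * e for g, e in zip(row, true_effects)) + rng.gauss(0, 1) for row in X]
accuracy = cv_accuracy(X, y, k=10)
print(f"ten-fold cross-validation prediction accuracy: {accuracy:.2f}")
```

Each line is predicted exactly once by a model that never saw its phenotype, which is what makes the correlation an estimate of out-of-sample accuracy.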
Project description:BACKGROUND:Sorghum (Sorghum bicolor) is one of the most important cereal crops globally and a potential energy plant for biofuel production. In order to explore genetic gain for a range of important quantitative traits, such as drought and heat tolerance, grain yield, stem sugar accumulation, and biomass production, via the use of molecular breeding and genomic selection strategies, knowledge of the available genetic variation and the underlying sequence polymorphisms is required. RESULTS:Based on the assembled and annotated genome sequences of Sorghum bicolor (v2.1) and the recently published sorghum re-sequencing data, ~62.9 M SNPs were identified among 48 sorghum accessions and included in a newly developed sorghum genome SNP database SorGSD (http://sorgsd.big.ac.cn). The diverse panel of 48 sorghum lines can be classified into four groups, improved varieties, landraces, wild and weedy sorghums, and a wild relative Sorghum propinquum. SorGSD has a web-based query interface to search or browse SNPs from individual accessions, or to compare SNPs among several lines. The query results can be visualized as text format in tables, or rendered as graphics in a genome browser. Users may find useful annotation from query results including type of SNPs such as synonymous or non-synonymous SNPs, start, stop of splice variants, chromosome locations, and links to the annotation on Phytozome (www.phytozome.net) sorghum genome database. In addition, general information related to sorghum research such as online sorghum resources and literature references can also be found on the website. All the SNP data and annotations can be freely downloaded from the website. CONCLUSIONS:SorGSD is a comprehensive web-portal providing a database of large-scale genome variation across all racial types of cultivated sorghum and wild relatives.
It can serve as a bioinformatics platform for a range of genomics and molecular breeding activities for sorghum and for other C4 grasses.
Project description:Fillet yield (FY) and harvest weight (HW) are economically important traits in Nile tilapia production. Genetic improvement of these traits, especially for FY, is lacking, due to the absence of efficient methods to measure the traits without sacrificing fish and the use of information from relatives for selection. However, genomic information could be used by genomic selection to improve traits that are difficult to measure directly in selection candidates, as in the case of FY. The objectives of this study were: (i) to perform genome-wide association studies (GWAS) to dissect the genetic architecture of FY and HW, (ii) to evaluate the accuracy of genotype imputation and (iii) to assess the accuracy of genomic selection using true and imputed low-density (LD) single nucleotide polymorphism (SNP) panels to determine a cost-effective strategy for practical implementation of genomic information in tilapia breeding programs. The data set consisted of 5,866 phenotyped animals and 1,238 genotyped animals (108 parents and 1,130 offspring) using a 50K SNP panel. The GWAS were performed using all genotyped and phenotyped animals. Genotype imputation was performed from LD panels (LD0.5K, LD1K and LD3K) to the high-density (HD) panel, using information from parents and 20% of offspring in the reference set and the remaining 80% in the validation set. In addition, we tested the accuracy of genomic selection using true and imputed genotypes, comparing the accuracy obtained from pedigree-based best linear unbiased prediction (PBLUP) and genomic predictions. The results from GWAS support the polygenic nature of FY and HW. The accuracy of imputation ranged from 0.90 to 0.98 for LD0.5K and LD3K, respectively. The accuracy of genomic prediction outperformed the estimated breeding value from PBLUP. The use of imputation for genomic selection resulted in an increased relative accuracy independent of the trait and LD panel analyzed.
The present results suggest that genotype imputation could be a cost-effective strategy for genomic selection in Nile tilapia breeding programs.
Project description:Sweet sorghum [Sorghum bicolor (L.) Moench] is a type of cultivated sorghum characterized by the accumulation of high levels of sugar in the stems and high biomass accumulation, making this crop an important feedstock for bioenergy production. Sweet sorghum breeding programs that focus on bioenergy have two main goals: to improve quantity and quality of sugars in the juicy stem and to increase fresh biomass productivity. Genetic diversity studies are very important for the success of a breeding program, especially in the early stages, where understanding the genetic relationship between accessions is essential to identify superior parents for the development of improved breeding lines. The objectives of this study were: to perform phenotypic and molecular characterization of 100 sweet sorghum accessions from the germplasm bank of the Embrapa Maize and Sorghum breeding program; to examine the relationship between the phenotypic and the molecular diversity matrices; and to infer the population structure of the sweet sorghum accessions. Morphological and agro-industrial traits related to sugar and biomass production were used for phenotypic characterization, and single nucleotide polymorphisms (SNPs) were used for molecular diversity analysis. Both phenotypic and molecular characterizations revealed the existence of considerable genetic diversity among the 100 sweet sorghum accessions. The correlation between the phenotypic and the molecular diversity matrices was low (0.35), which is in agreement with the inconsistencies observed between the clusters formed by the phenotypic and the molecular diversity analyses. Furthermore, the clusters obtained by the molecular diversity analysis were more consistent with the genealogy and the historic background of the sweet sorghum accessions than the clusters obtained through the phenotypic diversity analysis.
The low correlation observed between the molecular and the phenotypic diversity matrices highlights the complementarity between the molecular and the phenotypic characterization to assist a breeding program.
Project description:BACKGROUND:Sorghum (Sorghum bicolor) is globally produced as a source of food, feed, fiber and fuel. Grain and sweet sorghums differ in a number of important traits, including stem sugar and juice accumulation, plant height as well as grain and biomass production. The first whole genome sequence of a grain sorghum is available, but additional genome sequences are required to study genome-wide and intraspecific variation for dissecting the genetic basis of these important traits and for tailor-designed breeding of this important C4 crop. RESULTS:We resequenced two sweet and one grain sorghum inbred lines, and identified a set of nearly 1,500 genes differentiating sweet and grain sorghum. These genes fall into ten major metabolic pathways involved in sugar and starch metabolisms, lignin and coumarin biosynthesis, nucleic acid metabolism, stress responses and DNA damage repair. In addition, we uncovered 1,057,018 SNPs, 99,948 indels of 1 to 10 bp in length and 16,487 presence/absence variations as well as 17,111 copy number variations. The majority of the large-effect SNPs, indels and presence/absence variations resided in the genes containing leucine rich repeats, PPR repeats and disease resistance R genes possessing diverse biological functions or under diversifying selection, but were absent in genes that are essential for life. CONCLUSIONS:This is a first report of the identification of genome-wide patterns of genetic variation in sorghum. High-density SNP and indel markers reported here will be a valuable resource for future gene-phenotype studies and the molecular breeding of this important crop and related species.
Project description:Genomic selection increases the rate of genetic gain in breeding programs, which results in significant cumulative improvements in commercially important traits such as disease resistance. Genomic selection currently relies on collecting genome-wide genotype data across a large number of individuals, which requires substantial economic investment. However, global aquaculture production predominantly occurs in small and medium sized enterprises for whom this technology can be prohibitively expensive. For genomic selection to benefit these aquaculture sectors, more cost-efficient genotyping is necessary. In this study the utility of low and medium density SNP panels (ranging from 100 to 9,000 SNPs) to accurately predict breeding values was tested and compared in four aquaculture datasets with different characteristics (species, genome size, genotyping platform, family number and size, total population size, and target trait). The traits show heritabilities between 0.19-0.49, and genomic prediction accuracies using the full density panel of 0.55-0.87. A consistent pattern of genomic prediction accuracy was observed across species with little or no accuracy reduction until SNP density was reduced below 1,000 SNPs (prediction accuracies of 0.44-0.75). Below this SNP density, heritability estimates and genomic prediction accuracies tended to be lower and more variable (93% of maximum accuracy achieved with 1,000 SNPs, 89% with 500 SNPs, and 70% with 100 SNPs). A notable drop in accuracy was observed between 200 SNP panels (0.44-0.75) and 100 SNP panels (0.39-0.66). Now that a multitude of studies have highlighted the benefits of genomic over pedigree-based prediction of breeding values in aquaculture species, the results of the current study highlight that these benefits can be achieved at lower SNP densities and at lower cost, raising the possibility of a broader application of genetic improvement in smaller and more fragmented aquaculture settings.
Project description:<h4>Background</h4>Miscanthus has potential as a biomass crop but the development of varieties that are consistently superior to the natural hybrid M. × giganteus has been challenging, presumably because of strong G × E interactions and poor knowledge of the complex genetic architectures of traits underlying biomass productivity and climatic adaptation. While linkage and association mapping studies are starting to generate long lists of candidate regions and even individual genes, it seems unlikely that this information can be translated into effective marker-assisted selection for the needs of breeding programmes. Genomic selection has emerged as a viable alternative, and prediction accuracies are moderate across a range of phenological and morphometric traits in Miscanthus, though relatively low for biomass yield per se.<h4>Methods</h4>We have previously proposed a combination of index selection and genomic prediction as a way of overcoming the limitations imposed by the inherent complexity of biomass yield. Here we extend this approach and illustrate its potential to achieve multiple breeding targets simultaneously, in the absence of a priori knowledge about their relative economic importance, while also monitoring correlated selection responses for non-target traits. We evaluate two hypothetical scenarios of increasing biomass yield by 20 % within a single round of selection. In the first scenario, this is achieved in combination with delaying flowering by 44 d (roughly 20 %), whereas, in the second, increased yield is targeted jointly with reduced lignin (-5 %) and increased cellulose (+5 %) content, relative to current average levels in the breeding population.<h4>Key results</h4>In both scenarios, the objectives were achieved efficiently (selection intensities corresponding to keeping the best 20 and 4 % of genotypes, respectively). 
However, the outcomes were strikingly different in terms of correlated responses, and the relative economic values (i.e. value per unit of change in each trait compared with that for biomass yield) of secondary traits included in selection indices varied considerably.<h4>Conclusions</h4>Although these calculations rely on multiple assumptions, they highlight the need to evaluate breeding objectives and explicitly consider correlated responses in silico, prior to committing extensive resources. The proposed approach is broadly applicable for this purpose and can readily incorporate high-throughput phenotyping data as part of integrated breeding platforms.
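The selection intensities mentioned in the key results correspond to keeping the best 20 % and 4 % of genotypes. As a minimal sketch, assuming truncation selection on a normally distributed selection criterion (an assumption made here for illustration, not a statement of the authors' calculation), the standardized selection intensity for a given proportion selected can be computed as follows; the function name is hypothetical:

```python
from statistics import NormalDist


def selection_intensity(proportion_selected):
    """Standardized selection differential i = pdf(z) / p under truncation
    selection on a standard normal trait, where z is the truncation point
    that leaves a fraction p of candidates above it."""
    if not 0.0 < proportion_selected < 1.0:
        raise ValueError("proportion_selected must be strictly between 0 and 1")
    std_normal = NormalDist()
    truncation_point = std_normal.inv_cdf(1.0 - proportion_selected)
    return std_normal.pdf(truncation_point) / proportion_selected


for p in (0.20, 0.04):
    print(f"best {p:.0%} kept -> selection intensity {selection_intensity(p):.3f}")
```

Under these assumptions, keeping the best 20 % corresponds to an intensity of about 1.40 and keeping the best 4 % to about 2.15, which shows how much stronger the selection pressure is in the second scenario.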