Project description:Developing high-yielding rice varieties that are tolerant to drought stress is crucial for the sustainable livelihood of rice farmers in rainfed rice cropping ecosystems. Genomic selection (GS) promises to be an effective breeding option for these complex traits. We evaluated the effectiveness of two rather new options in the implementation of GS: trait- and environment-specific marker selection and the use of multi-environment prediction models. A reference population of 280 rainfed lowland accessions with 215k SNP marker data was phenotyped under a favorable and two managed drought environments. Trait-specific SNP subsets (28k) were selected for each trait under each environment, using results of GWAS performed with the complete genotype dataset. Performances of single-environment and multi-environment genomic prediction models were compared using kernel regression-based methods (GBLUP and RKHS) under two cross-validation scenarios: availability (CV2) or not (CV1) of phenotypic data for the validation set in one of the environments. The trait-specific marker selection strategy achieved predictive ability (PA) of genomic prediction up to 22% higher than markers selected on the basis of neutral linkage disequilibrium (LD). Tolerance to drought stress was up to 32% better predicted by multi-environment models (especially RKHS-based models) under the CV2 strategy. Under the less favorable CV1 strategy, the multi-environment models achieved PA similar to that of the single-environment predictions. We also showed that reasonable PA could be obtained with as few as 3,000 SNP markers, even in a population of low LD extent, provided marker selection is based on pairwise LD. The implications of these findings for breeding for drought tolerance are discussed. 
The most resource-sparing option would be accurate phenotyping of the reference population in a favorable environment and under a managed drought, while the candidate population would be phenotyped under only one of those environments.
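As an illustration of the single-environment GBLUP idea mentioned above, the following is a minimal sketch on simulated data. It assumes a VanRaden-style genomic relationship matrix, a known ratio of residual to genetic variance (`lam`), and a single CV1-like split in which validation lines have no phenotypes. The function names, the simulated genotypes, and all parameter values are hypothetical and are not taken from the study.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from an (n x m) matrix of 0/1/2 genotypes."""
    p = M.mean(axis=0) / 2.0            # allele frequency per marker
    Z = M - 2.0 * p                     # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, lam=1.0):
    """Predict values of `test` lines from phenotypes of `train` lines.

    lam is the assumed ratio sigma_e^2 / sigma_g^2 (not estimated here).
    """
    mu = y[train].mean()
    K_tt = G[np.ix_(train, train)]      # train-by-train relationships
    K_vt = G[np.ix_(test, train)]       # validation-by-train relationships
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K_vt @ alpha

# Simulated example (hypothetical sizes, purely additive trait, h2 about 0.5)
rng = np.random.default_rng(0)
n, m = 300, 100
M = rng.binomial(2, 0.3, size=(n, m)).astype(float)
g = M @ rng.normal(0.0, 0.1, size=m)
y = g + rng.normal(0.0, g.std(), size=n)

G = vanraden_grm(M)
idx = rng.permutation(n)
train, test = idx[:240], idx[240:]
pred = gblup_predict(G, y, train, test)
pa = np.corrcoef(pred, y[test])[0, 1]   # predictive ability on the held-out lines
print(f"predictive ability: {pa:.2f}")
```

In practice the variance ratio would be estimated (for example by REML) rather than fixed, and PA would be averaged over repeated splits.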
Project description:Genomic selection uses whole-genome marker models to predict phenotypes or genetic values for complex traits. Some of these models fit interaction terms between markers, and are therefore called epistatic. The biological interpretation of the corresponding fitted effects is not straightforward and there is the threat of overinterpreting their functional meaning. Here we show that the predictive ability of epistatic models relative to additive models can change with the density of the marker panel. In more detail, we show that for publicly available Arabidopsis and rice datasets, an initial superiority of epistatic models over additive models, which can be observed at a lower marker density, vanishes when the number of markers increases. We relate these observations to earlier results reported in the context of association studies which showed that detecting statistical epistatic effects may not only be related to interactions in the underlying genetic architecture, but also to incomplete linkage disequilibrium at low marker density ("Phantom Epistasis"). Finally, we illustrate in a simulation study that due to phantom epistasis, epistatic models may also predict the genetic value of an underlying purely additive genetic architecture better than additive models, when the marker density is low. Our observations can encourage the use of genomic epistatic models with low density panels, and discourage their biological over-interpretation.
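To make the comparison of additive and epistatic models at different marker densities more concrete, here is a minimal sketch of how the two kernels could be built while thinning a marker panel. It assumes that additive-by-additive epistasis is modelled by the Hadamard (element-wise) product of the additive relationship matrix with itself, which is one common choice and not necessarily the exact model used in the study. The simple scaling in `grm`, the thinning rule, and the simulated genotypes are all hypothetical.

```python
import numpy as np

def grm(M):
    """Additive relationship matrix from centred markers (simple scaling by marker count)."""
    Z = M - M.mean(axis=0)
    return Z @ Z.T / M.shape[1]

def epistatic_kernel(G):
    """Additive-by-additive kernel as the Hadamard product G * G (an assumption)."""
    return G * G

def thin_markers(M, step):
    """Keep every `step`-th marker to mimic a lower-density panel."""
    return M[:, ::step]

rng = np.random.default_rng(1)
M = rng.binomial(2, 0.5, size=(50, 400)).astype(float)

kernels = {}
for step in (1, 10, 50):
    M_sub = thin_markers(M, step)
    G = grm(M_sub)
    kernels[step] = (G, epistatic_kernel(G))
    print(f"{M_sub.shape[1]:4d} markers: kernels of shape {G.shape}")
```

Each pair of kernels could then be plugged into the same kernel regression to compare predictive ability across densities.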
Project description:Multi-trait (MT) genomic prediction models enable breeders to save phenotyping resources and increase the prediction accuracy of unobserved target traits by exploiting available information from non-target or auxiliary traits. Our study evaluated different MT models using 250 rice accessions from Asian countries genotyped and phenotyped for grain content of zinc (Zn), iron (Fe), copper (Cu), manganese (Mn), and cadmium (Cd). The predictive performance of MT models compared to a traditional single-trait (ST) model was assessed by 1) applying different cross-validation strategies (CV1, CV2, and CV3) representing varied phenotyping patterns and budgets; 2) accounting for local epistatic effects along with the main additive effect in MT models; and 3) using a selective marker panel composed of trait-associated SNPs in MT models. MT models were not statistically significantly (p < 0.05) superior to the ST model under CV1, where no phenotypic information was available for the accessions in the test set. After including phenotypes from auxiliary traits in both training and test sets (MT-CV2) or simply in the test set (MT-CV3), MT models significantly (p < 0.05) outperformed the ST model for all the traits. The highest increases in the predictive ability of MT models relative to ST models were 11.1% (Mn), 11.5% (Cd), 33.3% (Fe), 95.2% (Cu) and 126% (Zn). Accounting for the local epistatic effects using a haplotype-based model further improved the predictive ability of MT models by 4.6% (Cu), 3.8% (Zn), and 3.5% (Cd) relative to MT models with only additive effects. The predictive ability of the haplotype-based model was not improved after optimizing the marker panel by only considering the markers associated with the traits. 
This study first assessed the local epistatic effects and marker optimization strategies in the MT genomic prediction framework and then illustrated the power of the MT model in predicting trace element traits in rice for the effective use of genetic resources to improve the nutritional quality of rice grain.
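The three cross-validation strategies described above differ only in which phenotypes the model is allowed to see. A minimal sketch of that bookkeeping follows, under the reading that CV1 hides all traits for test accessions, CV2 keeps auxiliary traits for both training and test accessions, and CV3 keeps auxiliary traits for test accessions only. The function name `observed_mask` and the toy dimensions are hypothetical.

```python
import numpy as np

def observed_mask(n_lines, n_traits, test_lines, target, scheme):
    """Boolean (lines x traits) mask; True means the phenotype is available to the model."""
    mask = np.ones((n_lines, n_traits), dtype=bool)
    test = np.zeros(n_lines, dtype=bool)
    test[test_lines] = True
    aux = np.ones(n_traits, dtype=bool)
    aux[target] = False

    mask[test, target] = False              # target trait is never seen on test lines
    if scheme == "CV1":
        mask[test, :] = False               # no phenotypes at all for test lines
    elif scheme == "CV2":
        pass                                # auxiliary traits seen on training and test lines
    elif scheme == "CV3":
        mask[np.ix_(~test, aux)] = False    # auxiliary traits seen on test lines only
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return mask

# Toy example: 6 accessions, 3 traits, trait 0 is the target, accessions 4 and 5 are the test set
for scheme in ("CV1", "CV2", "CV3"):
    print(scheme)
    print(observed_mask(6, 3, [4, 5], target=0, scheme=scheme).astype(int))
```

Masked entries would be set to missing before fitting the MT model, and predictive ability would be computed on the hidden target-trait values of the test accessions.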
Project description:Key message: Genomic prediction models for multi-year dry matter yield, via genotyping-by-sequencing in a composite training set, demonstrate potential for genetic gain improvement through within-half sibling family selection. Perennial ryegrass (Lolium perenne L.) is a key source of nutrition for ruminant livestock in temperate environments worldwide. Higher seasonal and annual yield of herbage dry matter (DMY) is a principal breeding objective but the historical realised rate of genetic gain for DMY is modest. Genomic selection was investigated as a tool to enhance the rate of genetic gain. Genotyping-by-sequencing (GBS) was undertaken in a multi-population (MP) training set of five populations, phenotyped as half-sibling (HS) families in five environments over 2 years for mean herbage accumulation (HA), a measure of DMY potential. GBS using the ApeKI enzyme yielded 1.02 million single-nucleotide polymorphism (SNP) markers from a training set of n = 517. MP-based genomic prediction models for HA were effective in all five populations, cross-validation-predictive ability (PA) ranging from 0.07 to 0.43, by trait and target population, and 0.40-0.52 for days-to-heading. Best linear unbiased predictor (BLUP)-based prediction methods, including GBLUP with either a standard or a recently developed (KGD) relatedness estimation, were marginally superior or equal to ridge regression and random forest computational approaches. PA was principally an outcome of SNP modelling genetic relationships between training and validation sets, which may limit application for long-term genomic selection, due to PA decay. However, simulation using data from the training experiment indicated a twofold increase in genetic gain for HA, when applying a prediction model with moderate PA in a single selection cycle, by combining among-HS family selection, based on phenotype, with within-HS family selection using genomic prediction.
Project description:Forage nutritive value impacts animal nutrition, which underpins livestock productivity, reproduction and health. Genetic improvement for nutritive traits in perennial ryegrass has been limited, as they are typically expensive and time-consuming to measure through conventional methods. Genomic selection is appropriate for such complex and expensive traits, enabling cost-effective prediction of breeding values using genome-wide markers. The aims of the present study were to assess the potential of genomic selection for a range of nutritive traits in a multi-population training set, and to quantify contributions of family, location and family-by-location variance components to trait variation and heritability for nutritive traits. The training set consisted of a total of 517 half-sibling (half-sib) families, from five advanced breeding populations, evaluated in two distinct New Zealand grazing environments. Autumn-harvested samples were analyzed for 18 nutritive traits and maternal parents of the half-sib families were genotyped using genotyping-by-sequencing. Significant (P < 0.05) family variance was detected for all nutritive traits and genomic heritability (h²g) was moderate to high (0.20 to 0.74). Family-by-location interactions were significant and particularly large for water soluble carbohydrate (WSC), crude fat, phosphorus (P) and crude protein. GBLUP, KGD-GBLUP and BayesCπ genomic prediction models displayed similar predictive ability, estimated by 10-fold cross-validation, for all nutritive traits, with values ranging from r = 0.16 to 0.45 using phenotypes from across two locations. High predictive ability was observed for the mineral traits sulfur (0.44), sodium (0.45) and magnesium (0.45) and the lowest values were observed for P (0.16), digestibility (0.22) and high molecular weight WSC (0.23). Predictive ability estimates for most nutritive traits were retained when marker number was reduced from one million to as few as 50,000. 
The moderate to high predictive abilities observed suggest that implementation of genomic selection is feasible for most of the nutritive traits examined.
Project description:This reproducibility study presents an algorithm that weights biomedical embedding training data by the race distribution of clinical research study samples. We extracted 12,864 PubMed abstracts published between January 1st, 2000 and January 1st, 2022 and weighted them according to the race distribution data extracted from their corresponding clinical trials registered on ClinicalTrials.gov. We trained Word2vec and BERT embeddings and evaluated their performance on predicting length of hospital stay (LHS) and intensive care unit (ICU) readmission using MIMIC-IV electronic health record data. We observed that models trained using race-sensitive embeddings do not consistently outperform those trained using neutral embeddings when used for LHS prediction (with similar Mean Absolute Error, 1.975 vs. 2.008) or ICU readmission prediction (with similar accuracy, 74.61% vs. 75.17%, and the same AUC, 0.775). We conclude that demographic-sensitive embeddings do not necessarily significantly improve the accuracy of health predictive models as previously reported in the literature.
Project description:Efficient breeding and selection of superior genotypes requires a comprehensive understanding of the genetics of traits. This study was aimed at establishing the general combining ability (GCA), specific combining ability (SCA), and heritability of sweetpotato weevil (Cylas spp.) resistance, storage root yield, and dry matter content in a sweetpotato multi-parental breeding population. A population of 1,896 F1 clones obtained from an 8 × 8 North Carolina II design cross was evaluated with its parents in the field at two sweetpotato weevil hotspots in Uganda, using an augmented row-column design. Clone roots were further evaluated in three rounds of a no-choice feeding laboratory bioassay. Significant GCA effects for parents and SCA effects for families were observed for most traits and all variance components were highly significant (p ≤ 0.001). Narrow-sense heritability estimates for weevil severity, storage root yield, and dry matter content were 0.35, 0.36, and 0.45, respectively. Parental genotypes with superior GCA for weevil resistance included "Mugande," NASPOT 5, "Dimbuka-bukulula," and "Wagabolige." On the other hand, families that displayed the highest levels of resistance to weevils included "Wagabolige" × NASPOT 10 O, NASPOT 5 × "Dimbuka-bukulula," "Mugande" × "Dimbuka-bukulula," and NASPOT 11 × NASPOT 7. The moderate levels of narrow-sense heritability observed for the traits, coupled with the significant GCA and SCA effects, suggest that there is potential for their improvement through conventional breeding via hybridization and progeny selection and advancement. Although selection for weevil resistance may, to some extent, be challenging for breeders, efforts could be boosted through applying genomics-assisted breeding. Superior parents and families identified through this study could be deployed in further research involving the genetic improvement of these traits.
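For readers unfamiliar with how GCA and SCA variance components relate to heritability, the following is a minimal sketch based on the textbook North Carolina II expectations for non-inbred diploid parents: additive variance is twice the sum of the two parental GCA variances, and dominance variance is four times the SCA variance. These relations are an assumption here, since sweetpotato is a polyploid and the study may have used a different estimator. The function name and the variance values in the example are hypothetical and are not results from the study.

```python
def nc2_heritability(var_gca_f, var_gca_m, var_sca, var_env):
    """Narrow- and broad-sense heritability from NC II variance components.

    Assumes non-inbred diploid parents and no epistasis:
      sigma2_A = 2 * (sigma2_gca_f + sigma2_gca_m)
      sigma2_D = 4 * sigma2_sca
    var_env is the non-genetic variance on the same (single-plant) basis.
    """
    var_a = 2.0 * (var_gca_f + var_gca_m)
    var_d = 4.0 * var_sca
    var_p = var_a + var_d + var_env
    return var_a / var_p, (var_a + var_d) / var_p

# Illustrative values only
h2_narrow, h2_broad = nc2_heritability(1.0, 1.0, 0.5, 4.0)
print(f"narrow-sense h2 = {h2_narrow:.2f}, broad-sense H2 = {h2_broad:.2f}")
```

With these illustrative inputs the additive variance is 4, the dominance variance is 2, and the phenotypic variance is 10, so the two ratios follow directly.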
Project description:Implementation of genomic tools is desirable to increase the efficiency of apple breeding. Recently, the multi-environment apple reference population (apple REFPOP) proved useful for rediscovering loci, estimating genomic predictive ability, and studying genotype by environment interactions (G × E). So far, only two phenological traits were investigated using the apple REFPOP, although the population may be valuable when dissecting genetic architecture and reporting predictive abilities for additional key traits in apple breeding. Here we show contrasting genetic architecture and genomic predictive abilities for 30 quantitative traits across up to six European locations using the apple REFPOP. A total of 59 stable and 277 location-specific associations were found using GWAS, 69.2% of which are novel when compared with 41 reviewed publications. Average genomic predictive abilities of 0.18-0.88 were estimated using main-effect univariate, main-effect multivariate, multi-environment univariate, and multi-environment multivariate models. The G × E accounted for up to 24% of the phenotypic variability. This most comprehensive genomic study in apple in terms of trait-environment combinations provided knowledge of trait biology and prediction models that can be readily applied for marker-assisted or genomic selection, thus facilitating increased breeding efficiency.
Project description:Genomic prediction provides an efficient alternative to conventional phenotypic selection for developing improved cultivars with desirable characteristics. New and improved methods for genomic prediction are continually being developed that attempt to deal with the integration of data types beyond genomic information. Modern automated weather systems offer the opportunity to capture continuous data on a range of environmental parameters at specific field locations. In principle, this information could characterize training and target environments and enhance predictive ability by incorporating weather characteristics as part of the genotype-by-environment (G×E) interaction component in prediction models. We assessed the usefulness of including weather data variables in genomic prediction models using a naïve environmental kinship model across 30 environments comprising the Genomes to Fields (G2F) initiative in 2014 and 2015. Specifically, four different prediction scenarios were evaluated: (i) tested genotypes in observed environments; (ii) untested genotypes in observed environments; (iii) tested genotypes in unobserved environments; and (iv) untested genotypes in unobserved environments. A set of 1,481 unique hybrids was evaluated for grain yield. Evaluations were conducted using five different models including main effect of environments; general combining ability (GCA) effects of the maternal and paternal parents modeled using the genomic relationship matrix; specific combining ability (SCA) effects between maternal and paternal parents; interactions between genetic (GCA and SCA) effects and environmental effects; and finally interactions between the genetic effects and environmental covariates. Incorporation of the genotype-by-environment interaction term improved predictive ability across all scenarios. However, predictive ability was not improved through inclusion of naïve environmental covariates in G×E models. 
More research should be conducted to link the observed weather conditions with important physiological aspects in plant development to improve predictive ability through the inclusion of weather data.
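One way to picture an environmental kinship model is as a set of kernels over genotype-by-environment records. The sketch below builds a genomic kernel from markers, an environmental kernel from standardized weather covariates, and their element-wise product as the interaction kernel, in the spirit of reaction norm models. Whether the study constructed its kernels in exactly this way is an assumption, and the function names, dimensions, and simulated inputs are hypothetical.

```python
import numpy as np

def linear_kernel(X):
    """Similarity kernel from column-standardized features (rows are entities)."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                   # guard against constant columns
    Z = (X - X.mean(axis=0)) / sd
    return Z @ Z.T / X.shape[1]

def expand(K, index):
    """Expand an entity-level kernel to record level using an index per record."""
    return K[np.ix_(index, index)]

rng = np.random.default_rng(2)
n_geno, n_env, n_mark, n_cov = 8, 3, 50, 4
M = rng.binomial(2, 0.5, size=(n_geno, n_mark)).astype(float)   # marker genotypes
W = rng.normal(size=(n_env, n_cov))                             # weather covariates

# One record per genotype-by-environment combination
geno_id = np.repeat(np.arange(n_geno), n_env)
env_id = np.tile(np.arange(n_env), n_geno)

K_G = expand(linear_kernel(M), geno_id)   # genetic main-effect kernel
K_E = expand(linear_kernel(W), env_id)    # environmental covariate kernel
K_GE = K_G * K_E                          # interaction kernel (Hadamard product)
print(K_GE.shape)
```

The three kernels would enter a mixed model as covariance structures of separate random effects, and the prediction scenarios differ in which genotype and environment records are masked.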
Project description:Microplastics, which have been frequently detected worldwide, are strong adsorbents for organic pollutants and may alter their environmental behavior and toxicity. To fully assess the risk of microplastics and their coexisting organics, the adsorption behavior of microplastics is a critical issue that needs to be clarified. Thus, the microplastic/water partition coefficient (log Kd) of organics was investigated here by an in silico method. Five log Kd predictive models were developed for the partition of organics in polyethylene/seawater, polyethylene/freshwater, polyethylene/pure water, polypropylene/seawater, and polystyrene/seawater. The statistical results indicate that the established models have good robustness and predictive ability. Analysis of the descriptors selected by the different models shows that hydrophobic interaction is the main adsorption mechanism, and that π-π interaction also plays a crucial role for microplastics containing benzene rings. Hydrogen bond basicity and cavity formation energy of compounds can determine their partition tendency. Differences in crystallinity and aromaticity give different microplastics disparate adsorption carrying abilities. An environmental medium with high salinity can enhance the adsorption of organics onto microplastics by increasing their induced dipole effect. The models developed in this study can not only be used to estimate log Kd values, but also provide mechanistic information needed for further risk studies of microplastics.