Project description:Accurate methods for predicting yields in complex agricultural systems are critical for effective nutrient management and crop growth. Machine learning has proven to be an important tool in this context, and numerous studies have investigated its potential for predicting yields under different conditions. Among machine learning algorithms, Random Forest (RF) has gained prominence due to its ability to manage large, high-dimensional data sets and to uncover complicated non-linear relationships and interactions between variables. RF is particularly suitable for scenarios with categorical variables and missing data. Given the complex web of management practices and their nonlinear effects on yield prediction, it is important to investigate new machine learning algorithms. In this context, our study focused on the evaluation of gradient boosting methods, particularly Extreme Gradient Boosting (XGB) and Gradient Boosting Regressor (GBR), as potential candidates for yield estimation of the maize hybrid Zhengdan 958. Our aim was not only to evaluate and compare these algorithms with existing approaches, but also to comprehensively analyze the resulting model uncertainties. Our approach includes comparing multiple machine learning algorithms, developing and selecting suitable features, fine-tuning the models by training and adjusting the hyperparameters, and visualizing the results. Using a recent dataset of over 1700 maize yield data pairs, our evaluation included a spectrum of algorithms. Our results show robust prediction accuracy for all algorithms. In particular, the predictions of XGB (RMSE = 0.37, R2 = 0.87, MAE = 0.26) and GBR (RMSE = 0.39, R2 = 0.86, MAE = 0.27) emphasized the central role of weather characteristics and confirmed the high dependence of crop yield prediction on environmental attributes.
Utilizing the capabilities of gradient boosting for yield prediction holds immense potential and is consistent with the promise of this method to serve as a catalyst for further investigation in this evolving field.
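As an illustration of the RMSE, R2, and MAE figures reported above for the maize models, the following minimal sketch computes the three metrics from observed and predicted values. It uses the standard textbook definitions; the function name `regression_metrics` and the example yields are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length lists of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Root mean squared error and mean absolute error.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    # Coefficient of determination: 1 - residual SS / total SS.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical observed and predicted yields, for illustration only.
observed = [9.0, 10.0, 11.0, 12.0]
predicted = [9.5, 9.5, 11.5, 11.5]
rmse, mae, r2 = regression_metrics(observed, predicted)
```

RMSE and MAE are expressed in the units of the target variable, whereas R2 is unitless, which is why the three are usually reported together.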
Project description:Growing demand for sunflower production has focused breeding on the intensified development of highly productive oilseed hybrids to supply the edible oil industry. Sunflower Oil Yield Prediction (SOYP) using machine learning (ML) algorithms can help breeders identify desirable new hybrids with high oil yield, along with their characteristics. In this study, we developed ML models to predict oil yield using two sets of features and evaluated which features are most relevant for accurate SOYP. The ML algorithms compared were Artificial Neural Network (ANN), Support Vector Regression, K-Nearest Neighbour, and Random Forest Regressor (RFR). The dataset consisted of samples for 1250 hybrids, of which 70% were randomly selected to train the model and 30% were used to test it and assess its performance. Based on the MAE, MSE, RMSE and R2 evaluation metrics, RFR consistently outperformed the other algorithms on all datasets, achieving a peak R2 of 0.92 in 2019. In contrast, ANN recorded the lowest MAE, reaching 65 in 2018. The study revealed that, in addition to seed yield, the following hybrid characteristics were important for SOYP: resistance to broomrape (Or), resistance to downy mildew (Pl), and maturity. The locality feature could also be used to estimate sunflower oil yield, but it is highly dependent on weather conditions that affect oil content and seed yield. To our knowledge, this is the first study in which ML was used for sunflower oil yield prediction. The results indicate that ML has great potential for application in oil yield prediction and in the selection of parental lines for hybrid production. The RFR algorithm was found to be the most effective and, along with the locality feature, will be further evaluated as an alternative method for genotypic selection.
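The random 70%/30% train/test split described in the sunflower study can be sketched as follows. This is a minimal illustration, assuming one row per hybrid (1250 rows) and a fixed random seed; the function name `train_test_split_indices` and the seed are hypothetical, and the abstract does not say how the authors implemented the split.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle row indices and split them into train and test index lists."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# Assumption for illustration: 1250 rows, one per hybrid.
train_idx, test_idx = train_test_split_indices(1250)
```

Splitting indices rather than the rows themselves keeps features and targets aligned when both are later selected with the same index lists.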
Project description:Recently, rapid advances in the use of unmanned aerial vehicles (UAVs) for yield prediction (YP) have produced many YP research findings. This study aims to visualize the intellectual background, research progress, knowledge structure, and main research frontiers of the entire YP domain for main cereal crops using VOSviewer and a comprehensive literature review. To develop visualization networks of UAV-related knowledge for YP of wheat, maize, rice, and soybean (WMRS) crops, original research articles published between January 2001 and August 2023 were retrieved from the Web of Science Core Collection (WOSCC) database. Significant contributors to the growth of YP-related research were identified, including the most active countries, prolific publications, productive authors, top contributing institutions, and influential journals, papers, and keywords. Furthermore, the study observed the primary contributions of YP for WMRS crops using UAVs at the micro, meso, and macro levels, as well as the degree of collaboration and the information sources for YP. Moreover, an analysis of grants and collaborating nations revealed that policy support from the People's Republic of China, the United States of America, Germany, and Australia has considerably advanced knowledge of UAVs for YP of WMRS crops. Lastly, the findings on YP for WMRS crops are presented by data type, algorithms, results, and study location. The remote sensing community can benefit from this study by being able to discriminate between the most critical sub-domains of the YP literature for WMRS crops utilizing UAVs and to identify new research frontiers that point to the essential directions for subsequent studies.
Project description:Reliable data on biomass produced by lignocellulosic bioenergy crops are essential to identify sustainable bioenergy sources. Field studies have been performed for decades on bioenergy crops, but only a small proportion of the available data is used to explore future land use scenarios including bioenergy crops. A global dataset of biomass production for key lignocellulosic bioenergy crops is thus needed to disentangle the factors impacting biomass production in different regions. Such a dataset will also be useful to develop and assess bioenergy crop modelling in integrated assessment socio-economic models and global vegetation models. Here, we compiled and described a global biomass yield dataset based on field measurements. We extracted 5,088 entries of data from 257 published studies for five main lignocellulosic bioenergy crops: eucalypt, Miscanthus, poplar, switchgrass, and willow. Data are from 355 geographic sites in 31 countries around the world. We also documented the species, plantation practices, climate conditions, soil properties, and management practices. Our dataset can be used to identify productive bioenergy species over a large range of environments.
Project description:Relationships between species diversity, productivity, temporal stability of productivity, and plant invasion have been well documented in grasslands, and these relationships could translate to improved agricultural sustainability. However, few studies have explored these relationships in agricultural contexts where fertility and weeds are managed. Using 7 years of biomass yield and species composition data from 12 species mixture treatments varying in native species diversity, we found that species richness increased yield and interannual yield stability by reducing weed abundance. Stability was driven by yield as opposed to temporal variability of yield. Nitrogen fertilization increased yield but at the expense of yield stability. We show how relationships between diversity, species asynchrony, invasion, productivity, and stability observed in natural grasslands can extend into managed agricultural systems. Increasing bioenergy crop diversity can improve farmer economics via increased yield, reduced yield variability, and reduced inputs for weed control, thus promoting perennial vegetation on agricultural lands.
Project description:In commercial dairy farms, mastitis is associated with increased antimicrobial use and associated resistance, which may affect milk production. This study aimed to develop sensor-based prediction models for naturally occurring clinical bovine mastitis using nine machine learning algorithms with data from 447 mastitic and 2146 healthy cows obtained from five commercial farms in Northeast China. The variables were related to daily activity, rumination time, and daily milk yield of cows, as well as milk electrical conductivity. Both Z-standardized and non-standardized datasets pertaining to four specific stages of lactation were used to train and test prediction models. For all four subgroups, the Z-standardized dataset yielded better results than those of the non-standardized one, with the multilayer artificial neural net algorithm showing the best performance. Variables of importance had a similar rank in this algorithm, indicating the consistency of these variables as predictors for bovine mastitis in commercial farms with similar automatic systems. Moreover, the peak milk yield (PMY) of mastitic cows was significantly higher than that of healthy cows (p < 0.005), indicating that high-yielding cattle are more prone to mastitis. Our results show that machine learning algorithms are effective tools for predicting mastitis in dairy cows for immediate intervention and management in commercial farms.
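The Z-standardization applied to the sensor variables in the mastitis study can be sketched as below. This is a minimal illustration under stated assumptions: the mean and standard deviation are estimated on training data only and reused for any held-out data, and the population standard deviation is used. The abstract does not state either choice, and the function names and milk yield values are hypothetical.

```python
import statistics


def fit_z_params(values):
    """Estimate the mean and population standard deviation of one variable."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant variable")
    return mean, std


def z_standardize(values, mean, std):
    """Rescale values to zero mean and unit variance using fitted parameters."""
    return [(v - mean) / std for v in values]


# Hypothetical daily milk yields (kg) in a training set, for illustration only.
train_yield = [20.0, 25.0, 30.0, 35.0, 40.0]
mean, std = fit_z_params(train_yield)
z_train = z_standardize(train_yield, mean, std)

# Held-out values reuse the training parameters instead of refitting.
z_test = z_standardize([28.0, 33.0], mean, std)
```

Standardizing puts variables with different units (activity counts, rumination minutes, kilograms of milk, electrical conductivity) on a comparable scale, which generally matters for scale-sensitive learners such as neural networks.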
Project description:This dataset supports the research paper "Cover crop effects on maize drought stress and yield" by Hunter et al. [1]. Data is provided on ecophysiological and yield measurements of maize grown following five functionally diverse cover crop treatments. The experiment was conducted in Pennsylvania, USA from 2013-2015 with organic management. Cover crops were planted in August after winter wheat harvest. Cover crops were terminated in late May of the following year, manure was applied, and both were incorporated with full inversion tillage prior to planting maize. The five cover crop treatments included a tilled fallow control, medium red clover, cereal rye, forage radish, and a 3-species mixture of medium red clover, cereal rye, and Austrian winter pea. Drought was imposed with rain exclusion shelters starting in early July. Results are provided for two subplots per cover crop treatment representing ambient and drought conditions. The dataset includes: 1) soil moisture in spring and during the maize growing season; 2) maize height, leaf chlorophyll content, leaf area index, stomatal conductance, and pre-dawn leaf xylem water potential; 3) maize yield and yield components including kernel biomass, total biomass, harvest index, number of plants per subplot, ears per plant, kernel mass, and kernel number per ear, per plant, and per subplot; 4) modeled season-long radiation interception and radiation use efficiency of biomass production; and 5) maize rooting density by depth in one year only. Data was collected in the field and lab using ecophysiological instruments (e.g., SPAD meter, ceptometer, porometer, and pressure chamber). Biomass samples were taken to determine yield. Data presented have been averaged to the subplot level (ambient and drought). This dataset can inform future research focused on using cover crops and other cultural practices to improve climate adaptation in cropping systems and also may be useful for meta-analyses.
Project description:One of the greatest challenges in sustainable agricultural production is managing ecosystem services, such as pollination, in ways that maximize crop yields. Most efforts to increase services by wild pollinators focus on management of natural habitats surrounding farms or non-crop habitats within farms. However, mass flowering crops create resource pulses that may be important determinants of pollinator dynamics. Mass bloom attracts pollinators and it is unclear how this affects the pollination and yields of other co-blooming crops. We investigated the effects of mass flowering apple on the pollinator community and yield of co-blooming strawberry on farms spanning a gradient in cover of apple orchards in the landscape. The effect of mass flowering apple on strawberry was dependent on the stage of apple bloom. During early and peak apple bloom, pollinator abundance and yield were reduced in landscapes with high cover of apple orchards. Following peak apple bloom, pollinator abundance was greater on farms with high apple cover and corresponded with increased yields on these farms. Spatial and temporal overlap between mass flowering and co-blooming crops alters the strength and direction of these dynamics and suggests that yields can be optimized by designing agricultural systems that avoid competition while maximizing facilitation.
Project description:Background: RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives leverage partition function calculations to find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA). Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results: We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs, such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy, is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms.
Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Conclusions: Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.
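The F-measure and the bootstrap percentile method named in the RNA structure abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the F-measure is the harmonic mean of sensitivity and positive predictive value as the abstract defines it, and the confidence interval is a generic percentile bootstrap of the mean. The function names, the number of resamples, the 95% level, and the per-RNA scores are all assumptions made for the example.

```python
import random


def f_measure(sensitivity, ppv):
    """Harmonic mean of sensitivity and positive predictive value."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def bootstrap_percentile_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-RNA F-measures for a small class, for illustration only.
scores = [0.55, 0.62, 0.70, 0.48, 0.81, 0.66, 0.59, 0.73, 0.68, 0.64]
ci_low, ci_high = bootstrap_percentile_ci(scores)
```

With few molecules in a class, the interval between `ci_low` and `ci_high` is wide, which is the effect behind the caution about small classes such as the 89 Group I introns.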
Project description:BACKGROUND AND OBJECTIVE:Most previous studies adopted single traditional time series models to predict incidences of malaria. A single model cannot effectively capture all the properties of the data structure. However, a stacking architecture can solve this problem by combining distinct algorithms and models. This study compares the performance of traditional time series models and deep learning algorithms in malaria case prediction and explores the application value of stacking methods in the field of infectious disease prediction. METHODS:The ARIMA, STL+ARIMA, BP-ANN and LSTM network models were separately applied in simulations using malaria data and meteorological data in Yunnan Province from 2011 to 2017. We compared the predictive performance of each model using three evaluation measures: RMSE, MASE and MAD. In addition, gradient-boosting regression trees (GBRTs) were used to combine the above four models. We also determined whether the stacking structure improved the model prediction performance. RESULTS:The root mean square errors (RMSEs) of the four sub-models were 13.176, 14.543, 9.571 and 7.208; the mean absolute scaled errors (MASEs) were 0.469, 0.472, 0.296 and 0.266; and the mean absolute deviations (MADs) were 6.403, 7.658, 5.871 and 5.691. After the four models were combined using the stacking architecture, the RMSE, MASE and MAD values of the ensemble model decreased to 6.810, 0.224 and 4.625, respectively. CONCLUSIONS:A novel ensemble model based on the robustness of structured prediction and model combination through stacking was developed. The findings suggest that the predictive performance of the final model is superior to that of the four sub-models, indicating that stacking architecture may have significant implications in infectious disease prediction.
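Of the three measures used in the malaria study, MASE is the least self-explanatory, so a minimal sketch follows. It assumes the common non-seasonal definition, in which the forecast's mean absolute error is divided by the in-sample mean absolute error of a one-step naive forecast. The abstract does not state which MASE variant was used, and the function name and case counts are hypothetical.

```python
def mase(train, actual, forecast):
    """Mean absolute scaled error against a one-step naive forecast."""
    if len(train) < 2:
        raise ValueError("need at least two training observations")
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty, equal length")

    # In-sample error of the naive forecast "next value equals current value".
    naive_errors = [abs(train[i] - train[i - 1]) for i in range(1, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("MASE is undefined for a constant training series")

    # Out-of-sample mean absolute error of the model being evaluated.
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Hypothetical monthly case counts, for illustration only.
train_cases = [10.0, 12.0, 10.0, 12.0]
actual_cases = [11.0, 13.0]
forecast_cases = [10.0, 14.0]
score = mase(train_cases, actual_cases, forecast_cases)
```

Under this definition, a value below 1 means the model's average error is smaller than that of the naive forecast on the training series, which is consistent with all reported MASE values in the abstract being well under 1.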