Forecasting the monthly incidence rate of brucellosis in west of Iran using time series and data mining from 2010 to 2019.
ABSTRACT: BACKGROUND:The identification of statistical models for the accurate forecast and timely determination of the outbreak of infectious diseases is very important for the healthcare system. Thus, this study was conducted to assess and compare the performance of four machine-learning methods in modeling and forecasting brucellosis time series data based on climatic parameters. METHODS:In this cohort study, human brucellosis cases and climatic parameters were analyzed on a monthly basis for the Qazvin province, located in northwestern Iran, over a period of 9 years (2010-2018). The data were divided into two subsets for training (80%) and testing (20%). Artificial neural network methods (radial basis function and multilayer perceptron), support vector machine and random forest were fitted to each set. Performance analysis of the models was done using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Root Error (MARE), and R2 criteria. RESULTS:The incidence rate of brucellosis in Qazvin province was 27.43 per 100,000 during 2010-2019. Based on our results, the values of the RMSE (0.22), MAE (0.175), and MARE (0.007) criteria were smaller for the multilayer perceptron neural network than for the other three models. Moreover, the R2 value (0.99) was higher in this model. Therefore, the multilayer perceptron neural network exhibited better performance in forecasting the studied data. The average wind speed and mean temperature were the most influential climatic parameters for the incidence of this disease. CONCLUSIONS:The multilayer perceptron neural network can be used as an effective method in detecting the behavioral trend of brucellosis over time. Nevertheless, further studies focusing on the application and comparison of these methods are needed to detect the most appropriate forecast method for this disease.
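For readers unfamiliar with the evaluation criteria named above, the following is a minimal sketch of how RMSE, MAE, and R2 are commonly computed. The function names and the sample values are illustrative assumptions, not taken from the study, and MARE is omitted because the abstract does not give its formula.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical monthly incidence values (assumption, not study data).
observed = [2.0, 3.0, 5.0, 4.0]
forecast = [2.5, 3.0, 4.0, 4.5]
print(rmse(observed, forecast), mae(observed, forecast), r_squared(observed, forecast))
```

Lower RMSE and MAE and a higher R2 indicate a closer fit, which is the sense in which the abstract ranks the four models.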
Project description:BACKGROUND:Establishing epidemiological models and conducting predictions seems to be useful for the prevention and control of human brucellosis. Autoregressive integrated moving average (ARIMA) models can capture the long-term trends and the periodic variations in time series. However, these models cannot handle the nonlinear trends correctly. Recurrent neural networks can address problems that involve nonlinear time series data. In this study, we intended to build prediction models for human brucellosis in mainland China with Elman and Jordan neural networks. The fitting and forecasting accuracy of the neural networks were compared with a traditional seasonal ARIMA model. METHODS:The reported human brucellosis cases were obtained from the website of the National Health and Family Planning Commission of China. The human brucellosis cases from January 2004 to December 2017 were assembled as monthly counts. The training set observed from January 2004 to December 2016 was used to build the seasonal ARIMA model, Elman and Jordan neural networks. The test set from January 2017 to December 2017 was used to test the forecast results. The root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) were used to assess the fitting and forecasting accuracy of the three models. RESULTS:There were 52,868 cases of human brucellosis in Mainland China from January 2004 to December 2017. We observed a long-term upward trend and seasonal variance in the original time series. In the training set, the RMSE and MAE of Elman and Jordan neural networks were lower than those in the ARIMA model, whereas the MAPE of Elman and Jordan neural networks was slightly higher than that in the ARIMA model. In the test set, the RMSE, MAE and MAPE of Elman and Jordan neural networks were far lower than those in the ARIMA model. CONCLUSIONS:The Elman and Jordan recurrent neural networks achieved much higher forecasting accuracy. 
These models are more suitable than the traditional ARIMA model for forecasting nonlinear time series data, such as human brucellosis.
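The evaluation design described above (train on all but the final 12 months, test on the held-out year, score with MAPE) can be sketched as follows. The seasonal naive forecaster is only a stand-in so the example runs; it is not one of the models from the study, and the series is synthetic.

```python
def split_last_year(series, horizon=12):
    """Hold out the final `horizon` observations as the test set."""
    return series[:-horizon], series[-horizon:]

def seasonal_naive(train, horizon=12, period=12):
    """Stand-in forecaster: repeat the value observed one season earlier."""
    start = len(train) - period
    return [train[start + (h % period)] for h in range(horizon)]

def mape(actual, predicted):
    """Mean absolute percentage error (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Synthetic monthly counts with trend and seasonality (assumption, not study data).
series = [100 + 10 * (m % 12) + 5 * (m // 12) for m in range(36)]
train, test = split_last_year(series)
forecast = seasonal_naive(train)
print(f"test MAPE: {mape(test, forecast):.2f}%")
```

Splitting chronologically rather than at random keeps future observations out of the training set, which matters for any time series comparison of this kind.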
Project description:Deep learning methods for the prediction of molecular excitation spectra are presented. For the example of the electronic density of states of 132k organic molecules, three different neural network architectures: multilayer perceptron (MLP), convolutional neural network (CNN), and deep tensor neural network (DTNN) are trained and assessed. The inputs for the neural networks are the coordinates and charges of the constituent atoms of each molecule. Already, the MLP is able to learn spectra, but the root mean square error (RMSE) is still as high as 0.3 eV. The learning quality improves significantly for the CNN (RMSE = 0.23 eV) and reaches its best performance for the DTNN (RMSE = 0.19 eV). Both CNN and DTNN capture even small nuances in the spectral shape. In a showcase application of this method, the structures of 10k previously unseen organic molecules are scanned and instant spectra predictions are obtained to identify molecules for potential applications.
Project description:OBJECTIVES:Human brucellosis is a public health problem endangering health and property in China. Predicting the trend and the seasonality of human brucellosis is of great significance for its prevention. In this study, a comparison between the autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model was conducted to determine which was more suitable for predicting the occurrence of brucellosis in mainland China. DESIGN:Time-series study. SETTING:Mainland China. METHODS:Data on human brucellosis in mainland China were provided by the National Health and Family Planning Commission of China. The data were divided into a training set and a test set. The training set was composed of the monthly incidence of human brucellosis in mainland China from January 2008 to June 2018, and the test set was composed of the monthly incidence from July 2018 to June 2019. The mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) were used to evaluate the effects of model fitting and prediction. RESULTS:The number of human brucellosis patients in mainland China increased from 30,002 in 2008 to 40,328 in 2018. There was an increasing trend and obvious seasonal distribution in the original time series. For the training set, the MAE, RMSE and MAPE of the ARIMA(0,1,1)×(0,1,1)12 model were 338.867, 450.223 and 10.323, respectively, and the MAE, RMSE and MAPE of the XGBoost model were 189.332, 262.458 and 4.475, respectively. For the test set, the MAE, RMSE and MAPE of the ARIMA(0,1,1)×(0,1,1)12 model were 529.406, 586.059 and 17.676, respectively, and the MAE, RMSE and MAPE of the XGBoost model were 249.307, 280.645 and 7.643, respectively. CONCLUSIONS:The performance of the XGBoost model was better than that of the ARIMA model. The XGBoost model is more suitable for predicting cases of human brucellosis in mainland China.
Project description:Travel time is an important measurement used to evaluate the extent of congestion within road networks. This paper presents a new method to estimate the travel time based on an evolving fuzzy neural inference system. The input variables in the system are traffic flow data (volume, occupancy, and speed) collected from loop detectors located at points both upstream and downstream of a given link, and the output variable is the link travel time. A first order Takagi-Sugeno fuzzy rule set is used to complete the inference. For training the evolving fuzzy neural network (EFNN), two learning processes are proposed: (1) a K-means method is employed to partition input samples into different clusters, and a Gaussian fuzzy membership function is designed for each cluster to measure the membership degree of samples to the cluster centers. As the number of input samples increases, the cluster centers are modified and membership functions are also updated; (2) a weighted recursive least squares estimator is used to optimize the parameters of the linear functions in the Takagi-Sugeno type fuzzy rules. Testing datasets consisting of actual and simulated data are used to test the proposed method. Three common criteria including mean absolute error (MAE), root mean square error (RMSE), and mean absolute relative error (MARE) are utilized to evaluate the estimation performance. Estimation results demonstrate the accuracy and effectiveness of the EFNN method through comparison with existing methods including: multiple linear regression (MLR), instantaneous model (IM), linear model (LM), neural network (NN), and cumulative plots (CP).
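The inference step described above, Gaussian memberships around cluster centres feeding a first-order Takagi-Sugeno rule set, can be sketched as below. This covers inference only; the K-means updates and the weighted recursive least squares estimator are not shown. The rule dictionary layout and all numeric values are hypothetical.

```python
import math

def gaussian_membership(x, center, sigma):
    """Membership degree of input vector x to a cluster centre."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def ts_predict(x, rules):
    """First-order Takagi-Sugeno output: membership-weighted mean of linear rule outputs."""
    weights = [gaussian_membership(x, r["center"], r["sigma"]) for r in rules]
    outputs = [r["bias"] + sum(c * xi for c, xi in zip(r["coef"], x)) for r in rules]
    total = sum(weights)  # may underflow to 0 if x is far from every centre
    return sum(w * y for w, y in zip(weights, outputs)) / total

# Two hypothetical rules over normalised (volume, occupancy, speed) inputs.
rules = [
    {"center": [0.0, 0.0, 0.0], "sigma": 0.5, "bias": 10.0, "coef": [0.0, 0.0, 0.0]},
    {"center": [1.0, 1.0, 1.0], "sigma": 0.5, "bias": 20.0, "coef": [0.0, 0.0, 0.0]},
]
travel_time = ts_predict([0.5, 0.5, 0.5], rules)
print(travel_time)
```

An input equidistant from both centres receives equal memberships, so the estimate is the plain average of the two rule outputs.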
Project description:OBJECTIVES:Haemorrhagic fever with renal syndrome (HFRS) is a serious threat to public health in China, accounting for almost 90% of cases reported globally. Infectious disease prediction may help in disease prevention despite some uncontrollable influence factors. This study conducted a comparison between a hybrid model and two single models in forecasting the monthly incidence of HFRS in China. DESIGN:Time-series study. SETTING:The People's Republic of China. METHODS:Autoregressive integrated moving average (ARIMA) model, generalised regression neural network (GRNN) model and hybrid ARIMA-GRNN model were constructed by R V.3.4.3 software. The monthly reported incidence of HFRS from January 2011 to May 2018 was adopted to evaluate the models' performance. Root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) were adopted to evaluate these models' effectiveness. Spatial stratified heterogeneity of the time series was tested by month and another GRNN model was built with a new series. RESULTS:The monthly incidence of HFRS in the past several years showed a slight downtrend and obvious seasonal variation. A total of four plausible ARIMA models were built and the ARIMA(2,1,1)(2,1,1)12 model was selected as the optimal model in HFRS fitting. The smooth factors of the basic GRNN model and the hybrid model were 0.027 and 0.043, respectively. The single ARIMA model was the best in the fitting part (MAPE=9.1154, MAE=89.0302, RMSE=138.8356) while the hybrid model was the best in prediction (MAPE=17.8335, MAE=152.3013, RMSE=196.4682). The GRNN model was revised by building a model with the new series, and the forecasting performance of the revised model (MAPE=17.6095, MAE=163.8000, RMSE=169.4751) was better than that of the original GRNN model (MAPE=19.2029, MAE=177.0356, RMSE=202.1684). CONCLUSIONS:The hybrid ARIMA-GRNN model was better than the single ARIMA and basic GRNN models in forecasting monthly incidence of HFRS in China. 
It could be considered as a decision-making tool in HFRS prevention and control.
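To illustrate what the smooth factor controls, here is a minimal sketch of a GRNN prediction in its standard kernel-weighted-average form. The abstract does not state how the hybrid model feeds ARIMA output into the GRNN, so that wiring is not shown. The function name, the one-dimensional inputs, and the sample values are assumptions.

```python
import math

def grnn_predict(x, train_x, train_y, sigma):
    """GRNN estimate: Gaussian-kernel weighted average of training targets.

    `sigma` is the smooth factor; smaller values follow the nearest
    training points more closely, larger values average more broadly.
    """
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi in train_x]
    total = sum(weights)  # may underflow to 0 if x is far from all training inputs
    return sum(w * yi for w, yi in zip(weights, train_y)) / total

# Hypothetical normalised inputs and targets (assumption, not study data).
train_x = [0.0, 0.5, 1.0]
train_y = [1.0, 2.0, 4.0]
print(grnn_predict(0.5, train_x, train_y, sigma=0.05))
print(grnn_predict(0.25, train_x, train_y, sigma=0.05))
```

With a small smooth factor, a query at a training input returns nearly that point's target, while a query midway between two inputs returns roughly their average.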
Project description:Hospital crowding is a growing problem, and effective prediction and detection management can help to reduce it. Our team previously proposed a hybrid model combining the autoregressive integrated moving average (ARIMA) and the nonlinear autoregressive neural network (NARNN) models in forecasting studies of schistosomiasis and hand, foot, and mouth disease. In this paper, our aim is to explore the application of the hybrid ARIMA-NARNN model to track the trends of newly admitted inpatients, which provides a methodological basis for reducing crowding. We used the single seasonal ARIMA (SARIMA), NARNN and the hybrid SARIMA-NARNN model to fit and forecast the monthly and daily number of newly admitted inpatients. The root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) were used to compare the forecasting performance among the three models. The monthly modeling data covered January 2010 to June 2016, with July to October 2016 as the corresponding testing data set. The daily modeling data set covered January 4 to September 4, 2016, while the testing data set covered September 5 to October 2, 2016. For the monthly data, the modeling RMSE and the testing RMSE, MAE and MAPE of the SARIMA-NARNN model were lower than those obtained from the single SARIMA or NARNN model, but the modeling MAE and MAPE of the SARIMA-NARNN model did not improve. For the daily data, the RMSE, MAE and MAPE of the NARNN model were the lowest in both the modeling stage and the testing stage. A hybrid model does not necessarily outperform its constituent models. It is worth exploring which model reliably forecasts the number of newly admitted inpatients for different data.
Project description:Alpha-galactosidase production in submerged fermentation by Acinetobacter sp. was optimized using feed forward neural networks and genetic algorithm (FFNN-GA). Six different parameters, pH, temperature, agitation speed, carbon source (raffinose), nitrogen source (tryptone), and K2HPO4, were chosen and used to construct 6-10-1 topology of feed forward neural network to study interactions between fermentation parameters and enzyme yield. The predicted values were further optimized by genetic algorithm (GA). The predictability of neural networks was further analysed by using mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and R2-value for training and testing data. Using hybrid neural networks and genetic algorithm, alpha-galactosidase production was improved from 7.5 U/mL to 10.2 U/mL.
Project description:BACKGROUND:Data collected by an actigraphy device worn on the wrist or waist can provide objective measurements for studies related to physical activity; however, some data may contain intervals where values are missing. In previous studies, statistical methods have been applied to impute missing values on the basis of statistical assumptions. Deep learning algorithms, however, can learn features from the data without any such assumptions and may outperform previous approaches in imputation tasks. OBJECTIVE:The aim of this study was to impute missing values in data using a deep learning approach. METHODS:To develop an imputation model for missing values in accelerometer-based actigraphy data, a denoising convolutional autoencoder was adopted. We trained and tested our deep learning-based imputation model with the National Health and Nutrition Examination Survey data set and validated it with the external Korea National Health and Nutrition Examination Survey and the Korean Chronic Cerebrovascular Disease Oriented Biobank data sets which consist of daily records measuring activity counts. The partial root mean square error and partial mean absolute error of the imputed intervals (partial RMSE and partial MAE, respectively) were calculated using our deep learning-based imputation model (zero-inflated denoising convolutional autoencoder) as well as using other approaches (mean imputation, zero-inflated Poisson regression, and Bayesian regression). RESULTS:The zero-inflated denoising convolutional autoencoder exhibited a partial RMSE of 839.3 counts and partial MAE of 431.1 counts, whereas mean imputation achieved a partial RMSE of 1053.2 counts and partial MAE of 545.4 counts, the zero-inflated Poisson regression model achieved a partial RMSE of 1255.6 counts and partial MAE of 508.6 counts, and Bayesian regression achieved a partial RMSE of 924.5 counts and partial MAE of 605.8 counts. 
CONCLUSIONS:Our deep learning-based imputation model performed better than the other methods when imputing missing values in actigraphy data.
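The "partial" metrics above are errors computed only over the imputed intervals. A minimal sketch under that reading follows, using mean imputation (one of the comparison approaches named in the abstract) as the imputer. The helper names and the sample activity counts are hypothetical.

```python
import math

def mean_impute(values, missing):
    """Fill positions flagged as missing with the mean of the observed values."""
    observed = [v for v, m in zip(values, missing) if not m]
    fill = sum(observed) / len(observed)
    return [fill if m else v for v, m in zip(values, missing)]

def partial_errors(truth, imputed, missing):
    """RMSE and MAE restricted to the positions that were imputed."""
    diffs = [t - i for t, i, m in zip(truth, imputed, missing) if m]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return rmse, mae

# Hypothetical activity counts with an artificially masked interval (assumption).
truth = [0.0, 10.0, 20.0, 30.0, 40.0, 0.0]
missing = [False, False, True, True, False, False]
imputed = mean_impute(truth, missing)
partial_rmse, partial_mae = partial_errors(truth, imputed, missing)
print(partial_rmse, partial_mae)
```

Restricting the error to masked positions avoids diluting the score with observed values that every method reproduces exactly.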
Project description:BACKGROUND:Mapping of patient-reported outcomes to the five-dimension EuroQol (EQ-5D) health index is increasingly being used for understanding the relationship of outcomes to health states and for predicting utilities that have application in economic evaluations. The 12-item Multiple Sclerosis Walking Scale (MSWS-12) is a patient-reported outcome that assesses the impact of walking impairment in people with MS. An equation for mapping the MSWS-12 to the EQ-5D was previously developed and validated using a North American Research Committee on MS (NARCOMS) registry cohort. MATERIALS AND METHODS:This analysis retested the validity of the equation mapping the MSWS-12 to the three-level EQ-5D (EQ-5D-3L) by using an independent cohort of patients with MS enrolled in a randomized controlled trial. Mapping was evaluated at two separate time points (baseline and week 4) during the clinical trial. The mapping equation's performance was subsequently assessed with mean absolute error (MAE) and root-mean-square error (RMSE) by comparing equation-based estimates to values elicited in the trial using the actual EQ-5D-3L questionnaire. RESULTS:The mapping equation predicted EQ-5D-3L values in this external cohort with reasonable precision at both time points (MAE 0.116 and RMSE 0.155 at baseline; MAE 0.105 and RMSE 0.138 at week 4), and its performance was similar to that reported in the original NARCOMS cohort (MAE 0.109 and RMSE 0.145). Also, as observed in the original NARCOMS cohort, the mapping equation performed best in patients with EQ-5D-3L values between 0.50 and 0.75, and poorly in patients with values <0.50. CONCLUSION:The mapping equation performed similarly in this external cohort as in the original derivation cohort, including poorer performance in MS patients with more severe health states.
Project description:The objective of this research was to develop a methodology for optimizing multilayer-perceptron-type neural networks by evaluating the effects of three neural architecture parameters, namely, number of hidden layers (HL), neurons per hidden layer (NHL), and activation function type (AF), on the sum of squares error (SSE). The data for the study were obtained from quality parameters (physicochemical and microbiological) of milk samples. Architectures or combinations were organized in groups (G1, G2, and G3) generated upon interspersing one, two, and three layers. Within each group, the networks had three neurons in the input layer, six neurons in the output layer, three to twenty-seven NHL, and three AF (tan-sig, log-sig, and linear) types. The number of architectures was determined using three factorial-type experimental designs, which reached 63, 2 187, and 50 049 combinations for G1, G2 and G3, respectively. Using MATLAB 2015a, a logical sequence was designed and implemented for constructing, training, and evaluating multilayer-perceptron-type neural networks using parallel computing techniques. The results show that HL and NHL have a statistically relevant effect on SSE, and from two hidden layers, AF also has a significant effect; thus, both AF and NHL can be evaluated to determine the optimal combination per group. Moreover, in the three study groups, it is observed that there is an inverse relationship between the number of processors and the total optimization time.