Weekly dengue forecasts in Iquitos, Peru; San Juan, Puerto Rico; and Singapore.
ABSTRACT: BACKGROUND: Predictive models can serve as early warning systems and can be used to forecast the future risk of various infectious diseases. Conventionally, regression and time series models are used to forecast dengue incidence from dengue surveillance data (e.g., case counts) and weather data. However, these models may be limited by their assumptions and by the number of predictors that can be included. Machine learning (ML) methods are designed to work with a large number of predictors and thus offer an appealing alternative. Here, we compared the performance of ML algorithms with that of regression models in predicting dengue cases and outbreaks from 4 up to 12 weeks in advance. Because many countries lack sufficient health surveillance infrastructure, we also evaluated the contribution of dengue surveillance and weather data to the predictive power of these models. METHODS: We developed ML, regression, and time series models to forecast weekly dengue case counts and outbreaks in Iquitos, Peru; San Juan, Puerto Rico; and Singapore from 1990 to 2016. Forecasts were generated using available weekly dengue surveillance and weather data. We evaluated the agreement between model forecasts and actual dengue observations using the Mean Absolute Error (MAE) and the Matthews Correlation Coefficient (MCC). RESULTS: For near-term predictions of weekly case counts using surveillance data, ML models had 21% and 33% less error than regression and time series models, respectively. However, when only weather data were used, ML models did not demonstrate a practical advantage. When forecasting weekly dengue outbreaks 12 weeks in advance, ML models achieved a maximum MCC of 0.61. CONCLUSIONS: Our results identified 2 scenarios in which ML models are advantageous over regression models: 1) predicting weekly dengue case counts 4 weeks ahead when dengue surveillance data are available and 2) predicting weekly dengue outbreaks 12 weeks ahead when dengue surveillance data are unavailable.
Given the advantages of ML models, dengue early warning systems may be improved by the inclusion of these models.
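The two evaluation metrics named in this abstract can be illustrated with a short Python sketch. The function names are hypothetical, and outbreak weeks are assumed to be already coded as binary labels (1 = outbreak, 0 = no outbreak), since the outbreak definition itself is not given here.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and forecast weekly case counts."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def matthews_corrcoef(observed, predicted):
    """Matthews Correlation Coefficient for binary outbreak labels (1 = outbreak week)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(mean_absolute_error([10, 20, 30], [12, 18, 33]))  # 7/3, about 2.33
print(matthews_corrcoef([1, 0, 1, 0], [1, 0, 0, 0]))    # 1/sqrt(3), about 0.577
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level agreement, which is a useful reference point when reading a maximum MCC of 0.61.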
Project description: BACKGROUND: Dengue is a re-emerging infectious disease of humans that is rapidly spreading from endemic areas to dengue-free regions where conditions are favorable. In recent decades, Guangzhou has again suffered several large dengue outbreaks, as have its neighboring cities. This study aims to examine the impact of dengue epidemics in Guangzhou, China, on neighboring Zhongshan and to develop a predictive model for Zhongshan based on local weather conditions and Guangzhou dengue surveillance information. METHODS: We obtained weekly dengue case data from 1 January 2005 to 31 December 2014 for Guangzhou and Zhongshan from the Chinese National Disease Surveillance Reporting System. Meteorological data were collected from the Zhongshan Weather Bureau, and demographic data from the Zhongshan Statistical Bureau. A negative binomial regression model with a log link function was used to analyze the relationship between weekly dengue cases in Guangzhou and Zhongshan, controlling for meteorological factors. Cross-correlation functions were applied to identify the time lag of the effect of each weather factor on weekly dengue cases. Models were validated using receiver operating characteristic (ROC) curves and k-fold cross-validation. RESULTS: Our results showed that weekly dengue cases in Zhongshan were significantly associated with Guangzhou dengue cases expressed as a moving average over the prior 5 weeks (Relative Risk (RR) = 2.016, 95% Confidence Interval (CI): 1.845-2.203), controlling for weather factors including minimum temperature, relative humidity, and rainfall. ROC curve analysis indicated that the forecasting model performed well at different prediction thresholds, with an area under the ROC curve (AUC) of 0.969 for a threshold of 3 cases per week, 0.957 for a threshold of 2 cases per week, and 0.938 for a threshold of 1 case per week.
Models established during k-fold cross-validation also had high AUCs (average 0.938-0.967). The sensitivity and specificity obtained from k-fold cross-validation were 78.83% and 92.48%, respectively, with a forecasting threshold of 3 cases per week; 91.17% and 91.39% with a threshold of 2 cases; and 85.16% and 87.25% with a threshold of 1 case. The out-of-sample prediction for the 2014 epidemic also showed satisfactory performance. CONCLUSION: Our findings suggest that dengue outbreaks in Guangzhou could influence dengue outbreaks in Zhongshan under suitable weather conditions. Future studies should focus on developing integrated early warning systems for dengue transmission that include local weather and human movement.
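Two ingredients of this analysis, a moving average of the prior 5 weeks of Guangzhou cases and sensitivity/specificity at a weekly case threshold, can be sketched in plain Python. This is illustrative only: the function names and counts are hypothetical, it does not fit the study's negative binomial model, and it assumes a week counts as positive when it has at least the threshold number of cases.

```python
def prior_moving_average(series, window=5):
    """For each week t, the mean of the `window` weeks strictly before t (None until enough history)."""
    averages = []
    for t in range(len(series)):
        if t < window:
            averages.append(None)
        else:
            averages.append(sum(series[t - window:t]) / window)
    return averages


def sensitivity_specificity(observed, predicted, threshold):
    """Treat a week as positive when its count is at or above `threshold` cases (assumed rule)."""
    tp = fn = fp = tn = 0
    for obs, pred in zip(observed, predicted):
        actual = obs >= threshold
        flagged = pred >= threshold
        if actual and flagged:
            tp += 1
        elif actual:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


guangzhou_weekly = [1, 2, 3, 4, 5, 6, 7]  # hypothetical counts
print(prior_moving_average(guangzhou_weekly))  # [None, None, None, None, None, 3.0, 4.0]
print(sensitivity_specificity([0, 1, 3, 5, 2, 4], [1, 0, 4, 2, 3, 5], threshold=3))  # both 2/3
```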
Project description: With its tropical rainforest climate, rapid urbanization, and changing demography and ecology, Singapore experiences endemic dengue; the last large outbreak, in 2013, culminated in 22,170 cases. In the absence of a vaccine on the market, vector control is the key approach for prevention. We sought to forecast the evolution of dengue epidemics in Singapore to provide early warning of outbreaks and to facilitate the public health response to moderate an impending outbreak. We developed a set of statistical models using least absolute shrinkage and selection operator (LASSO) methods to forecast the weekly incidence of dengue notifications over a 3-month time horizon. This forecasting tool was updated weekly and used a variety of data streams, including recent case data, meteorological data, vector surveillance data, and population-based national statistics. The forecasting methodology was compared with alternative approaches that have been proposed to model dengue case data (seasonal autoregressive integrated moving average and step-down linear regression) by applying them to the 2013 dengue epidemic, the largest on record in Singapore. Operationally useful forecasts were obtained at a 3-month lag using the LASSO-derived models. Based on the mean absolute percentage error, the LASSO approach provided more accurate forecasts than the other methods we assessed. We demonstrate its utility in Singapore's dengue control program by providing a forecast of the 2013 outbreak for advance preparation of the outbreak response. Statistical models built using machine learning methods such as LASSO have the potential to markedly improve forecasting techniques for recurrent infectious disease outbreaks such as dengue. Shi Y, Liu X, Kok SY, Rajarethinam J, Liang S, Yap G, Chong CS, Lee KS, Tan SS, Chin CK, Lo A, Kong W, Ng LC, Cook AR. 2016. Three-month real-time dengue forecast models: an early warning system for outbreak alerts and policy decision support in Singapore.
Environ Health Perspect 124:1369-1375; http://dx.doi.org/10.1289/ehp.1509981.
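To show how LASSO shrinks coefficients, setting some exactly to zero, here is a minimal pure-Python coordinate-descent sketch of the standard objective (1/2n)·||y − Xb||² + λ·||b||₁. It is a generic textbook formulation, not the authors' implementation; the function names and toy data are hypothetical, and a real analysis would normally rely on an established library.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    No intercept is fitted; in practice predictors are usually standardised and y centred first.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, starting from beta = 0
    columns = [[row[j] for row in X] for j in range(p)]
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            scale = sum(v * v for v in col) / n
            if scale == 0:
                continue
            # Correlation of predictor j with the residual that excludes j's own contribution.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            updated = soft_threshold(rho, lam) / scale
            delta = updated - beta[j]
            if delta != 0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = updated
    return beta


X = [[1, 0], [0, 1], [1, 0], [0, 1]]  # hypothetical toy predictors
y = [2, 0, 2, 0]
print(lasso_coordinate_descent(X, y, lam=0.0))  # [2.0, 0.0]
print(lasso_coordinate_descent(X, y, lam=0.5))  # [1.0, 0.0]
```

Raising `lam` shrinks the first coefficient toward zero; with `lam=2.0` both coefficients are exactly zero, which is the variable-selection behaviour that makes LASSO attractive when many data streams are candidate predictors.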
Project description: Dengue is an arboviral disease affecting populations worldwide. Frequent outbreaks occur, especially in equatorial cities such as Singapore, where a year-round tropical climate, a large daily influx of travelers, and high population density provide ideal conditions for dengue transmission. Little work has, however, quantified the peaks of dengue outbreaks, when health systems are likely to be most stretched. Nor have methods been developed to infer differences in the exogenous factors that drive the rise and fall of dengue case counts across extreme and non-extreme periods. In this paper, we developed time-varying extreme mixture (tvEM) methods to account for the temporal dependence of dengue case counts across extreme and non-extreme periods. This approach permits inference on differences in climatic forcing between non-extreme and extreme periods of dengue case counts, quantification of their temporal dependence, and estimation of thresholds, with associated uncertainties, that determine when dengue case counts are extreme. Using tvEM, we found no evidence that weather affects dengue case counts in the near term during non-extreme periods, but found that it has non-linear and mixed influences on dengue, through the tvEM parameters, during extreme periods. Using the most appropriate tvEM specification, we found that a threshold at the 70th quantile (95% credible interval: 41.1, 83.8) is optimal, with extreme events of 526.6, 1052.2, and 1183.6 weekly cases expected at return periods of 5, 50, and 75 years, respectively. A 1% scaled increase in weather parameters was found to decrease the long-run expected case counts, but larger increases would lead to a correspondingly drastic rise from the baseline. The tvEM approach can provide valuable inference on the extremes of time series, which, in the case of infectious disease notifications, allows public health officials to understand the likely scale of outbreaks in the long run.
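For intuition about return levels such as those quoted above, the standard stationary peaks-over-threshold formula with generalised Pareto exceedances is x_m = u + (σ/ξ)[(m·ζ_u)^ξ − 1], or u + σ·log(m·ζ_u) when ξ = 0, where u is the threshold, ζ_u the per-week exceedance probability, and m the number of weeks in the return period. The sketch below implements only this textbook form; it is not the tvEM model, which lets parameters vary over time, and all parameter values are hypothetical.

```python
import math


def pot_return_level(threshold, scale, shape, exceed_prob, years, obs_per_year=52):
    """Return level of a stationary peaks-over-threshold model with GPD exceedances.

    threshold:   u, the weekly case count above which weeks are treated as extreme
    scale/shape: sigma and xi of the generalised Pareto distribution of exceedances
    exceed_prob: zeta_u, the probability that a given week exceeds u
    """
    m = years * obs_per_year  # number of weekly observations in the return period
    if m * exceed_prob <= 1:
        raise ValueError("return period too short for this exceedance probability")
    if abs(shape) < 1e-12:  # xi -> 0 limit (exponential tail)
        return threshold + scale * math.log(m * exceed_prob)
    return threshold + scale / shape * ((m * exceed_prob) ** shape - 1)


# Hypothetical parameters: a threshold at the 70th quantile means about 30% of weeks exceed it.
for years in (5, 50, 75):
    print(years, round(pot_return_level(300.0, 80.0, 0.1, 0.30, years), 1))
```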
Project description: Dengue fever is a viral disease transmitted by mosquitoes. In recent decades, dengue fever has spread throughout the world. In 2014 and 2015, southern Taiwan experienced its most serious dengue outbreaks in recent years. Some statistical models have been established in the past; however, these models may not be suitable for predicting the huge outbreaks of 2014 and 2015. The control of dengue fever has become the primary task of local health agencies. This study attempts to predict the occurrence of dengue fever in order to provide timely warnings. We applied a newly developed autoregressive (AR) model to assess the association between daily weather variability and the daily number of dengue cases in 2014 and 2015 in Kaohsiung, the largest city in southern Taiwan. The model also included additional lagged weather predictors and was used to develop 5-day-ahead and 15-day-ahead predictive models. Our results indicate that the number of dengue cases in Kaohsiung is associated with humidity and the biting rate (BR). Our model is simple, intuitive, and easy to use. It can be embedded in a real-time schedule, and the data can be updated daily or weekly based on the needs of public health workers. In this study, a simple model using only meteorological factors performed well. The proposed real-time forecast model can help health agencies take public health actions to mitigate the impact of the epidemic.
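A direct h-day-ahead autoregressive setup of the general kind described, in which today's cases and lagged weather predict cases h days later, can be sketched with ordinary least squares in NumPy. This is a generic formulation under stated assumptions, not the study's newly developed AR model: it omits the biting rate, and the function names, the humidity series, and the coefficients below are synthetic.

```python
import numpy as np


def build_direct_design(cases, weather, horizon, weather_lags=(0,)):
    """Rows [1, cases_t, weather_{t-lag} ...] paired with the target cases_{t+horizon}."""
    start = max(weather_lags)
    rows, targets = [], []
    for t in range(start, len(cases) - horizon):
        rows.append([1.0, cases[t]] + [weather[t - lag] for lag in weather_lags])
        targets.append(cases[t + horizon])
    return np.array(rows, dtype=float), np.array(targets, dtype=float)


def fit_direct_ar(cases, weather, horizon, weather_lags=(0,)):
    """Ordinary least squares fit of the h-day-ahead model; returns the coefficient vector."""
    X, y = build_direct_design(cases, weather, horizon, weather_lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast(coef, cases, weather, weather_lags=(0,)):
    """Forecast `horizon` days past the last observed day using the fitted coefficients."""
    t = len(cases) - 1
    row = np.array([1.0, cases[t]] + [weather[t - lag] for lag in weather_lags])
    return float(row @ coef)


humidity = [(7 * t) % 11 for t in range(60)]  # hypothetical daily humidity index
cases = [3.0, 1.0, 4.0, 1.0, 5.0]
for t in range(55):
    cases.append(1 + 0.5 * cases[t] + 2 * humidity[t])  # synthetic 5-day-ahead relationship
coef = fit_direct_ar(cases, humidity, horizon=5)
print(np.round(coef, 3))  # approximately [1.0, 0.5, 2.0]
```

A 15-day-ahead model would be fitted the same way with `horizon=15`; fitting one model per horizon avoids feeding forecasts back in as inputs.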
Project description: The robust estimation and forecasting capability of random forests (RF) has been widely recognized; however, this ensemble machine learning method has not been widely used in mosquito-borne disease forecasting. In this study, two sets of RF models were developed in Colombia, at the national level (pooled department-level data) and at the department level, to predict weekly dengue cases up to 12 weeks ahead. A pooled national model based on artificial neural networks (ANN) was also developed and used as a comparator to the RF models. The predictors included historic dengue cases; satellite-derived estimates of vegetation, precipitation, and air temperature; and population counts, income inequality, and education. Our RF model trained on the pooled national data estimated department-specific weekly dengue cases more accurately than a local model trained only on that department's data. Additionally, the forecast errors of the national RF model were smaller than those of the national pooled ANN model, and they increased as the forecast horizon lengthened from one week ahead (mean absolute error, MAE: 9.32) to 12 weeks ahead (MAE: 24.56). The relative importance of predictors varied considerably with the forecast horizon: environmental and meteorological predictors were relatively important for short-term dengue forecast horizons, while socio-demographic predictors were relevant for longer-term horizons. This study demonstrates the potential of RF in dengue forecasting, with a feasible approach of using a national pooled model to forecast at finer spatial scales. Furthermore, including socio-demographic predictors is likely to help capture longer-term dengue trends.
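The "national pooled" idea amounts to stacking training rows from every department into one table, built separately for each forecast horizon. A minimal sketch follows; the function name, department series, and lag count are hypothetical, and only lagged cases are used as features, whereas the study also included satellite-derived and socio-demographic predictors.

```python
def pooled_training_rows(department_series, horizon, n_lags=4):
    """Stack lagged-case features from every department into one national training table.

    department_series: {department name: list of weekly case counts}
    horizon:           weeks ahead to forecast (1 to 12 in the study)
    """
    rows = []
    for department, cases in department_series.items():
        for t in range(n_lags - 1, len(cases) - horizon):
            rows.append({
                "department": department,
                "features": cases[t - n_lags + 1 : t + 1],  # the n_lags most recent weeks, oldest first
                "target": cases[t + horizon],
            })
    return rows


series = {"A": [1, 2, 3, 4, 5, 6], "B": [10, 20, 30, 40, 50, 60]}  # hypothetical departments
rows = pooled_training_rows(series, horizon=2, n_lags=2)
print(len(rows))  # 6
print(rows[0])    # {'department': 'A', 'features': [1, 2], 'target': 4}
```

One such table per horizon could then be passed to a random forest regressor (for example scikit-learn's `RandomForestRegressor`), with the department encoded as an additional feature if desired; a department-level model would instead use only that department's rows.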
Project description: Research is needed to create early warnings of dengue outbreaks to inform stakeholders and control the disease. This analysis consists of a comparative set of prediction models that include only meteorological variables, only lagged disease surveillance variables, or combinations of meteorological and lagged surveillance variables. Generalized linear regression models were used to fit relationships between the predictor variables and the dengue surveillance data (the outcome variable) using data from 2001 to 2010. Data from 2011 to 2013 were used for external validation of the model's prediction accuracy. Model fit was evaluated based on prediction performance in detecting epidemics, on the number of predicted cases according to RMSE and SRMSE, and on AIC. An optimal combination of meteorological variables and autoregressive lag terms of past dengue counts was best at predicting dengue incidence and the occurrence of dengue epidemics. Past disease surveillance data alone, as the predictor, visually gave reasonably accurate results for outbreak periods but not for non-outbreak periods. A combination of surveillance and meteorological data, including lag patterns up to a few years in the past, was the most predictive of dengue incidence and occurrence in Yogyakarta, Indonesia. The external validation showed poorer results than the internal validation but still showed skill in detecting outbreaks up to two months ahead. Prior studies support the finding that past meteorology and surveillance data can be predictive of dengue. However, prior research has shown to a lesser extent how longer-term past incidence data, going back years, can help predict outbreaks in the coming years, possibly reflecting the cross-immunity status of the population.
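The fit criteria named here can be computed as follows. This sketch makes two assumptions that the abstract does not state: that the GLM is Poisson (for the AIC log-likelihood) and that SRMSE means RMSE divided by the mean observed count, which is one common standardisation among several. Function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error of predicted case counts."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def srmse(observed, predicted):
    """RMSE scaled by the mean observed count (one common standardisation; assumed here)."""
    mean_obs = sum(observed) / len(observed)
    return rmse(observed, predicted) / mean_obs


def poisson_aic(observed, fitted_means, n_params):
    """AIC = 2k - 2 log L for a Poisson GLM with the given fitted means."""
    log_lik = sum(
        y * math.log(mu) - mu - math.lgamma(y + 1)
        for y, mu in zip(observed, fitted_means)
    )
    return 2 * n_params - 2 * log_lik


print(rmse([1, 2, 3], [1, 2, 5]))          # sqrt(4/3), about 1.155
print(poisson_aic([0, 1], [1.0, 1.0], 1))  # 6.0
```

Lower values are better for all three; AIC additionally penalises each extra parameter, which matters when comparing models with many meteorological and autoregressive lag terms.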
Project description: Dengue and dengue hemorrhagic fever pose significant burdens in many tropical countries. Dengue incidence has continually increased, with an annual peak whose timing and size are uncertain. Dengue is an enormous public health problem in Thailand because there is no antiviral drug against the dengue virus. Finding means to reduce dengue incidence is a challenging and appropriate strategy for primary prevention in a dengue outbreak. This study constructs the best predictive model from past provincial dengue incidence statistics and examines the relationships between dengue incidence and weather variables. We conducted experiments for 65 of Thailand's 77 provinces, since no dengue information is available for the remaining provinces. Predictive models were constructed using weekly data from 2001-2014: the training set comprises data from 2001-2013, and the test set comprises data from 2014. The collected data were separated into two parts: current dengue cases as the dependent variable, and weather variables and previous dengue cases as the independent variables. Eight weather variables are used in our models: average pressure, maximum temperature, minimum temperature, average humidity, precipitation, vaporization, wind direction, and wind power. Each weather variable is included for the current week and for lags of one to three weeks, giving a total of 32 independent weather variables for each province. Dengue cases from the previous one to three weeks are also used as independent variables, for a total of 35 independent variables. Predictive models were constructed using five methods: Poisson regression, negative binomial regression, quasi-likelihood regression, ARIMA(3,1,4), and SARIMA(2,0,1)(0,2,0). The best model is determined from combinations of 1-12 variables, which amounts to 232,989,800 models for each province; in total, we construct 15,144,337,000 models.
The best model is selected by ranking models from the highest to the lowest coefficient of determination (R²) and by the lowest root mean square error (RMSE). In our results, the previous-case variable at a one-week lag is the most frequently selected predictor, appearing in 55 of the 65 provinces (coefficients of determination ranging from 0.257 to 0.954, average 0.6383, 95% CI: 0.57313 to 0.70355). The most influential weather variable is precipitation, which is used in most provinces, followed by wind direction, wind power, and barometric pressure. The results confirm the common knowledge that dengue cases occur most often during the rainy season. They also show that wind direction, wind power, and barometric pressure influence the number of dengue cases; these three weather variables may help adult mosquitoes survive longer and spread dengue. In conclusion, the most influential factor for future cases is the number of previous dengue cases. However, weather variables are also needed to obtain better results. Predictions of the number of dengue cases should be made locally, not at the national level, because the best models for different provinces use different sets of weather variables. Our model is accurate enough for real prediction of future dengue incidence, to prepare for and protect against severe dengue outbreaks.
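The predictor count in this study follows from simple arithmetic (8 weather variables × 4 time points = 32, plus 3 lagged case counts = 35), and one reading of the selection rule ranks fits by R² and uses RMSE to break ties. The sketch below encodes both; the variable names are hypothetical, and the lexicographic R²-then-RMSE ordering is an assumption about how the two criteria were combined.

```python
WEATHER_VARIABLES = [
    "average_pressure", "max_temperature", "min_temperature", "average_humidity",
    "precipitation", "vaporization", "wind_direction", "wind_power",
]


def candidate_predictors():
    """Current week plus 1-3 week lags for each weather variable, plus 1-3 week case lags."""
    weather = [f"{name}_lag{lag}" for name in WEATHER_VARIABLES for lag in range(4)]
    cases = [f"cases_lag{lag}" for lag in range(1, 4)]
    return weather + cases


def pick_best(results):
    """results: (model_name, r2, rmse) tuples; highest R2 wins, lower RMSE breaks ties."""
    return min(results, key=lambda r: (-r[1], r[2]))


print(len(candidate_predictors()))  # 35 = 8 weather variables x 4 + 3 case lags
fits = [("m1", 0.61, 10.2), ("m2", 0.74, 12.5), ("m3", 0.74, 9.8)]  # hypothetical fits
print(pick_best(fits)[0])  # m3
```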
Project description: BACKGROUND: Dengue is the fastest-spreading vector-borne viral disease, resulting in an estimated 390 million infections annually. Precise prediction of many attributes related to dengue is still a challenge due to the complex dynamics of the disease. Important attributes to predict include the risk of and risk factors for infection, infection severity, and the timing and magnitude of outbreaks. In this work, we build a model for predicting the risk of dengue transmission using high-resolution weather data. Because the level of dengue transmission risk depends on vector density, we predict risk via vector prediction. METHODS AND FINDINGS: We use surveillance data on Aedes aegypti larvae collected by the Taiwan Centers for Disease Control as part of national routine entomological surveillance of dengue, together with weather data simulated using IBM's Containerized Forecasting Workflow, a high spatial- and temporal-resolution forecasting system. We propose a two-stage risk prediction system for assessing dengue transmission via Aedes aegypti mosquitoes. In stage one, we perform a logistic regression, with weather attributes as the explanatory variables, to determine whether larvae are present or absent at the locations of interest. The results are then aggregated to administrative divisions, with presence in a division determined by a threshold percentage of larvae-positive locations derived from a bootstrap approach. In stage two, larvae counts are estimated for the divisions predicted to be larvae-positive in stage one, using a zero-inflated negative binomial model. This model identifies larvae-positive locations with 71% accuracy and predicts larvae numbers with a coverage probability of 98% for 95% nominal prediction intervals.
This two-stage model improves the overall accuracy of identifying larvae-positive locations by 29%, and the mean squared error of predicted larvae numbers by 9.6%, compared with a single-stage approach that uses zero-inflated binomial regression. CONCLUSIONS: We demonstrate that a risk prediction system using high-resolution weather data can provide valuable insight into the distribution of risk over a geographical region. The work also shows that a two-stage approach is beneficial for predicting risk in non-homogeneous regions, where risk is localised.
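The control flow of the two-stage system can be sketched as follows: stage one aggregates location-level presence predictions into a division-level call, and stage two is evaluated only for positive divisions, here using the zero-inflated negative binomial mean (1 − π)·μ. The cutoffs 0.5 and 0.3 are placeholders (the study derived its division threshold by bootstrap), the probabilities and parameters are hypothetical, and no model is actually fitted.

```python
def division_is_positive(location_probs, prob_cutoff=0.5, share_threshold=0.3):
    """Stage 1: call a division larvae-positive if enough of its locations are predicted positive."""
    flags = [p >= prob_cutoff for p in location_probs]
    return sum(flags) / len(flags) >= share_threshold


def zinb_mean(mu, pi):
    """Expected count of a zero-inflated negative binomial with structural-zero probability pi."""
    return (1.0 - pi) * mu


def two_stage_prediction(divisions):
    """Stage 2 is only evaluated for divisions that stage 1 calls positive; others get 0."""
    predictions = {}
    for name, d in divisions.items():
        if division_is_positive(d["location_probs"]):
            predictions[name] = zinb_mean(d["mu"], d["pi"])
        else:
            predictions[name] = 0.0
    return predictions


divisions = {  # hypothetical stage-1 probabilities and stage-2 parameters
    "north": {"location_probs": [0.9, 0.2, 0.7], "mu": 40.0, "pi": 0.25},
    "south": {"location_probs": [0.1, 0.2, 0.3, 0.4], "mu": 40.0, "pi": 0.25},
}
print(two_stage_prediction(divisions))  # {'north': 30.0, 'south': 0.0}
```

In a real pipeline the location probabilities would come from the weather-based logistic regression and `mu`/`pi` from the fitted zero-inflated model for each division.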
Project description: In recent decades, the global incidence of dengue has increased. Affected countries have responded with more effective surveillance strategies to detect outbreaks early, monitor trends, and implement prevention and control measures. We applied newly developed machine learning approaches to identify laboratory-confirmed dengue cases among 4,894 emergency department patients with dengue-like illness (DLI) who received laboratory tests. Of these, 60.11% (2,942 cases) were confirmed to have dengue. Using just four input variables [age, body temperature, white blood cell counts (WBCs), and platelets], not only state-of-the-art deep neural network (DNN) prediction models but also conventional decision tree (DT) and logistic regression (LR) models achieved areas under the receiver operating characteristic (ROC) curve (AUCs) ranging from 83.75% to 85.87% [DT, DNN, and LR: 84.60% ± 0.03%, 85.87% ± 0.54%, and 83.75% ± 0.17%, respectively]. Subgroup analyses found that all the models were highly sensitive, particularly in the pre-epidemic period. Pre-peak sensitivities (<35 weeks) were 92.6%, 92.9%, and 93.1% for DT, DNN, and LR, respectively. Adjusted odds ratios estimated with LR for low WBCs [≤ 3.2 (×10³/µL)], fever (≥38°C), low platelet counts [< 100 (×10³/µL)], and older age (≥ 65 years) were 5.17 [95% confidence interval (CI): 3.96-6.76], 3.17 [95% CI: 2.74-3.66], 3.10 [95% CI: 2.44-3.94], and 1.77 [95% CI: 1.50-2.10], respectively. Our prediction models can readily be used in resource-poor countries where viral/serologic tests are inconvenient, can be applied for real-time syndromic surveillance to monitor trends in dengue cases, and could even be integrated with mosquito/environment surveillance for early warning and immediate prevention/control measures.
In other words, a local community hospital or clinic with an instrument for complete blood counts (including platelets) can provide sentinel screening during outbreaks. In conclusion, the machine learning approach can facilitate medical and public health efforts to minimize the health threat of dengue epidemics. However, laboratory confirmation remains the primary goal of surveillance and outbreak investigation.
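The AUC reported for these classifiers can be read as the probability that a randomly chosen confirmed case scores higher than a randomly chosen non-case. The sketch below computes it pairwise and, purely as an illustrative stand-in for the study's DT, DNN, and LR models, scores patients by counting the four risk factors from the odds-ratio results (WBC ≤ 3.2, temperature ≥ 38°C, platelets < 100, age ≥ 65). The function names and patient records are hypothetical.

```python
def risk_flags(wbc, temperature_c, platelets, age):
    """Count of the four risk factors from the adjusted odds-ratio analysis.

    wbc and platelets are in units of 10^3/uL.
    """
    return sum([
        wbc <= 3.2,           # low white blood cell count
        temperature_c >= 38,  # fever
        platelets < 100,      # low platelet count
        age >= 65,            # older age
    ])


def auc(labels, scores):
    """Area under the ROC curve as the probability a positive case outscores a negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


patients = [  # hypothetical (wbc, temperature, platelets, age, lab-confirmed dengue)
    (2.8, 38.6, 85, 70, 1),
    (3.0, 38.2, 140, 40, 1),
    (6.5, 37.4, 220, 30, 0),
    (7.1, 38.5, 180, 68, 0),
]
scores = [risk_flags(w, t, p, a) for w, t, p, a, _ in patients]
labels = [row[-1] for row in patients]
print(scores, auc(labels, scores))  # [4, 2, 0, 2] 0.875
```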
Project description: This study identified possible thresholds for predicting dengue fever (DF) outbreaks using the Baidu Search Index (BSI). Time-series classification and regression tree models based on BSI were used to develop predictive models for DF outbreaks in Guangzhou and Zhongshan, China. In the regression tree models, the mean autochthonous DF incidence rate in Guangzhou increased approximately 30-fold when the weekly BSI for DF, as a lagged moving average over 1-3 weeks, exceeded 382. When the weekly BSI for DF, as a lagged moving average over 1-5 weeks, exceeded 91.8, the mean autochthonous DF incidence rate in Zhongshan increased approximately 9-fold. In the classification tree models, when the weekly BSI for DF at the lagged moving average of 1-3 weeks exceeded 99.3, there was an 89.28% chance of a DF outbreak in Guangzhou, while in Zhongshan, when the weekly BSI for DF at the lagged moving average of 1-5 weeks exceeded 68.1, the chance of a DF outbreak rose to 100%. The study indicates that low-cost internet-based surveillance systems can be a valuable complement to traditional DF surveillance in China.
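The tree splits above operate on a lagged moving average of the weekly search index, that is, the mean of weeks t−1 to t−3 (or t−1 to t−5) for week t. A minimal sketch of that feature and of a single threshold split follows; the function names and BSI values are hypothetical, and only the threshold (382 for the 1-3 week average in Guangzhou) comes from the text.

```python
def lagged_moving_average(series, first_lag, last_lag):
    """For week t, the mean of weeks t-last_lag .. t-first_lag (None if history is too short)."""
    averages = []
    for t in range(len(series)):
        if t - last_lag < 0:
            averages.append(None)
        else:
            window = series[t - last_lag : t - first_lag + 1]
            averages.append(sum(window) / len(window))
    return averages


def flag_weeks(bsi, threshold, first_lag=1, last_lag=3):
    """True where the lagged moving average of the search index exceeds the threshold."""
    return [m is not None and m > threshold for m in lagged_moving_average(bsi, first_lag, last_lag)]


weekly_bsi = [300, 400, 500, 600, 100]  # hypothetical Baidu Search Index values
print(flag_weeks(weekly_bsi, threshold=382))  # [False, False, False, True, True]
```

The Zhongshan splits would use the same feature with `last_lag=5` and thresholds of 91.8 (regression tree) or 68.1 (classification tree). Because the feature uses only past weeks, a week can be flagged before its own search volume is known.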