Project description: As most disease-causing pathogens require transmission from an infectious individual to a susceptible individual, continued persistence of the pathogen within the population requires the replenishment of susceptibles through births, immigration, or waning immunity. Consider the introduction of an unknown infectious disease into a fully susceptible population, where it is not known how long immunity is conferred once an individual recovers from infection. If the prevalence of disease initially increases (that is, the infection takes off), the number of infectives will usually decrease to a low level after the first major outbreak. During this post-outbreak period, the disease dynamics may be influenced by stochastic effects, and there is a non-zero probability that the epidemic will die out. Die-out in this period following the first major outbreak is known as an epidemic fade-out. If the disease does not die out, the susceptible population may be replenished by the waning of immunity, and a second wave may start. In this study, we investigate whether the rate of waning immunity (and other epidemiological parameters) can be reliably estimated from multiple outbreak data in which some outbreaks display epidemic fade-out and others do not. We generated synthetic outbreak data from independent simulations of stochastic SIRS models in multiple communities; some outbreaks faded out and some did not. We conducted Bayesian parameter estimation under two alternative approaches: independently on each outbreak, and under a hierarchical framework. When conducting independent estimation, the waning immunity rate was poorly estimated and biased towards zero when an epidemic fade-out was observed. Under a hierarchical approach, however, we obtained more accurate and precise posterior estimates for the rate of waning immunity and other epidemiological parameters. The greatest improvement in estimates was obtained for those communities in which epidemic fade-out was observed. Our findings demonstrate the feasibility and value of adopting a Bayesian hierarchical approach to parameter inference for stochastic epidemic models.
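As a concrete illustration of the data-generating process described here, the following is a minimal sketch of a Gillespie-style stochastic SIRS simulation in Python; the function name and parameter values are illustrative assumptions, not the study's actual settings.

```python
import numpy as np

def simulate_sirs(N=1000, I0=5, beta=1.5, gamma=0.5, wane=0.05,
                  t_max=500.0, seed=None):
    """Gillespie simulation of a stochastic SIRS model.

    Events: infection S->I at rate beta*S*I/N, recovery I->R at rate
    gamma*I, and waning of immunity R->S at rate wane*R.
    """
    rng = np.random.default_rng(seed)
    S, I, R, t = N - I0, I0, 0, 0.0
    times, infectives = [t], [I]
    while I > 0 and t < t_max:
        rates = np.array([beta * S * I / N, gamma * I, wane * R])
        total = rates.sum()
        t += rng.exponential(1.0 / total)
        event = rng.choice(3, p=rates / total)
        if event == 0:       # infection
            S, I = S - 1, I + 1
        elif event == 1:     # recovery
            I, R = I - 1, R + 1
        else:                # waning immunity replenishes susceptibles
            R, S = R - 1, S + 1
        times.append(t)
        infectives.append(I)
    return np.array(times), np.array(infectives)

# An epidemic fade-out corresponds to I reaching zero after the first peak.
times, infectives = simulate_sirs(seed=1)
```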
Project description: Ellenberg indicator values (EIVs) are a widely used metric in plant ecology comprising a semi-quantitative description of species' ecological requirements. Typically, point estimates of mean EIV scores are compared over space or time to infer differences in the environmental conditions structuring plant communities, particularly in resurvey studies where no historical environmental data are available. However, the use of point estimates as a basis for inference does not take into account the variance among species EIVs within sampled plots and gives equal weighting to means calculated from plots with differing numbers of species. Traditional methods are also vulnerable to inaccurate estimates where only incomplete species lists are available. We present a set of multilevel (hierarchical) models, fitted with and without group-level predictors (e.g., habitat type), to improve the precision and accuracy of plot mean EIV scores and to provide more reliable inference on changing environmental conditions over spatial and temporal gradients in resurvey studies. We compare multilevel model performance to GLMMs fitted to point estimates of mean EIVs. We also test the reliability of this method for improving inferences with incomplete species lists in some or all sample plots. Hierarchical modeling led to more accurate and precise estimates of plot-level differences in mean EIV scores between time periods, particularly for datasets with incomplete records of species occurrence. Furthermore, hierarchical models revealed directional environmental change within ecological habitat types, which less precise estimates from GLMMs of raw mean EIVs were inadequate to detect. The ability to compute separate residual variance and adjusted R² parameters for plot mean EIVs and temporal differences in plot mean EIVs in multilevel models also allowed us to uncover a prominent role of hydrological differences as a driver of community compositional change in our case study, which traditional use of EIVs would fail to reveal. Assessing the environmental change underlying ecological communities is a vital issue in the face of accelerating anthropogenic change. We have demonstrated that multilevel modeling of EIVs allows for a nuanced estimation of such changes from plant assemblage data at local scales and beyond, leading to a better understanding of the temporal dynamics of ecosystems. Further, the ability of these methods to perform well with missing data should increase the total set of historical data which can be used to this end.
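For readers who want a concrete starting point, below is a minimal sketch of the partial-pooling idea using statsmodels; the toy data, variable names, and model formula are assumptions for illustration, and the study's multilevel models are considerably richer.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy long-format data: one row per species occurrence, with its EIV
# score, the plot it was recorded in, the survey period, and habitat.
# A real dataset would have many plots and many species per plot.
df = pd.DataFrame({
    "eiv":     [5, 6, 4, 7, 6, 5, 8, 7, 6, 7, 5, 6],
    "plot":    ["p1", "p1", "p1", "p2", "p2", "p2",
                "p3", "p3", "p3", "p4", "p4", "p4"],
    "period":  [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "habitat": ["wet", "wet", "wet", "dry", "dry", "dry",
                "wet", "wet", "wet", "dry", "dry", "dry"],
})

# Partial pooling: plot-level random intercepts shrink noisy or
# species-poor plot means toward the group mean; habitat enters as a
# group-level predictor and period captures the resurvey contrast.
model = smf.mixedlm("eiv ~ period + habitat", df, groups=df["plot"])
fit = model.fit()
print(fit.summary())
```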
Project description: The concept of critical loads is used in the framework of the Convention on Long-range Transboundary Air Pollution (UNECE) to define thresholds below which, according to the latest scientific knowledge, no damaging effects on habitats occur. Change-point regression models applied in a Bayesian framework are useful statistical tools for estimating empirical critical loads. While hierarchical study designs are common in ecological research, previous methods for estimating critical loads using change-point regression did not allow data collected under such a design to be analysed. This method update implements hierarchical data structure by including random effects, such as study sites or, as in this example, tree species, within the Bayesian change-point regression models, using two different approaches. The example data set is a European-wide gradient study of the impact of climate change and air pollution on forest tree health, assessed by the foliar nutrient status of nitrogen (N) to phosphorus (P) in 10 different conifer tree species originating from 88 forest sites in 9 countries and covering 22 years (1995-2017). Both modelling approaches, using JAGS and Bayesian Regression Models using 'Stan' (brms), resulted in reasonable and similar estimates of the critical empirical load for nitrogen (CLempN) for temperate forests. These methodological examples of different approaches to Bayesian change-point regression models with random effects could prove useful for inferring CLempN for other ecosystems and long-term data sets.
• Hierarchical change-point regression models are suitable for estimating critical empirical loads.
• The Bayesian framework of these models allows the inclusion of the current critical load and various confounding or modifying variables.
• Here we present two ways of implementing hierarchical data sets in Bayesian change-point regression models, using JAGS and brms.
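The paper's implementations use JAGS and brms; purely as an illustration of the model class, here is a minimal sketch of a broken-stick change-point regression with species random effects in PyMC, in which the simulated data, priors, and variable names are all assumptions.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)

# Simulated stand-in data: foliar N:P ratio (y) against N deposition (x),
# with tree species as a random effect (10 species).
x = rng.uniform(0, 40, 200)             # N deposition gradient
species = rng.integers(0, 10, 200)      # species index per observation
y = 8 + 0.2 * np.maximum(x - 15, 0) + rng.normal(0, 1, 200)

with pm.Model() as cp_model:
    cp = pm.Uniform("changepoint", 0, 40)        # the CLempN estimate
    intercept = pm.Normal("intercept", 8, 5)
    slope = pm.HalfNormal("slope", 1)            # response above the load
    a_species = pm.Normal("a_species", 0, 1, shape=10)  # random effects
    sigma = pm.HalfNormal("sigma", 2)

    # Broken-stick (hinge) mean: flat below the change point, linear above.
    mu = intercept + a_species[species] + slope * pm.math.maximum(x - cp, 0)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000)
```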
Project description: Modern high-throughput biotechnologies such as microarray and next-generation sequencing produce a massive amount of information for each sample assayed. However, in a typical high-throughput experiment, only a limited amount of data is observed for each individual feature, hence the classical 'large p, small n' problem. The Bayesian hierarchical model, capable of borrowing strength across features within the same dataset, has been recognized as an effective tool for analyzing such data. However, the shrinkage effect, the most prominent characteristic of hierarchical models, can lead to undesirable over-correction for some features. In this work, we discuss possible causes of the over-correction problem and propose several alternative solutions. Our strategy is rooted in the fact that in the Big Data era, large amounts of historical data are available and should be taken advantage of. Our strategy presents a new framework to enhance the Bayesian hierarchical model. Through simulation and real data analysis, we demonstrate the superior performance of the proposed strategy. Our new strategy also enables borrowing information across different platforms, which could be extremely useful with the emergence of new technologies and the accumulation of data from different platforms in the Big Data era. Our method has been implemented in the R package "adaptiveHM", which is freely available from https://github.com/benliemory/adaptiveHM.
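To make the over-correction mechanism concrete, the following toy sketch works through shrinkage in the standard normal-normal hierarchical model; all values are assumed for illustration, and this is not the adaptiveHM method itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed per-feature effect estimates (e.g., log fold changes).
feature_means = rng.normal(0.0, 1.0, size=1000)
sigma2 = 0.5   # within-feature sampling variance, assumed known
tau2 = 1.0     # between-feature variance, assumed known
mu = feature_means.mean()  # grand mean across features

# Normal-normal posterior mean: a precision-weighted compromise between
# each feature's own estimate and the grand mean.
w = tau2 / (tau2 + sigma2)
shrunk = w * feature_means + (1 - w) * mu

# The most extreme feature is pulled hardest toward the bulk; if its
# large effect is genuine, this shrinkage is exactly the over-correction
# that historical data could be used to guard against.
idx = np.argmax(np.abs(feature_means))
print(feature_means[idx], shrunk[idx])
```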
Project description: As geographic range estimates for the IUCN Red List guide conservation actions, accuracy and ecological realism are crucial. IUCN's extent of occurrence (EOO) is the general region including the species' range, while area of occupancy (AOO) is the subset of EOO occupied by the species. Data-poor species with incomplete sampling present particular difficulties, but species distribution models (SDMs) can be used to predict suitable areas. Nevertheless, SDMs typically employ abiotic variables (i.e., climate) and do not explicitly account for biotic interactions that can impose range constraints. We sought to improve range estimates for data-poor, parapatric species by masking out areas under inferred competitive exclusion. We did so for two South American spiny pocket mice: Heteromys australis (Least Concern) and Heteromys teleus (Vulnerable due to especially poor sampling), whose ranges appear restricted by competition. For both species, we estimated EOO using SDMs and AOO with four approaches: occupied grid cells, the abiotic SDM prediction, and this prediction masked in two ways by approximations of the area occupied by each species' congener. We made the masks using support vector machines (SVMs) fit with two data types: occurrence coordinates alone; and coordinates along with SDM predictions of suitability. Given the uncertainty in calculating AOO for low-data species, we made estimates of the lower and upper bounds of AOO, but we only make recommendations for H. teleus, as its full known range was considered. The SVM approaches (especially the second one) had lower classification error and made more ecologically realistic delineations of the contact zone. For H. teleus, the lower AOO bound (a strongly biased underestimate) corresponded to Endangered (occupied grid cells), while the upper bounds (the other approaches) led to Near Threatened. As we currently lack data to determine the species' true occupancy within the post-processed SDM prediction, we recommend that an updated listing for H. teleus include these bounds for AOO. This study advances methods for estimating the upper bound of AOO and highlights the need for better ways to produce unbiased estimates of lower bounds. More generally, the SVM approaches for post-processing SDM predictions hold promise for improving range estimates for other uses in biogeography and conservation.
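A minimal sketch of the SVM masking step, assuming made-up occurrence coordinates and placeholder suitability values, might look like the following; the study's actual workflow operates on real SDM prediction grids.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Made-up occurrence coordinates (lon, lat) for two parapatric congeners.
coords_a = rng.normal([-79.0, 1.0], 0.5, size=(40, 2))  # focal species
coords_b = rng.normal([-80.5, 0.0], 0.5, size=(40, 2))  # its congener
X = np.vstack([coords_a, coords_b])
y = np.array([0] * 40 + [1] * 40)

# Second variant: append each species' SDM suitability at the points as
# extra features (placeholder values here).
suitability = rng.uniform(0, 1, size=(80, 2))
X2 = np.hstack([X, suitability])

svm = SVC(kernel="rbf", gamma="scale").fit(X2, y)

# Grid cells of the SDM prediction classified as the congener's side of
# the contact zone would be masked out of the focal species' range.
grid = np.hstack([rng.uniform([-82, -1], [-78, 2], size=(200, 2)),
                  rng.uniform(0, 1, size=(200, 2))])
mask_out = svm.predict(grid) == 1
```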
Project description: Using satellite-based aerosol optical depth (AOD) measurements and statistical models to estimate ground-level PM2.5 is a promising way to cover the areas that are not served by ground PM2.5 monitors. The statistical models used in previous studies are primarily Linear Mixed Effects (LME) and Geographically Weighted Regression (GWR) models. In this study, we developed a new regression model between PM2.5 and AOD using Gaussian processes in a Bayesian hierarchical setting. Gaussian processes model the stochastic nature of the spatial random effects, with the mean surface and the covariance function specified. The spatial stochastic process is incorporated under the Bayesian hierarchical framework to explain the variation of PM2.5 concentrations together with other factors, such as AOD and spatial and non-spatial random effects. We evaluate the results of our model and compare them with those of other, conventional statistical models (GWR and LME) by within-sample model fitting and out-of-sample validation (cross-validation, CV). The results show that our model achieves a CV R² of 0.81, reflecting higher accuracy than that of GWR and LME (0.74 and 0.48, respectively). Our results indicate that Gaussian process models have the potential to improve the accuracy of satellite-based PM2.5 estimates.
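As a rough, non-hierarchical illustration of the core idea, the sketch below fits a Gaussian process to made-up (longitude, latitude, AOD) data with scikit-learn; the study's model is fitted in a fully Bayesian hierarchical setting, so this only approximates its spatial random-effect component.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

rng = np.random.default_rng(2)

# Made-up training data: monitor coordinates, AOD, and measured PM2.5.
lonlat = rng.uniform([110.0, 30.0], [118.0, 38.0], size=(100, 2))
aod = rng.uniform(0.1, 1.5, size=(100, 1))
pm25 = 20 + 40 * aod[:, 0] + rng.normal(0, 5, size=100)

X = np.hstack([lonlat, aod])

# An anisotropic RBF kernel over (lon, lat, AOD) stands in for the
# spatially correlated random effect; WhiteKernel absorbs measurement
# noise. This is a maximum-likelihood fit, not a fully Bayesian one.
kernel = ConstantKernel() * RBF(length_scale=[1.0, 1.0, 0.5]) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, pm25)

pred_mean, pred_sd = gp.predict(X[:5], return_std=True)
```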
Project description: Advances in artificial intelligence have inspired a paradigm shift in human neuroscience, yielding large-scale functional magnetic resonance imaging (fMRI) datasets that provide high-resolution brain responses to thousands of naturalistic visual stimuli. Because such experiments necessarily involve brief stimulus durations and few repetitions of each stimulus, achieving sufficient signal-to-noise ratio can be a major challenge. We address this challenge by introducing GLMsingle, a scalable, user-friendly toolbox available in MATLAB and Python that enables accurate estimation of single-trial fMRI responses (glmsingle.org). Requiring only fMRI time-series data and a design matrix as inputs, GLMsingle integrates three techniques for improving the accuracy of trial-wise general linear model (GLM) beta estimates. First, for each voxel, a custom hemodynamic response function (HRF) is identified from a library of candidate functions. Second, cross-validation is used to derive a set of noise regressors from voxels unrelated to the experiment. Third, to improve the stability of beta estimates for closely spaced trials, betas are regularized on a voxel-wise basis using ridge regression. Applying GLMsingle to the Natural Scenes Dataset and BOLD5000, we find that GLMsingle substantially improves the reliability of beta estimates across visually-responsive cortex in all subjects. Comparable improvements in reliability are also observed in a smaller-scale auditory dataset from the StudyForrest experiment. These improvements translate into tangible benefits for higher-level analyses relevant to systems and cognitive neuroscience. We demonstrate that GLMsingle: (i) helps decorrelate response estimates between trials nearby in time; (ii) enhances representational similarity between subjects within and across datasets; and (iii) boosts one-versus-many decoding of visual stimuli. GLMsingle is a publicly available tool that can significantly improve the quality of past, present, and future neuroimaging datasets sampling brain activity across many experimental conditions.
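The following is not the GLMsingle API, but a toy numpy sketch of the third technique, ridge regularization of single-trial betas, under an assumed placeholder design matrix.

```python
import numpy as np

def ridge_betas(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^-1 X'y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

rng = np.random.default_rng(3)

# Placeholder design matrix: time points x single-trial regressors.
# In real data the HRF-convolved columns of nearby trials overlap in
# time, which inflates the variance of unregularized beta estimates.
X = rng.normal(size=(200, 50))
y = X @ rng.normal(size=50) + rng.normal(0, 2, size=200)

betas_ols = ridge_betas(X, y, 0.0)     # ordinary least squares
betas_reg = ridge_betas(X, y, 10.0)    # shrunk, more stable estimates
# GLMsingle chooses the regularization level per voxel by cross-validation.
```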
Project description: While incomplete non-medical data have been integrated into prediction models for epidemics, the accuracy and generalizability of such data are difficult to guarantee. To comprehensively evaluate the ability and applicability of using social media data to predict the development of COVID-19, a new confirmed-case prediction algorithm that improves on the Google Flu Trends algorithm, called Weibo COVID-19 Trends (WCT), is established based on the dataset of posts generated by all users in Wuhan on Sina Weibo. A genetic algorithm is designed to select the keyword set for filtering COVID-19-related posts. WCT consistently achieves the highest average score in the training set for the agreement between daily new confirmed case counts and the prediction results. It continues to produce the best prediction results among the algorithms compared as the forecast horizon increases from one to eight days, with the highest correlation scores, ranging from 0.98 (P < 0.01) to 0.86 (P < 0.01), over the whole analysis period. Additionally, WCT effectively remedies the Google Flu Trends algorithm's shortcoming of overestimating the epidemic peak value. This study offers a highly adaptive approach to feature engineering of third-party data for epidemic prediction, providing useful insights for the prediction of newly emerging infectious diseases at an early stage.
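A toy sketch of the keyword-selection idea, using truncation selection and bit-flip mutation over binary keyword masks (crossover omitted for brevity), is given below; the data, population size, and mutation rate are illustrative assumptions rather than WCT's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up inputs: daily counts of posts containing each candidate keyword
# (days x keywords) and daily new confirmed case counts.
post_counts = rng.poisson(20, size=(60, 12)).astype(float)
cases = rng.poisson(50, size=60).astype(float)

def fitness(mask):
    """Correlation between summed posts of selected keywords and cases."""
    if mask.sum() == 0:
        return -1.0
    signal = post_counts[:, mask.astype(bool)].sum(axis=1)
    return np.corrcoef(signal, cases)[0, 1]

# Evolve binary keyword-selection masks.
pop = rng.integers(0, 2, size=(30, 12))
for _ in range(100):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]            # keep the 10 fittest
    children = parents[rng.integers(0, 10, 30)].copy() # resample parents
    children[rng.random(children.shape) < 0.05] ^= 1   # bit-flip mutation
    pop = children

best_mask = pop[np.argmax([fitness(m) for m in pop])]
```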
Project description: The powerful general Pacala-Hassell host-parasitoid model for a patchy environment, which allows host density-dependent heterogeneity (HDD) to be distinguished from between-patch, host density-independent heterogeneity (HDI), is reformulated within the generalized linear model (GLM) family. This improves accessibility through the provision of general software within well-known statistical systems and allows a rich variety of models to be formulated. Covariates such as age class, host density, and abiotic factors may be included easily. For the case where there is no HDI, the formulation is a simple GLM. When there is HDI in addition to HDD, the formulation is a hierarchical generalized linear model. Two forms of HDI model are considered, both with between-patch variability: one has binomial variation within patches and one has extra-binomial, overdispersed variation within patches. Examples are given demonstrating parameter estimation with standard errors, and hypothesis testing. For one example given, the extra-binomial component of the HDI heterogeneity in parasitism is itself shown to be strongly density dependent.
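For the no-HDI case, the GLM formulation can be sketched directly in standard software; below is a minimal statsmodels example on simulated per-patch data, with the covariate and link choices assumed for illustration (the paper's exact linear predictor and link follow the Pacala-Hassell model).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Made-up per-patch data: hosts available and hosts parasitized.
hosts = rng.integers(5, 50, size=40)
log_density = np.log(hosts)
p = 1 / (1 + np.exp(-(-2 + 0.6 * log_density)))   # simulated HDD
parasitized = rng.binomial(hosts, p)

# Binomial GLM of per-patch parasitism against log host density; with a
# (successes, failures) response this is the simple-GLM case above.
X = sm.add_constant(log_density)
endog = np.column_stack([parasitized, hosts - parasitized])
res = sm.GLM(endog, X, family=sm.families.Binomial()).fit()
print(res.summary())
```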
Project description: Spatial data are playing an increasingly important role in watershed science and management. Large investments have been made by government agencies to provide nationally available spatial databases; however, their relevance and suitability for local watershed applications are largely unscrutinized. We investigated how the goodness of fit and predictive accuracy of total phosphorus (TP) concentration models developed from nationally available spatial data could be improved by including local, watershed-specific data in the East Fork of the Little Miami River, Ohio, a 1290 km² watershed. We also determined whether a spatial stream network (SSN) modeling approach improved on multiple linear regression (nonspatial) models. Goodness of fit and predictive accuracy were highest for the SSN model that included local covariates, and lowest for the nonspatial model developed from national data. Septic systems and point-source TP loads were significant covariates in the local models. These local data not only improved the models but enabled a more explicit interpretation of the processes affecting TP concentrations than the more generic national covariates. The results suggest that SSN modeling greatly improves prediction and should be applied when using national covariates. Including local covariates further increases the accuracy of TP predictions throughout the studied watershed; such variables should be included in future national databases, particularly the locations of septic systems.
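The sketch below illustrates only the nonspatial baseline, a multiple linear regression with national versus national-plus-local covariates on made-up data; SSN models require specialized tooling that accounts for covariance along the stream network (e.g., the R packages SSN/SSN2), which is not reproduced here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)

# Made-up site data: TP concentrations with a generic national covariate
# and two local, watershed-specific covariates.
df = pd.DataFrame({
    "tp": rng.lognormal(-2.0, 0.5, 50),           # mg/L
    "pct_agriculture": rng.uniform(0, 80, 50),    # national land cover
    "septic_density": rng.uniform(0, 30, 50),     # local covariate
    "point_source_load": rng.uniform(0, 5, 50),   # local covariate
})

# Nonspatial regressions: national covariates only, then with local ones.
national = smf.ols("np.log(tp) ~ pct_agriculture", df).fit()
local = smf.ols("np.log(tp) ~ pct_agriculture + septic_density"
                " + point_source_load", df).fit()
print(national.rsquared, local.rsquared)
```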