Project description: For the application of toxicokinetic-toxicodynamic (TKTD) models in the European environmental risk assessment (ERA) of plant protection products, it is recommended to evaluate model predictions for both the calibration and the independent validation data sets using qualitative criteria (visual assessment) and quantitative goodness-of-fit (GoF) metrics. The aims of this study were to identify whether quantitative criteria coincide with human visual perception of model performance and which evaluator characteristics influence that perception. In an anonymous online survey, more than 70 calibration and validation fits of general unified threshold models of survival (GUTS) were ranked by 64 volunteers with a professional interest in ecotoxicology and TKTD modeling. Participants were asked to score model fits to the time-resolved survival data from toxicity experiments and to an aggregated dose-response curve representation. Dose-response curve plots tended to be scored better than time series, although both representations were based on the same toxicity test data and model results. For the time series, quantitative indices and visual assessments generally agreed on model performance. However, rankings varied with the individual perceptions of the participants. Visual assessment scores were best predicted using a combination of GoF metrics. From the survey participants' majority agreement on fit acceptance, GoF cut-off criteria indicating sufficient fit performance could be derived. The most conservative GoF criterion closely resembled current suggestions by the European Food Safety Authority. Hence, the survey results provide evidence that current quantitative GUTS assessment practice in ERA is consistent with perceptions of fit quality based on visual judgements of the dynamic model behavior by a large number of practitioners. Thus, our study fosters trust in model performance assessment.
Project description: Meta-analysis is a very useful tool for combining information from different sources. Fixed effect and random effect models are widely used in meta-analysis. Despite their popularity, they may give misleading results if they are applied blindly when the models do not fit the data. Therefore, as in any statistical analysis, checking model fit is an important step. In practice, however, goodness-of-fit in meta-analysis is rarely discussed. In this paper, we propose tests to check the goodness-of-fit of the fixed and random effect models under the assumption of normal distributions in meta-analysis. Through a simulation study, we show that the proposed tests control the type I error rate very well. To demonstrate the usefulness of the proposed tests, we also apply them to some real data sets. Our study shows that the proposed tests are useful tools for checking the goodness-of-fit of the normal models used in meta-analysis.
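For readers unfamiliar with the two models named above, the following is a minimal sketch of the standard inverse-variance fixed effect estimate and the DerSimonian-Laird random effect estimate. It is not the goodness-of-fit test proposed in the paper, which the abstract does not specify. The function names and the example numbers are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: textbook fixed effect and DerSimonian-Laird
# random effect pooling. Function names and data are hypothetical.

def fixed_effect(effects, variances):
    """Inverse-variance weighted pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, 1.0 / total


def dersimonian_laird(effects, variances):
    """Random effect pooled estimate using the DerSimonian-Laird tau^2."""
    weights = [1.0 / v for v in variances]
    pooled_fe, _ = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed effect estimate.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, effects))
    k = len(effects)
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    re_weights = [1.0 / (v + tau2) for v in variances]
    total = sum(re_weights)
    pooled_re = sum(w * y for w, y in zip(re_weights, effects)) / total
    return pooled_re, 1.0 / total, tau2, q


if __name__ == "__main__":
    study_effects = [0.1, 0.3, 0.5]        # hypothetical study estimates
    study_variances = [0.04, 0.04, 0.04]   # hypothetical within-study variances
    print(fixed_effect(study_effects, study_variances))
    print(dersimonian_laird(study_effects, study_variances))
```

Both estimators assume normally distributed study effects, which is the assumption the proposed tests are meant to check.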
Project description: This article proposes methodology for assessing goodness of fit in Bayesian hierarchical models. The methodology is based on comparing values of pivotal discrepancy measures (PDMs), computed using parameter values drawn from the posterior distribution, to known reference distributions. Because the resulting diagnostics can be calculated from standard output of Markov chain Monte Carlo algorithms, their computational costs are minimal. Several simulation studies are provided, each of which suggests that diagnostics based on PDMs have higher statistical power than comparable posterior-predictive diagnostic checks in detecting model departures. The proposed methodology is illustrated in a clinical application; an application to discrete data is described in supplementary material.
Project description: Goodness-of-fit tests for two probabilistic multigraph models are presented. The first model is random stub matching given fixed degrees (RSM), so that edge assignments to vertex pair sites are dependent, and the second is independent edge assignments (IEA) according to a common probability distribution. Tests are performed using goodness-of-fit measures between the edge multiplicity sequence of an observed multigraph and the expected one according to a simple or composite hypothesis. Test statistics of Pearson type and of likelihood ratio type are used, and the expected values of the Pearson statistic under the different models are derived. Test performances based on simulations indicate that, even for a small number of edges, the null distributions of both statistics are well approximated by their asymptotic χ²-distribution. The non-null distributions of the test statistics can be well approximated by the proposed adjusted χ²-distributions used for power approximations. The influence of RSM on both test statistics is substantial for a small number of edges and implies a shift of their distributions towards smaller values compared to the null distributions under IEA. Two applications to social networks are included to illustrate how the tests can guide the analysis of social structure.
Project description: Pearson's chi-squared test is widely used to test the goodness of fit between categorical data and a given discrete distribution function. When the number of categories, say k, is a fixed integer, Pearson's chi-squared test statistic converges in distribution to a chi-squared distribution with k-1 degrees of freedom as the sample size n goes to infinity. In real applications, k often changes with n and may even be much larger than n. Using martingale techniques, we prove that Pearson's chi-squared test statistic is asymptotically normal under quite general conditions. We also propose a new test statistic which, in our simulation study, is more powerful than the chi-squared test statistic. A real application to lottery data is provided to illustrate our methodology.
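As a reminder of the classical fixed-k setting described above, the following minimal sketch computes Pearson's chi-squared statistic and its k-1 degrees of freedom. It does not implement the new test statistic or the normal limit from the paper, whose exact form the abstract does not give. The function name and the example counts are hypothetical.

```python
# Illustrative sketch only: classical Pearson chi-squared statistic for
# a fully specified discrete distribution. Names and data are hypothetical.

def pearson_chi2(observed, probs):
    """Return (statistic, degrees of freedom) for Pearson's test."""
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    n = sum(observed)
    # Expected count in each category under the hypothesised distribution.
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With k fixed categories the reference distribution has k - 1 df.
    return stat, len(observed) - 1


if __name__ == "__main__":
    counts = [18, 22, 20, 25, 15]   # hypothetical category counts, n = 100
    uniform = [0.2] * 5             # hypothesised distribution
    print(pearson_chi2(counts, uniform))
```

The statistic would then be compared against a chi-squared distribution with the returned degrees of freedom, an approximation that the abstract notes applies when k is fixed and n grows.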
Project description: Linear mixed models (LMMs) are widely used for regression analysis of data that are assumed to be clustered or correlated. Assessing model fit is important for valid inference, but to date no confirmatory tests are available to assess the adequacy of the fixed effects part of LMMs against general alternatives. We therefore propose a class of goodness-of-fit tests for the mean structure of LMMs. Our test statistic is a quadratic form of the difference between observed values and the values expected under the estimated model in cells defined by a partition of the covariate space. We show that this test statistic has an asymptotic chi-squared distribution when model parameters are estimated by maximum likelihood or by least squares and method of moments, and study its power under local alternatives both analytically and in simulations. Data on repeated measurements of thyroglobulin from individuals exposed to the accident at the Chernobyl power plant in 1986 are used to illustrate the proposed test.
Project description: We propose L_p distance-based goodness-of-fit (GOF) tests for uniform stochastic ordering with two continuous distributions F and G, both of which are unknown. Our tests are motivated by the fact that when F and G are uniformly stochastically ordered, the ordinal dominance curve R = FG⁻¹ is star-shaped. We derive asymptotic distributions and prove that our testing procedure has a unique least favorable configuration of F and G for p ∈ [1,∞]. We use simulation to assess finite-sample performance and demonstrate that a modified, one-sample version of our procedure (e.g., with G known) is more powerful than the one-sample GOF test suggested by Arcones and Samaniego (2000, Annals of Statistics). We also discuss sample size determination. We illustrate our methods using data from a pharmacology study evaluating the effects of administering caffeine to prematurely born infants.
Project description: The two key issues of modern Bayesian statistics are: (i) establishing a principled approach for distilling a statistical prior that is consistent with the given data from an initial believable scientific prior; and (ii) developing a consolidated Bayes-frequentist data analysis workflow that is more effective than either of the two separately. In this paper, we propose the idea of "Bayes via goodness-of-fit" as a framework for exploring these fundamental questions, in a way that is general enough to embrace almost all of the familiar probability models. Several examples, spanning application areas such as clinical trials, metrology, insurance, medicine, and ecology, show the unique benefit of this new point of view as a practical data science tool.
Project description: Background: The similarity or distance measure used for clustering can generate intuitive and interpretable clusters when it is tailored to the unique characteristics of the data. In time series datasets generated with high-throughput biological assays, measurements such as gene expression levels or protein phosphorylation intensities are collected sequentially over time, and the similarity score should capture this special temporal structure. Results: We propose a clustering similarity measure called Lag Penalized Weighted Correlation (LPWC) to group pairs of time series that exhibit closely related behaviors over time, even if the timing is not perfectly synchronized. LPWC aligns time series profiles to identify common temporal patterns. It down-weights aligned profiles based on the length of the temporal lags that are introduced. We demonstrate the advantages of LPWC versus existing time series and general clustering algorithms. In a simulated dataset based on the biologically motivated impulse model, LPWC is the only method to recover the true clusters for almost all simulated genes. LPWC also identifies clusters with distinct temporal patterns in our yeast osmotic stress response and axolotl limb regeneration case studies. Conclusions: LPWC achieves both of its time series clustering goals. It groups time series with correlated changes over time, even if those patterns occur earlier or later in some of the time series. In addition, it refrains from introducing large shifts in time when searching for temporal patterns by applying a lag penalty. The LPWC R package is available at https://github.com/gitter-lab/LPWC and CRAN under an MIT license.
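The general idea of aligning two profiles and down-weighting long lags can be sketched as follows. This is a simplified illustration, not the LPWC package's algorithm: it uses a plain (unweighted) Pearson correlation on equally spaced series of equal length, and the penalty form exp(-penalty * lag**2), the function names, and the parameter defaults are all assumptions made for this example.

```python
# Illustrative sketch only: a lag-penalised correlation between two
# equally spaced series of equal length. Not the LPWC implementation;
# the penalty form and all names are assumptions.
import math


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lag_penalized_similarity(x, y, max_lag=2, penalty=0.1):
    """Return (best score, best lag) over lags in [-max_lag, max_lag].

    A positive lag pairs x[t + lag] with y[t], i.e. x trails y.
    """
    best_score, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[lag:], y[:len(y) - lag]
        else:
            xs, ys = x[:len(x) + lag], y[-lag:]
        if len(xs) < 3:
            continue  # too few overlapping points for a correlation
        # Down-weight the correlation according to the size of the shift.
        score = math.exp(-penalty * lag ** 2) * pearson(xs, ys)
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag


if __name__ == "__main__":
    y = [0, 0, 1, 3, 1, 0, 0, 0]   # hypothetical expression profile
    x = [0, 0, 0, 1, 3, 1, 0, 0]   # same pulse, one time point later
    print(lag_penalized_similarity(x, y))
```

With a small penalty the shifted alignment wins; with a large penalty the unshifted comparison is preferred, which mirrors the stated goal of avoiding large shifts in time.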