Generating Virtual Patients by Multivariate and Discrete Re-Sampling Techniques.
ABSTRACT: Clinical Trial Simulations (CTS) are a valuable tool for decision-making during drug development. However, to obtain realistic simulation scenarios, the patients included in the CTS must be representative of the target population. This is particularly important when covariate effects exist that may affect the outcome of a trial. The objective of our investigation was to evaluate and compare CTS results using re-sampling from a population pool and multivariate distributions to simulate patient covariates. COPD was selected as the paradigm disease for the purposes of our analysis, FEV1 was used as the response measure, and the effects of a hypothetical intervention were evaluated in different populations in order to assess the predictive performance of the two methods. Our results show that the multivariate distribution method produces realistic covariate correlations, comparable to the real population. Moreover, it allows simulation of patient characteristics beyond the limits of inclusion and exclusion criteria in historical protocols. Both methods, discrete re-sampling and multivariate distribution, generate realistic pools of virtual patients. However, the use of a multivariate distribution enables more flexible simulation scenarios since it is not necessarily bound to the existing covariate combinations in the available clinical data sets.
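As an illustration of the two approaches compared above, the following minimal Python sketch contrasts discrete re-sampling with sampling from a fitted bivariate normal distribution. The covariate pool (age, baseline FEV1), the function names, and the normality assumption are all hypothetical and are not taken from the study.

```python
import math
import random

# Hypothetical pool of (age in years, baseline FEV1 in litres); not study data.
POOL = [(52, 2.1), (58, 1.6), (61, 1.9), (64, 1.4),
        (67, 1.7), (70, 1.2), (73, 1.5), (55, 1.8)]

def resample_patients(pool, n, rng):
    """Discrete re-sampling: draw whole patients with replacement."""
    return [rng.choice(pool) for _ in range(n)]

def fit_bivariate_normal(pool):
    """Estimate means, standard deviations and correlation of two covariates."""
    n = len(pool)
    mx = sum(x for x, _ in pool) / n
    my = sum(y for _, y in pool) / n
    sxx = sum((x - mx) ** 2 for x, _ in pool) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pool) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pool) / (n - 1)
    return mx, my, math.sqrt(sxx), math.sqrt(syy), sxy / math.sqrt(sxx * syy)

def sample_bivariate_normal(params, n, rng):
    """Draw correlated covariate pairs via a hand-written Cholesky factor."""
    mx, my, sx, sy, rho = params
    patients = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        patients.append((x, y))
    return patients
```

Re-sampled patients are always copies of pool members, whereas the fitted distribution can yield covariate combinations (for example, ages above the oldest patient in the pool) that the pool never contained.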
Project description:Many complex brain disorders, such as autism spectrum disorders, exhibit a wide range of symptoms and disability. To understand how brain communication is impaired in such conditions, functional connectivity studies seek to understand individual differences in brain network structure in terms of covariates that measure symptom severity. In practice, however, functional connectivity is not observed but estimated from complex and noisy neural activity measurements. Imperfect subject network estimates can compromise subsequent efforts to detect covariate effects on network structure. We address this problem in the case of Gaussian graphical models of functional connectivity, by proposing novel two-level models that treat both subject level networks and population level covariate effects as unknown parameters. To account for imperfectly estimated subject level networks when fitting these models, we propose two related approaches: R², based on resampling and random effects test statistics, and R³, which additionally employs random adaptive penalization. Simulation studies using realistic graph structures reveal that R² and R³ have superior statistical power to detect covariate effects compared to existing approaches, particularly when the number of within subject observations is comparable to the size of subject networks. Using our novel models and methods to study parts of the ABIDE dataset, we find evidence of hypoconnectivity associated with symptom severity in autism spectrum disorders, in frontoparietal and limbic systems as well as in anterior and posterior cingulate cortices.
Project description:Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, size and duration, is known to be very difficult. The dataset of outbreak signals generated should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach based on the use of historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by resampling processes (Binomial, Inverse Transform Sampling Method (ITSM), Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate performance for each of the resampling algorithms. Our analysis confirms the influence of the type of algorithm used and simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) on the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased as the number of days simulated increased and improved as the number of cases simulated increased. Simulating outbreaks with fewer cases than days of duration (i.e. overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, binomial and ITSM methods are accurate.
Given the constraint of keeping the simulation within a range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
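To make the ITSM resampling step more concrete, here is a minimal Python sketch that draws simulated case onset days from a historical epidemic curve by inverse transform sampling. The daily counts and the function name are hypothetical, and the homothetic rescaling of the curve described above is deliberately left out; this is not the authors' implementation.

```python
import bisect
import itertools
import random

# Hypothetical historical outbreak: number of cases per day.
HISTORICAL = [0, 2, 5, 9, 6, 3, 1, 0]

def itsm_resample(daily_counts, n_cases, rng):
    """Simulate n_cases onset days by inverse transform sampling.

    The empirical cumulative curve of the historical outbreak plays the
    role of the CDF; each uniform draw is mapped back to a day.
    """
    cumulative = list(itertools.accumulate(daily_counts))
    total = cumulative[-1]
    simulated = [0] * len(daily_counts)
    for _ in range(n_cases):
        u = rng.random() * total                  # uniform on [0, total)
        day = bisect.bisect_right(cumulative, u)  # first day whose cumulative count exceeds u
        simulated[day] += 1
    return simulated
```

Days with no historical cases can never be drawn, so the simulated signal keeps the support of the historical curve while the number of cases is chosen freely.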
Project description:Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered when extending residual resampling to regression settings where residuals are not identically distributed (and thus not amenable to bootstrapping); common examples include logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals preserves correlation in the data without the need for it to be modelled, a key point of difference compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties compared to competing resampling methods.
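One rough reading of the PIT-residual idea for count data is sketched below in Python, assuming a Poisson marginal with known fitted means. The randomized residual (a uniform draw between F(y-1) and F(y)), the function names, and the example data are assumptions made for illustration; the abstract does not spell out these details, and this is not the authors' code.

```python
import math
import random

def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)

def poisson_quantile(u, mu, max_k=10_000):
    """Smallest k with P(Y <= k) >= u (capped at max_k as a safeguard)."""
    k = 0
    term = math.exp(-mu)
    cdf = term
    while cdf < u and k < max_k:
        k += 1
        term *= mu / k
        cdf += term
    return k

def pit_residual(y, mu, rng):
    """Randomized PIT residual: uniform between F(y - 1) and F(y)."""
    lo, hi = poisson_cdf(y - 1, mu), poisson_cdf(y, mu)
    return lo + rng.random() * (hi - lo)

def pit_trap_resample(ys, mus, rng):
    """Resample PIT residuals, then map them back through each fitted mean."""
    residuals = [pit_residual(y, mu, rng) for y, mu in zip(ys, mus)]
    resampled = [rng.choice(residuals) for _ in ys]
    return [poisson_quantile(u, mu) for u, mu in zip(resampled, mus)]
```

Because each resampled residual is pushed back through the quantile function of its own fitted mean, the bootstrap counts stay on the Poisson scale of each observation. In a multivariate setting, whole rows of residuals would be resampled together rather than single values.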
Project description:OBJECTIVES:The goal of this study was to examine the relationship between measured teamwork and adverse safety events in the prehospital emergency care of children using high-fidelity simulation. We posit that non-technical skills such as leadership, teamwork, situation awareness and decision-making are associated with the clinical success of teams. DESIGN:Observational study. SETTING:Emergency medical services (EMS) responders were recruited from public fire and private transport agencies in Oregon State to participate in four simulations of paediatric emergencies using high-fidelity patient simulators, scene design, and professional actors playing parents and bystanders. PARTICIPANTS:Forty-four fire/transport teams consisting of 259 EMS professionals consented to participate and completed simulations. PRIMARY AND SECONDARY OUTCOME MEASURES:Teams were assessed using the Clinical Teamwork Scale (CTS), a validated instrument that measures overall teamwork and 15 specific elements in five overarching domains: communication, decision-making, role responsibility (leadership and followership), situational awareness/resource management and patient-friendliness. We used generalised estimating equations to estimate the odds of error with increasing overall CTS teamwork score while adjusting for clinical scenario and potential clustering by team. RESULTS:Across 176 simulations, the mean overall score on the CTS was 6.04 (SD 2.10; range 1=poor to 10=perfect) and was normally distributed. The distribution of scores was similar across the four clinical scenarios. At least one error was observed in 82% of the simulations. In simulations with at least one observed error, the mean CTS score was 5.76 (SD 2.04) compared with 7.16 (SD 1.95) in scenarios with no observed error. Logistic regression analysis accounting for clustering at the team level revealed that the odds of an error decreased 28% with each unit increase in CTS (OR 0.72, 95% CI 0.59 to 0.88).
CONCLUSIONS:This study found that overall teamwork among care delivery teams was strongly associated with the risk of serious adverse events in simulated scenarios of caring for critically ill and injured children.
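The reported odds ratio can be unpacked with a little arithmetic. The Python sketch below assumes the usual logistic-model reading, in which the odds of an error are multiplied by the odds ratio for each one-point increase in CTS score; the function name is illustrative only.

```python
import math

def percent_change_in_odds(odds_ratio, units=1):
    """Percent change in the odds after `units` one-point increases in score."""
    return (odds_ratio ** units - 1.0) * 100.0

# An OR of 0.72 per CTS point corresponds to a log-odds slope of ln(0.72).
LOG_ODDS_SLOPE = math.log(0.72)
```

Under this reading, one CTS point changes the odds by (0.72 - 1) x 100 = -28%, matching the reported figure, and a two-point increase multiplies the odds by 0.72 squared (about 0.52), a reduction of roughly 48%.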
Project description:Under covariate adaptive randomization, the covariate is tied to both randomization and analysis. Misclassification of such a covariate will impact the intended treatment assignment; further, it is unclear what the appropriate analysis strategy should be. We explore the impact of such misclassification on the trial's statistical operating characteristics. Simulation scenarios were created based on the misclassification rate and the covariate effect on the outcome. Models that were unadjusted, adjusted for the misclassified covariate, or adjusted for the corrected covariate were compared using logistic regression for a binary outcome and Poisson regression for a count outcome. For the binary outcome using logistic regression, type I error can be maintained in the adjusted model, but the test is conservative using an unadjusted model. Power decreased with increases in both the covariate effect on the outcome and the misclassification rate. Treatment effect estimates were biased towards the null for both the misclassified and unadjusted models. For the count outcome using a Poisson model, covariate misclassification led to inflated type I error probabilities and reduced power in the misclassified and the unadjusted model. The impact of covariate misclassification under covariate-adaptive randomization differs depending on the underlying distribution of the outcome.
Project description:In the past decade, many genome-wide association studies (GWASs) have been conducted to explore association of single nucleotide polymorphisms (SNPs) with complex diseases using a case-control design. These GWASs not only collect information on the disease status (primary phenotype, D) and the SNPs (genotypes, X), but also collect extensive data on several risk factors and traits. Recent literature and grant proposals point toward a trend in reusing existing large case-control data for exploring genetic associations of some additional traits (secondary phenotypes, Y) collected during the study. These secondary phenotypes may be correlated, and a proper analysis warrants a multivariate approach. Commonly used multivariate methods are not equipped to properly account for the non-random sampling scheme. Current ad hoc practices include analyses without any adjustment, and analyses with D adjusted as a covariate. Our theoretical and empirical studies suggest that the type I error for testing genetic association of secondary traits can be substantial when X as well as Y are associated with D, even when there is no association between X and Y in the underlying (target) population. Whether using D as a covariate helps maintain type I error depends heavily on the disease mechanism and the underlying causal structure (which is often unknown). To avoid grossly incorrect inference, we have proposed a proportional odds model adjusted for propensity score (POM-PS). It uses a proportional odds logistic regression of X on Y and includes the estimated conditional probability of being diseased as a covariate. We demonstrate the validity and advantage of POM-PS, and compare it to some existing methods in extensive simulation experiments mimicking plausible scenarios of dependency among Y, X, and D.
Finally, we use POM-PS to jointly analyze four adiposity traits using a type 2 diabetes (T2D) case-control sample from the population-based Metabolic Syndrome in Men (METSIM) study. Only POM-PS analysis of the T2D case-control sample seems to provide valid association signals.
Project description:Optimal dose selection in clinical trials is problematic when efficacious and toxic concentrations are close. A novel quantitative approach for optimizing dose titration in clinical trials is presented. A system of pharmacokinetics (PK), pharmacodynamics, efficacy, and toxicity was simulated for scenarios characterized by varying degrees of different types of variability. Receiver operating characteristic (ROC) and clinical trial simulation (CTS) were used to optimize drug titration by maximizing efficacy/safety. The scenarios included were a low-variability base scenario, and high residual (20%), interoccasion (20%), interindividual (40%), and residual plus interindividual variability scenarios, and finally a shallow toxicity slope scenario. The percentage of subjects having toxicity was reduced by 87.4% to 93.5%, and those having efficacy was increased by 52.7% to 243%. Interindividual PK variability may have less impact on optimal cutoff values than other sources of variability. ROC/CTS methods for optimizing dose titration offer an individualized approach that leverages exposure-response relationships.
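As a generic illustration of choosing a cutoff from an ROC analysis, the Python sketch below scans candidate concentration thresholds and keeps the one that maximizes Youden's J (sensitivity + specificity - 1). The abstract does not state which criterion was used, so Youden's J, the function name, and the example data are all assumptions.

```python
def best_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's J.

    The decision rule is 'value >= cutoff predicts the event' (e.g. toxicity).
    labels are 1 for subjects with the event and 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = (None, -1.0)
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, l in zip(values, labels) if l and v >= cutoff)
        false_pos = sum(1 for v, l in zip(values, labels) if not l and v >= cutoff)
        # J = sensitivity + specificity - 1 = TPR - FPR
        j = true_pos / positives - false_pos / negatives
        if j > best[1]:
            best = (cutoff, j)
    return best

# Hypothetical drug concentrations and toxicity indicators.
CONCENTRATIONS = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
TOXIC = [0, 0, 0, 0, 1, 0, 1, 1]
```

For this made-up data the best threshold is 3.0, which flags every toxic subject while misclassifying one of the five non-toxic subjects. In a titration setting, a dose could then be reduced whenever the measured concentration reaches the cutoff.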
Project description:BACKGROUND:Differential expression analysis on tumor expression profiles has always been a key issue for subsequent biological experimental validation. It is important to select the features that best discriminate between different groups of patients. Despite the emergence of multivariate analysis approaches, prevailing feature selection methods primarily focus on multiple hypothesis testing on individual variables, and then combine them for an explanatory result. Moreover, these methods, which are commonly based on hypothesis testing, view classification as a posterior validation of the selected variables. RESULTS:Based on the previously proposed A5 feature selection strategy, we develop a joint covariate detection tool for differential expression analysis on tumor expression profiles. This software combines hypothesis testing with testing according to classification results. A model selection approach based on a Gaussian mixture model is introduced for automatic selection of features. In addition, a projection heatmap is proposed for the first time. CONCLUSIONS:Joint covariate detection strengthens the viewpoint for selecting variables which are not only individually but also jointly significant. Experiments on simulated and realistic data show the effectiveness of the developed software, which enhances the reliability of joint covariate detection for differential expression analysis on tumor expression profiles. The software is available at http://bio-nefu.com/resource/jcd-dea .
Project description:Genomic selection (GS) has been widely considered as a valuable strategy for enhancing the rate of genetic gain in farm animals. However, the construction of a large reference population is a big challenge for small populations like indigenous cattle. In order to evaluate the potential application of GS for Chinese indigenous cattle, we assessed the influence of combining multiple populations on the reliability of genomic predictions for 10 indigenous breeds of Chinese cattle using simulated data. We also examined the effect of different genetic architectures on prediction accuracy. In this study, we simulated a set of genotype data by a resampling approach that can reflect the realistic linkage disequilibrium pattern for multiple populations. We found that within-breed evaluations yielded the highest accuracies, ranging from 0.64 to 0.68, for four different simulated genetic architectures. For scenarios using multiple breeds as reference, the predictive accuracies were higher when the reference comprised breeds with a close relationship, while the accuracies were low when predictions were carried out among breeds. In addition, accuracy increased in all scenarios as heritability increased. Our results suggested that using a meta-population as reference can increase the accuracy of genomic predictions for small populations. Moreover, multi-breed genomic selection was feasible for Chinese indigenous populations with genetic relationships.