Project description:We propose a penalized spline approach to performing large numbers of parallel nonparametric analyses of either of two types: restricted likelihood ratio tests of a parametric regression model versus a general smooth alternative, and nonparametric regression. Compared with naïvely performing each analysis in turn, our techniques reduce computation time dramatically. Viewing the large collection of scatterplot smooths produced by our methods as functional data, we develop a clustering approach to summarize and visualize these results. Our approach is applicable to ultra-high-dimensional data, particularly data acquired by neuroimaging; we illustrate it with an analysis of developmental trajectories of functional connectivity at each of approximately 70,000 brain locations. Supplementary materials, including an appendix and an R package, are available online.
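To make the speed-up behind massively parallel smoothing concrete, here is a minimal Python sketch (not the authors' R package): when all responses share a common design and a common smoothing parameter, a single penalized least-squares solve yields every smooth at once, and the fitted curves can then be clustered as functional data. The truncated-power basis, the fixed smoothing parameter `lam`, and the toy data are illustrative assumptions.

```python
# Minimal sketch: mass penalized-spline smoothing over many responses that
# share a common design, followed by k-means clustering of the fitted curves.
import numpy as np
from sklearn.cluster import KMeans

def truncated_power_basis(x, knots):
    """Linear truncated-power basis: [1, x, (x - kappa_k)_+]."""
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0, None) for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n, V = 200, 5000                      # n subjects, V brain locations (toy sizes)
age = np.sort(rng.uniform(8, 22, n))  # common predictor (e.g. age)
Y = np.sin(age[:, None] / 3) + rng.normal(0, 0.5, (n, V))   # fake connectivity

knots = np.quantile(age, np.linspace(0.05, 0.95, 15))
X = truncated_power_basis(age, knots)
D = np.diag([0.0, 0.0] + [1.0] * len(knots))  # penalize only knot coefficients
lam = 10.0                                    # fixed smoothing parameter (assumed)

# One linear solve gives the coefficients for ALL V responses at once,
# which is where the speed-up over V separate fits comes from.
coef = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)   # (basis dim) x V
fits = X @ coef                                       # n x V fitted smooths

# Treat each fitted curve as a functional observation and cluster the curves.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(fits.T)
```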
Project description:We present a predictive account of adaptive sequential sampling of stimulus-response relations in psychophysical experiments. Our discussion applies to experimental situations with ordinal stimuli in which only weak structural knowledge is available, so that parametric modeling is not an option. By introducing a certain form of partial exchangeability, we successively develop a hierarchical Bayesian model based on a mixture of Pólya urn processes. Suitable utility measures permit us to optimize the overall experimental sampling process. We provide several measures that are based either on simple count statistics or on more elaborate information-theoretic quantities. The actual computation of information-theoretic utilities often turns out to be infeasible. This is not the case with our sampling method, which relies on an efficient algorithm to compute exact solutions for our posterior predictions and utility measures. Finally, we demonstrate the advantages of our framework on a hypothetical sampling problem.
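As a rough illustration of the predictive, utility-driven sampling idea (not the paper's hierarchical mixture of Pólya urns), the sketch below maintains one Dirichlet-multinomial urn per ordinal stimulus level and picks the next stimulus by the expected drop in posterior-predictive entropy; the prior `alpha`, the utility definition, and the simulated responses are assumptions.

```python
# Minimal sketch: one Dirichlet-multinomial ("Polya") urn per stimulus level,
# next stimulus chosen by expected reduction in posterior-predictive entropy.
import numpy as np

def predictive(counts, alpha=1.0):
    """Posterior-predictive response probabilities for one urn."""
    a = counts + alpha
    return a / a.sum()

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_information_gain(counts, alpha=1.0):
    """Expected entropy drop from one more observation at this stimulus."""
    p = predictive(counts, alpha)
    h_now = entropy(p)
    h_next = 0.0
    for r, pr in enumerate(p):            # marginalize over the next response
        c = counts.copy()
        c[r] += 1
        h_next += pr * entropy(predictive(c, alpha))
    return h_now - h_next

n_stimuli, n_responses = 8, 3
counts = np.zeros((n_stimuli, n_responses))
rng = np.random.default_rng(1)
for trial in range(50):
    gains = [expected_information_gain(counts[s]) for s in range(n_stimuli)]
    s = int(np.argmax(gains))              # sample where information gain is largest
    r = rng.integers(n_responses)          # stand-in for the subject's response
    counts[s, r] += 1
```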
Project description:In many biological systems, the functional behavior of a group is collectively computed by the system's individual components. An example is the brain's ability to make decisions via the activity of billions of neurons. A long-standing puzzle is how the components' decisions combine to produce beneficial group-level outputs, despite conflicts of interest and imperfect information. We derive a theoretical model of collective computation from mechanistic first principles, using results from previous work on the computation of power structure in a primate model system. Collective computation has two phases: an information accumulation phase, in which (in this study) pairs of individuals gather information about their fighting abilities and make decisions about their dominance relationships, and an information aggregation phase, in which these decisions are combined to produce a collective computation. To model information accumulation, we extend a stochastic decision-making model (the leaky integrator model used to study neural decision-making) to a multiagent game-theoretic framework. We then test alternative algorithms for aggregating information (in this study, decisions about dominance resulting from the stochastic model) and measure the mutual information between the resultant power structure and the "true" fighting abilities. We find that conflicts of interest can improve accuracy to the benefit of all agents. We also find that the computation can be tuned to produce different power structures by changing the cost of waiting for a decision. The successful application of a similar stochastic decision-making model in neural and social contexts suggests general principles of collective computation across substrates and scales.
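A minimal sketch of the information-accumulation phase, under the interpretation (an assumption, not the paper's exact specification) that each pairwise contest is a leaky accumulator race whose drift is the difference in the two agents' true fighting abilities; the pairwise decisions are then aggregated into a simple power score. All parameter values are illustrative.

```python
# Minimal sketch: leaky integrator contests between pairs of agents, aggregated
# into a "power" score (fraction of contests won).
import numpy as np

rng = np.random.default_rng(2)
n_agents = 10
ability = rng.normal(size=n_agents)      # latent "true" fighting abilities

def contest(a_i, a_j, leak=0.05, noise=1.0, bound=5.0, dt=0.01, max_steps=10_000):
    """Leaky integrator race between agents i and j; returns True if i wins."""
    x = 0.0
    drift = a_i - a_j
    for _ in range(max_steps):
        x += (-leak * x + drift) * dt + noise * np.sqrt(dt) * rng.normal()
        if x >= bound:
            return True
        if x <= -bound:
            return False
    return drift > 0                      # fall back on drift if no bound is reached

wins = np.zeros(n_agents)
for i in range(n_agents):
    for j in range(i + 1, n_agents):
        i_wins = contest(ability[i], ability[j])
        wins[i if i_wins else j] += 1

power = wins / (n_agents - 1)             # simple aggregation of the pairwise decisions
```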
Project description:BACKGROUND:Modeling thousands of markers simultaneously has been of great interest for testing associations between genetic biomarkers and disease or disease-related quantitative traits. Recently, an expectation-maximization (EM) approach to Bayesian variable selection (EMVS), which facilitates the Bayesian computation, was developed for continuous or binary outcomes using a fast EM algorithm. However, it is not suitable for analyses of time-to-event outcomes in many public databases such as The Cancer Genome Atlas (TCGA). RESULTS:We extended EMVS to a high-dimensional parametric survival regression framework (SurvEMVS). A variant of the cyclic coordinate descent (CCD) algorithm was used for efficient iteration in the M-step, and the extended Bayesian information criterion (EBIC) was employed for hyperparameter tuning. We evaluated the performance of SurvEMVS using numerical simulations and illustrated its effectiveness on two real datasets. The results of the numerical simulations and the two real data analyses show that SurvEMVS performs well in terms of accuracy and computational efficiency. Some potential markers associated with survival in lung or stomach cancer were identified. CONCLUSIONS:These results suggest that our model is effective and can cope with high-dimensional omics data.
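The core EM-for-variable-selection idea can be sketched for a simple exponential (constant-hazard) survival regression; this is in the spirit of SurvEMVS but is not the paper's implementation (no cyclic coordinate descent or EBIC here, and a generic optimizer stands in for the M-step). The spike/slab variances `v0`, `v1`, the prior inclusion probability `pi`, and the toy data are assumptions.

```python
# Minimal sketch of an EMVS-style spike-and-slab EM for an exponential
# proportional-hazards survival regression.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_penalized_loglik(beta, X, time, event, d):
    eta = X @ beta
    loglik = np.sum(event * eta - time * np.exp(eta))    # exponential PH log-likelihood
    return -loglik + 0.5 * np.sum(d * beta ** 2)           # EM ridge-type penalty

def surv_emvs(X, time, event, v0=0.01, v1=10.0, pi=0.1, n_iter=50):
    p_feat = X.shape[1]
    beta = np.zeros(p_feat)
    for _ in range(n_iter):
        # E-step: posterior inclusion probabilities and expected prior precisions
        slab = pi * norm.pdf(beta, scale=np.sqrt(v1))
        spike = (1 - pi) * norm.pdf(beta, scale=np.sqrt(v0))
        p_incl = slab / (slab + spike)
        d = p_incl / v1 + (1 - p_incl) / v0
        # M-step: maximize the penalized likelihood (generic optimizer for brevity)
        beta = minimize(neg_penalized_loglik, beta,
                        args=(X, time, event, d), method="L-BFGS-B").x
    return beta, p_incl

# Toy usage on simulated data with three truly nonzero coefficients.
rng = np.random.default_rng(7)
X = rng.normal(size=(150, 30))
true_beta = np.zeros(30)
true_beta[:3] = [1.0, -1.0, 0.5]
time = rng.exponential(1 / np.exp(X @ true_beta))
cutoff = np.quantile(time, 0.8)                      # administrative censoring
event = (time < cutoff).astype(float)
time = np.minimum(time, cutoff)
beta_hat, p_incl = surv_emvs(X, time, event)
```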
Project description:Expected value of information methods evaluate the potential health benefits that can be obtained from conducting new research to reduce uncertainty in the parameters of a cost-effectiveness analysis model, hence reducing decision uncertainty. Expected value of partial perfect information (EVPPI) provides an upper limit on the health gains that can be obtained from conducting a new study on a subset of parameters in the cost-effectiveness analysis and can therefore be used as a sensitivity analysis to identify the parameters that contribute most to decision uncertainty and to help guide decisions about which types of study are of most value to prioritize for funding. A common general approach is to use nested Monte Carlo simulation to obtain an estimate of EVPPI. This approach is computationally intensive, can lead to significant sampling bias if an inadequate number of inner samples is obtained, and can produce incorrect results if correlations between parameters are not dealt with appropriately. In this article, we set out a range of methods for estimating EVPPI that avoid the need for nested simulation: reparameterization of the net benefit function, Taylor series approximations, and restricted cubic spline estimation of conditional expectations. For each method, we set out the generalized functional form that net benefit must take for the method to be valid. By specifying this functional form, our methods are able to focus on the components of the model in which approximation is required, avoiding the complexities involved in developing statistical approximations for the model as a whole. Our methods also allow for any correlations that might exist between model parameters. We illustrate the methods using an example of fluid resuscitation in African children with severe malaria.
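To illustrate the contrast between nested simulation and conditional-expectation-based EVPPI estimation, the sketch below uses a toy two-strategy model; a cubic polynomial regression stands in for the restricted cubic splines described in the article, and the net-benefit functions and parameter distributions are invented for illustration.

```python
# Minimal sketch: single-parameter EVPPI on a toy two-strategy model, comparing
# nested Monte Carlo with a regression (conditional-expectation) estimator.
import numpy as np

rng = np.random.default_rng(3)
N = 20_000
theta1 = rng.normal(0.5, 0.2, N)            # parameter of interest
theta2 = rng.normal(0.0, 0.3, N)            # remaining parameter (independent here)

def net_benefit(t1, t2):
    """Toy net benefit; columns correspond to the two strategies."""
    return np.column_stack([1000 * t1 + 200 * t2, 800 + 500 * t1 * t2])

nb = net_benefit(theta1, theta2)
ev_current = nb.mean(axis=0).max()          # value of a decision made now

# Regression estimator: fit E[NB_d | theta1] with a cubic in theta1.
fitted = np.column_stack([
    np.polyval(np.polyfit(theta1, nb[:, d], deg=3), theta1) for d in range(2)
])
evppi_regression = fitted.max(axis=1).mean() - ev_current

# Nested Monte Carlo estimator (much slower; small loops kept for clarity).
outer, inner = 500, 500
vals = np.empty(outer)
for k in range(outer):
    t1 = rng.normal(0.5, 0.2)
    t2 = rng.normal(0.0, 0.3, inner)
    vals[k] = net_benefit(np.full(inner, t1), t2).mean(axis=0).max()
evppi_nested = vals.mean() - ev_current
```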
Project description:Joint modeling of survival and longitudinal data has been studied extensively in the recent literature. The likelihood approach is one of the most popular estimation methods employed within the joint modeling framework. Typically, the parameters are estimated using maximum likelihood, with computation performed by the expectation maximization (EM) algorithm. However, one drawback of this approach is that standard error (SE) estimates are not automatically produced when using the EM algorithm. Many different procedures have been proposed to obtain the asymptotic covariance matrix for the parameters when the number of parameters is small. In the joint modeling context, however, there may be an infinite-dimensional parameter, the baseline hazard function, which greatly complicates the problem, so that the existing methods cannot be readily applied. The profile likelihood and bootstrap methods overcome the difficulty to some extent; however, they can be computationally intensive. In this paper, we propose two new methods for SE estimation using the EM algorithm that allow for more efficient computation of the SE of a subset of parametric components in a semiparametric or high-dimensional parametric model. The precision and computation time are evaluated through a thorough simulation study. We conclude with an application of our SE estimation method to the analysis of an HIV clinical trial dataset.
Project description:BACKGROUND:The expected value of sample information (EVSI) can help prioritize research, but its application is hampered by computational infeasibility, especially for complex models. We investigated an approach by Strong and colleagues to estimate EVSI by applying generalized additive models (GAM) to results generated from a probabilistic sensitivity analysis (PSA). METHODS:For 3 potential HIV prevention and treatment strategies, we estimated life expectancy and lifetime costs using the Cost-effectiveness of Preventing AIDS Complications (CEPAC) model, a complex patient-level microsimulation model of HIV progression. We fitted a GAM, a flexible regression model that estimates the functional form as part of the model fitting process, to the incremental net monetary benefits obtained from the CEPAC PSA. For each case study, we calculated the expected value of partial perfect information (EVPPI) using both the conventional nested Monte Carlo approach and the GAM approach. EVSI was calculated using the GAM approach. RESULTS:For all 3 case studies, the GAM approach consistently gave estimates of EVPPI similar to those from the conventional approach. The EVSI behaved as expected: it increased and converged to EVPPI for larger sample sizes. For each case study, generating the PSA results for the GAM approach required 3 to 4 days on a shared cluster, after which EVPPI and EVSI across a range of sample sizes were evaluated in minutes. The conventional approach required approximately 5 weeks for the EVPPI calculation alone. CONCLUSION:Estimating EVSI using the GAM approach with results from a PSA dramatically reduced the time required to conduct a computationally intensive analysis, which would otherwise have been impractical. Using the GAM approach, we can efficiently provide policy makers with EVSI estimates, even for complex patient-level microsimulation models.
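A condensed Python sketch of the regression-metamodel idea (a spline-plus-linear-model stand-in is used here instead of an mgcv-style GAM, which is an R tool): regress the incremental net monetary benefit from PSA output on the parameter of interest and read EVPPI off the fitted values. The toy two-strategy model, distributions, and variable names are assumptions.

```python
# Minimal sketch: EVPPI from PSA output via a flexible regression metamodel.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n_psa = 10_000
efficacy = rng.beta(8, 4, n_psa)                 # parameter of interest
cost = rng.gamma(5, 200, n_psa)                  # other uncertain parameter
inb = 30_000 * (0.4 * efficacy - 0.2) - cost     # toy incremental net benefit

# Value of a decision made with current information (two strategies, comparator = 0).
ev_current = max(inb.mean(), 0.0)

# Fit E[INB | efficacy] with a flexible spline regression as a GAM stand-in.
gam_like = make_pipeline(SplineTransformer(n_knots=8, degree=3), LinearRegression())
fitted = gam_like.fit(efficacy.reshape(-1, 1), inb).predict(efficacy.reshape(-1, 1))

# EVPPI: expected value of deciding after learning `efficacy`, minus the baseline.
evppi = np.maximum(fitted, 0.0).mean() - ev_current
```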
Project description:The allocation of healthcare resources among competing priorities requires an assessment of the expected costs and health effects of investing resources in the activities and of the opportunity cost of the expenditure. To date, much effort has been devoted to assessing the expected costs and health effects, but there remains an important need to also reflect the consequences of uncertainty in resource allocation decisions and the value of further research to reduce uncertainty. Decisions made under uncertainty may turn out to be suboptimal, resulting in health loss. Consequently, there may be value in reducing uncertainty, through the collection of new evidence, to better inform resource decisions. This value can be quantified using value of information (VOI) analysis. This report from the ISPOR VOI Task Force describes methods for computing 4 VOI measures: the expected value of perfect information (EVPI), expected value of partial perfect information (EVPPI), expected value of sample information (EVSI), and expected net benefit of sampling (ENBS). Several methods exist for computing EVPPI and EVSI, and this report provides guidance on selecting the most appropriate method based on the features of the decision problem. The report provides a number of recommendations for good practice when planning, undertaking, or reviewing VOI analyses. The software needed to compute VOI is discussed, and areas for future research are highlighted.
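Of the four measures, EVPI has the simplest form and can be computed directly from probabilistic sensitivity analysis output; the short sketch below shows that calculation on a made-up net-benefit matrix (the numbers are purely illustrative and not from the report).

```python
# Minimal sketch: EVPI = E[max over strategies of NB] - max over strategies of E[NB].
import numpy as np

rng = np.random.default_rng(5)
n_sim, n_strategies = 10_000, 3
nb = rng.normal(loc=[1000.0, 1050.0, 990.0], scale=300.0, size=(n_sim, n_strategies))

evpi = nb.max(axis=1).mean() - nb.mean(axis=0).max()
```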
Project description:BACKGROUND:Feature selection or scoring methods for the detection of biomarkers are essential in bioinformatics. Various feature selection methods have been developed for the detection of biomarkers, and several studies have employed information-theoretic approaches. However, most of these methods generally require a long processing time. In addition, information-theoretic methods discretize continuous features, a drawback that can lead to loss of information. RESULTS:In this paper, a novel supervised feature scoring method named ClearF is proposed. The proposed method is suitable for continuous-valued data and follows a principle similar to feature selection using mutual information, with the added advantage of reduced computation time. The proposed score calculation is motivated by the association between reconstruction error and information-theoretic measures. Our method is based on class-wise low-dimensional embedding and the resulting reconstruction error. Given a multi-class dataset such as a case-control study dataset, low-dimensional embedding is first applied to each class to obtain a compressed representation of that class, and also to the entire dataset. Reconstruction is then performed to calculate the error for each feature, and the final score for each feature is defined in terms of the reconstruction errors. The correlation between the information-theoretic measure and the proposed score is demonstrated using a simulation. For performance validation, we compared the classification performance of the proposed method with those of various algorithms on benchmark datasets. CONCLUSIONS:The proposed method showed higher accuracy and lower execution time than the other established methods. Moreover, an experiment was conducted on the TCGA breast cancer dataset, and it was confirmed that the genes with the highest scores were highly associated with subtypes of breast cancer.
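One plausible reading of the score construction (an interpretation, not the authors' released code) is sketched below: PCA serves as the low-dimensional embedding, per-feature reconstruction errors are computed from the global fit and from class-wise fits, and the score is their difference, by analogy with mutual information as a difference of entropies. The class-size weighting and the choice of a two-component PCA are assumptions.

```python
# Minimal sketch of a ClearF-style feature score from per-feature reconstruction errors.
import numpy as np
from sklearn.decomposition import PCA

def per_feature_recon_error(X, n_components=2):
    """Per-feature mean squared reconstruction error from a low-dimensional PCA."""
    pca = PCA(n_components=n_components).fit(X)
    X_hat = pca.inverse_transform(pca.transform(X))
    return ((X - X_hat) ** 2).mean(axis=0)

def clearf_like_score(X, y, n_components=2):
    """Global reconstruction error minus class-size-weighted class-wise error."""
    global_err = per_feature_recon_error(X, n_components)
    classwise_err = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        classwise_err += (len(Xc) / len(X)) * per_feature_recon_error(Xc, n_components)
    return global_err - classwise_err

# Toy usage on random data with binary labels.
rng = np.random.default_rng(6)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 2, 200)
scores = clearf_like_score(X, y)
top_features = np.argsort(scores)[::-1][:10]
```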
Project description:BACKGROUND: The desire of patients for personal continuity of care with a General Practitioner (GP) has been well documented, but not among non-registered private patients in Ireland. This study set out to examine the attitudes and reported behaviours of private fee-paying patients towards continuity of GP care and universal patient registration. METHODS: Cross-sectional telephone survey of 400 randomly chosen fee-paying patients living within County Dublin. There is no formal system of registration with a GP for these patients. Main outcomes were attendance of respondents at primary health care facilities and their attitudes towards continuity of care and registration with a GP. Data were analysed using descriptive statistics and parametric and non-parametric tests of association. Pearson correlation was used to quantify the association between the described variables and attitudes towards continuity of care and registration with a GP. Variables showing significance at the 5% level were entered into multiple linear regression models. RESULTS: 97% of respondents had seen a GP in the previous 5 years. The mean number of visits to the GP was 2.3 per annum. 89% of respondents had a regular GP, and the mean length of time with their GP was 15.6 years. 96% preferred their personal medical care to be provided within one general practice. 16% of respondents had consulted a GP outside of their own practice in the previous year; they were more likely to be female, to commute a longer distance to work, or to have poorer health status. 81% considered it important to be officially registered with a GP practice of their choice. CONCLUSION: Both personal and longitudinal continuity of care with a GP are important to private patients. Respondents who chose to visit GPs other than their regular GP were not easily characterised in this study, and individual circumstances may lead to this behaviour. There is strong support for a system of universal patient registration within general practice.