Middle censoring in the multinomial distribution with applications.
ABSTRACT: In a multinomial set-up with k possible outcomes, we develop estimation under the "middle censoring" paradigm defined in Jammalamadaka and Mangalam (2003). This problem has many special features because the category probabilities are interdependent, which we explore here.
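As a toy stand-in for estimation with censored multinomial data (not the paper's actual procedure), the sketch below runs an EM iteration for a multinomial sample in which some observations are only known to fall in a set of categories; all counts and sets are made up for illustration.

```python
import numpy as np

# Fully observed counts for categories 0, 1, 2 (made-up numbers)
exact = np.array([30, 20, 10])
# Censored observations: (set of candidate categories, count)
censored_sets = [({0, 1}, 15), ({1, 2}, 25)]

p = np.full(3, 1 / 3)  # initial probability estimate
for _ in range(500):
    expected = exact.astype(float).copy()
    for cats, c in censored_sets:
        idx = sorted(cats)
        w = p[idx] / p[idx].sum()      # E-step: split censored count by current p
        expected[idx] += c * w
    p_new = expected / expected.sum()  # M-step: multinomial MLE on completed counts
    if np.max(np.abs(p_new - p)) < 1e-10:
        break
    p = p_new
```

The E-step allocates each censored count across its candidate categories in proportion to the current probability estimates; the M-step is the ordinary multinomial MLE on the completed counts.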
Project description: This paper proposes a novel paradigm for building regression trees and ensemble learning in survival analysis. Generalizations of the CART and Random Forests algorithms for general loss functions, and in the latter case more general bootstrap procedures, are both introduced. These results, in combination with an extension of the theory of censoring unbiased transformations applicable to loss functions, underpin the development of two new classes of algorithms for constructing survival trees and survival forests: Censoring Unbiased Regression Trees and Censoring Unbiased Regression Ensembles. For a certain "doubly robust" censoring unbiased transformation of squared error loss, we further show how these new algorithms can be implemented using existing software (e.g., CART, random forests). Comparisons of these methods to existing ensemble procedures for predicting survival probabilities are provided in both simulated settings and through applications to four datasets. It is shown that these new methods either improve upon, or remain competitive with, existing implementations of random survival forests, conditional inference forests, and recursively imputed survival trees.
Project description: Researchers often encounter longitudinal health data characterized by three or more ordinal or nominal categories. Random-effects multinomial logit models are generally applied to account for the potential lack of independence inherent in such clustered data. When parameter estimates are used to describe longitudinal processes, however, random effects, both between and within individuals, need to be retransformed for correctly predicting outcome probabilities. This study attempts to go beyond existing work by developing a retransformation method that derives longitudinal growth trajectories of unbiased health probabilities. We estimated variances of the predicted probabilities by using the delta method. Additionally, we transformed the covariates' regression coefficients on the multinomial logit function, which are not substantively meaningful in themselves, into conditional effects on the predicted probabilities. The empirical illustration uses longitudinal data from the Asset and Health Dynamics among the Oldest Old. Our analysis compared three sets of predicted probabilities of three health states at six time points, obtained from, respectively, the retransformation method, the best linear unbiased prediction, and the fixed-effects approach. The results demonstrate that neglecting to retransform random errors in the random-effects multinomial logit model results in severely biased longitudinal trajectories of health probabilities as well as overestimated effects of covariates on the probabilities.
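The retransformation issue can be illustrated with a toy example (all numbers assumed for illustration, not from the paper): in a three-category multinomial logit with a shared random intercept, plugging in a random effect of zero gives different predicted probabilities than averaging the probabilities over the random-effect distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 3-category multinomial logit with a shared random intercept u ~ N(0, s^2);
# intercepts and variance below are made up for illustration.
b1, b2, s = 0.5, -0.3, 1.2            # category intercepts (category 0 = reference)

# "Fixed-effects" prediction: plug in u = 0 on the logit scale
e0 = np.exp([0.0, b1, b2])
naive = e0 / e0.sum()

# Retransformed (marginal) prediction: integrate over u by simulation
u = rng.normal(0.0, s, size=50_000)
eta = np.column_stack([np.zeros_like(u), b1 + u, b2 + u])
p = np.exp(eta)
p /= p.sum(axis=1, keepdims=True)
marginal = p.mean(axis=0)
```

Because the inverse logit transform is nonlinear, the mean of the transformed values differs from the transform of the mean, which is the source of the bias the abstract describes.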
Project description: Overdispersion models have been extensively studied for correlated normal and binomial data but much less so for correlated multinomial data. In this work, we describe a multinomial overdispersion model that leads to the specification of the first two moments of the outcome and allows the estimation of the global parameters using generalized estimating equations (GEE). We introduce a Global Blinding Index as a target parameter and illustrate the application of the GEE method to its estimation from (1) a clinical trial with clustering by practitioner and (2) a meta-analysis on psychiatric disorders. We examine the impact of a small number of clusters, high variability in cluster sizes, and the magnitude of the intraclass correlation on the performance of the GEE estimators of the Global Blinding Index using data simulated from different models. We compare these estimators with the inverse-variance weighted estimators and a maximum-likelihood estimator, derived under the Dirichlet-multinomial model. Our results indicate that the performance of the GEE estimators was satisfactory even in situations with a small number of clusters, whereas the inverse-variance weighted estimators performed poorly, especially for larger values of the intraclass correlation coefficient. Our findings and illustrations may be instrumental for practitioners who analyze clustered multinomial data from clinical trials and/or meta-analyses.
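The Dirichlet-multinomial model mentioned above has well-known first two moments: E[Y_j] = n*pi_j and Var(Y_j) = n*pi_j*(1 - pi_j)*(n + a0)/(1 + a0), where a0 is the sum of the Dirichlet parameters, i.e., the plain multinomial variance inflated by an overdispersion factor. A quick simulation check (parameters made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = np.array([2.0, 3.0, 5.0])   # hypothetical Dirichlet parameters
n, a0 = 20, alpha.sum()
pi = alpha / a0
inflation = (n + a0) / (1 + a0)     # overdispersion factor > 1

# Simulate Dirichlet-multinomial counts: draw p ~ Dirichlet, then Y ~ Multinomial
y = np.array([rng.multinomial(n, p) for p in rng.dirichlet(alpha, size=20_000)])

var_theory = n * pi * (1 - pi) * inflation  # inflated multinomial variance
```

The empirical means should match n*pi and the empirical variances should match var_theory, illustrating how clustering inflates multinomial variability without changing the means.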
Project description: We propose a beta product confidence procedure (BPCP) that is a non-parametric confidence procedure for the survival curve at a fixed time for right-censored data assuming independent censoring. In such situations, the Kaplan-Meier estimator is typically used with an asymptotic confidence interval (CI) that can have coverage problems when the number of observed failures is not large, and/or when testing the latter parts of the curve where there are few remaining subjects at risk. The BPCP guarantees central coverage (i.e. ensures that both one-sided error rates are no more than half of the total nominal rate) when there is no censoring (in which case it reduces to the Clopper-Pearson interval) or when there is progressive type II censoring (i.e. when censoring only occurs immediately after failures on fixed proportions of the remaining individuals). For general independent censoring, simulations show that the BPCP maintains central coverage in many situations where competing methods can have very substantial error rate inflation for the lower limit. The BPCP gives asymptotically correct coverage and is asymptotically equivalent to the CI based on the Kaplan-Meier estimator with Greenwood's variance. The BPCP may be inverted to create confidence procedures for a quantile of the underlying survival distribution. Because the BPCP is easy to implement, offers protection in settings when other methods fail, and essentially matches other methods when they succeed, it should be the method of choice.
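In the no-censoring case mentioned above, estimating S(t) = P(T > t) is a plain binomial problem, and the BPCP coincides with the Clopper-Pearson interval, which can be computed from beta quantiles. A minimal sketch with toy numbers:

```python
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided Clopper-Pearson CI for a binomial proportion k/n."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

# Toy numbers: 18 of 30 subjects survive past time t, none censored before t
lo, hi = clopper_pearson(18, 30)
```

The exact interval guarantees that each one-sided error rate is at most alpha/2, which is the "central coverage" property the abstract emphasizes.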
Project description: The focus of the research is on the analysis of genome sequences. Based on the inter-nucleotide distance sequence, we propose the conditional multinomial distribution profile for the complete genomic sequence. These profiles can be used to define a very simple, computationally efficient, alignment-free distance measure that reflects the evolutionary relationships between genomic sequences. We use this distance measure to classify chromosomes according to species of origin and to build the phylogenetic tree of 24 complete genome sequences of coronaviruses. Our results demonstrate that the new method is powerful and efficient.
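A minimal sketch of the basic ingredient, the inter-nucleotide distance sequence, on a toy input (the paper's conditional multinomial profiles are built on top of such distances; this is only the distance computation, not the full method):

```python
def internucleotide_distances(seq):
    """For each base, the gaps between consecutive occurrences of that base."""
    last, gaps = {}, {b: [] for b in "ACGT"}
    for i, b in enumerate(seq):
        if b in last:
            gaps[b].append(i - last[b])
        last[b] = i
    return gaps

gaps = internucleotide_distances("ATGCATTA")
```

Normalizing the gap histograms per base yields a profile for each sequence, and comparing profiles gives an alignment-free distance that needs only one linear pass over each genome.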
Project description: In this article, an attribute control chart is proposed using an accelerated hybrid censoring scheme for monitoring defective items whose lifetime follows a Weibull distribution. Products can be tested under accelerated conditions by introducing an acceleration factor that reflects pressurized conditions such as stress, load, strain, or temperature. The control limits are derived from the binomial distribution, with the fraction defective expressed only through the shape parameter, the acceleration factor, and the test-duration constant. Tables of average run lengths are provided for different process parameters to assess the performance of the proposed control chart. Simulation studies illustrate practical use, and the proposed chart is compared with the Shewhart np chart to demonstrate its power to detect a process shift.
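The binomial-based control limits take the usual np-chart form. The sketch below uses an assumed functional form p0 = 1 - exp(-(AF*a)^m) for the Weibull fraction defective, with shape m, acceleration factor AF, and test-duration constant a; this is an illustrative stand-in, not the paper's exact expression, and all parameter values are made up.

```python
import math

def np_chart_limits(n, p0, k=3.0):
    """k-sigma np-chart limits under a binomial(n, p0) count of defectives."""
    center = n * p0
    half = k * math.sqrt(n * p0 * (1 - p0))
    return max(0.0, center - half), center, center + half

# Assumed Weibull-based fraction defective under acceleration (illustration only)
m, AF, a = 1.5, 2.0, 0.1
p0 = 1 - math.exp(-((AF * a) ** m))

lcl, cl, ucl = np_chart_limits(50, p0)
```

A sample whose defective count falls outside (lcl, ucl) signals a shift; the average run length tables mentioned above quantify how quickly such a chart detects shifts of various sizes.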
Project description: In this work, we study quantile regression when the response is an event time subject to potentially dependent censoring. We consider the semi-competing risks setting, where the time to censoring remains observable after the occurrence of the event of interest. While such a scenario frequently arises in biomedical studies, most current quantile regression methods for censored data are not applicable because they generally require the censoring time and the event time to be independent. By imposing rather mild assumptions on the association structure between the time-to-event response and the censoring time, we propose quantile regression procedures that allow us to garner a comprehensive view of the covariate effects on the event time outcome as well as to examine the informativeness of censoring. An efficient and stable algorithm is provided for implementing the new method. We establish the asymptotic properties of the resulting estimators, including uniform consistency and weak convergence. The theoretical development may serve as a useful template for addressing estimation settings that involve stochastic integrals. Extensive simulation studies suggest that the proposed method performs well with moderate sample sizes. We illustrate the practical utility of our proposals through an application to a bone marrow transplant trial.
Project description: The analysis of time-to-event data can be complicated by competing risks, which are events that alter the probability of, or completely preclude, the occurrence of an event of interest. This is distinct from censoring, which merely prevents us from observing the time at which the event of interest occurs. However, the censoring distribution plays a vital role in the proportional subdistribution hazards model, a commonly used method for regression analysis of time-to-event data in the presence of competing risks. We present the equations that underlie the proportional subdistribution hazards model to highlight the way in which the censoring distribution is included in its estimation via risk set weights. By simulating competing risk data under a proportional subdistribution hazards model with different patterns of censoring, we examine the properties of the estimates from such a model when the censoring distribution is misspecified. We use an example from stem cell transplantation in multiple myeloma to illustrate the issue in real data. Models that correctly specified the censoring distribution performed better than those that did not, giving lower bias and variance in the estimate of the subdistribution hazard ratio. In particular, when the covariate of interest does not affect the censoring distribution but is used in calculating risk set weights, estimates from the model based on these weights may not reflect the correct likelihood structure and therefore may have suboptimal performance. The estimation of the censoring distribution can affect the accuracy and conclusions of a competing risks analysis, so it is important that this issue is considered carefully when analysing time-to-event data in the presence of competing risks.
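The risk-set weights in this setting are typically built from the Kaplan-Meier estimate of the censoring survival function G(t), obtained by reversing the roles of event and censoring indicators. A minimal sketch with toy data (illustrative only, not the paper's implementation):

```python
import numpy as np

# Toy data: distinct follow-up times; 1 = event of interest, 0 = censored
time = np.array([2.0, 3.0, 5.0, 7.0, 8.0])
event = np.array([1, 0, 1, 0, 1])

def km_censoring(time, event):
    """KM estimate of the censoring survival curve G(t) at each observed time."""
    order = np.argsort(time)
    t, d = time[order], 1 - event[order]   # censorings are the "events" here
    n = len(t)
    at_risk = n - np.arange(n)             # number still at risk at each time
    g = np.cumprod(1 - d / at_risk)
    return t, g

t, g = km_censoring(time, event)
```

Misspecifying the model for G(t), for example by ignoring a covariate that affects censoring, distorts these weights, which is the mechanism behind the bias the abstract reports.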
Project description: As ordinary citizens increasingly moderate online forums, blogs, and their own social media feeds, a new type of censoring has emerged wherein people selectively remove opposing political viewpoints from online contexts. In three studies of behavior on putative online forums, supporters of a political cause (e.g., abortion or gun rights) preferentially censored comments that opposed their cause. The tendency to selectively censor cause-incongruent online content was amplified among people whose cause-related beliefs were deeply rooted in or "fused with" their identities. Moreover, six additional identity-related measures also amplified the selective censoring effect. Finally, selective censoring emerged even when opposing comments were inoffensive and courteous. We suggest that because online censorship enacted by moderators can skew online content consumed by millions of users, it can systematically disrupt democratic dialogue and subvert social harmony.
Project description: Multinomial processing trees (MPTs) are a popular class of cognitive models for categorical data. Typically, researchers compare several MPTs, each equipped with many parameters, especially when the models are implemented in a hierarchical framework. A Bayesian solution is to compute posterior model probabilities and Bayes factors. Both quantities, however, rely on the marginal likelihood, a high-dimensional integral that cannot be evaluated analytically. In this case study, we show how Warp-III bridge sampling can be used to compute the marginal likelihood for hierarchical MPTs. We illustrate the procedure with two published data sets and demonstrate how Warp-III facilitates Bayesian model averaging.