A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications.
ABSTRACT: Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log-log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data on patients hospitalised with a heart attack and three statistical programming languages (R, SAS and Stata).
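The equivalence between the piecewise exponential model and Poisson regression described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tutorial's own code (which uses R, SAS and Stata): the function names, the cut points and the hazard values are hypothetical, and cluster-specific random effects are left out to keep the example short. Each subject's follow-up is split into one row per interval, and the Poisson log-likelihood with mean equal to hazard times exposure differs from the piecewise exponential log-likelihood only by a term that does not involve the hazards.

```python
import math


def split_follow_up(time, event, cut_points):
    """Expand one subject's record into one row per interval at risk.

    cut_points are interior boundaries; [30, 90] defines the intervals
    (0, 30], (30, 90] and (90, inf). Each returned row is a tuple
    (interval_index, exposure_time, event_indicator).
    """
    rows = []
    starts = [0.0] + list(cut_points)
    ends = list(cut_points) + [math.inf]
    for k, (start, end) in enumerate(zip(starts, ends)):
        if time <= start:
            break  # follow-up ended before this interval began
        exposure = min(time, end) - start
        event_here = int(event == 1 and time <= end)
        rows.append((k, exposure, event_here))
    return rows


def piecewise_exponential_loglik(rows, hazards):
    """Log-likelihood with a constant hazard within each interval."""
    return sum(d * math.log(hazards[k]) - hazards[k] * e for k, e, d in rows)


def poisson_loglik(rows, hazards):
    """Poisson log-likelihood with mean hazard * exposure per row.

    log(d!) is zero because the event indicator d is 0 or 1.
    """
    total = 0.0
    for k, e, d in rows:
        mu = hazards[k] * e
        total += d * math.log(mu) - mu
    return total


# Hypothetical subject: event at day 100, intervals cut at days 30 and 90.
rows = split_follow_up(time=100.0, event=1, cut_points=[30.0, 90.0])
hazards_a = [0.01, 0.02, 0.005]  # illustrative hazard per interval
hazards_b = [0.03, 0.01, 0.02]
offset_a = poisson_loglik(rows, hazards_a) - piecewise_exponential_loglik(rows, hazards_a)
offset_b = poisson_loglik(rows, hazards_b) - piecewise_exponential_loglik(rows, hazards_b)
```

The two offsets are identical whatever hazards are chosen, so maximising either likelihood yields the same hazard estimates. In practice the log of the exposure time enters the Poisson model as an offset.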
Project description:Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social science and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this article, we consider multilevel latent class models, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the expectation-maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent, robust, but less efficient inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often have precision comparable to that of the ML estimates. We apply our methods to the analysis of comorbid symptoms in the obsessive compulsive disorder study. The random effects structure of our models has a more straightforward interpretation than those of competing methods, and thus should usefully augment the tools available for LCA of multilevel data.
Project description:The Down syndrome cell adhesion molecule (Dscam) gene has essential roles in neural wiring and pathogen recognition in Drosophila melanogaster. Dscam encodes 38,016 distinct isoforms via extensive alternative splicing. The 95 alternative exons in Dscam are organized into clusters that are spliced in a mutually exclusive manner. The exon 6 cluster contains 48 variable exons and uses a complex system of competing RNA structures to ensure that only one variable exon is included. Here we show that the heterogeneous nuclear ribonucleoprotein hrp36 acts specifically within, and throughout, the exon 6 cluster to prevent the inclusion of multiple exons. Moreover, hrp36 prevents serine/arginine-rich proteins from promoting the ectopic inclusion of multiple exon 6 variants. Thus, the fidelity of mutually exclusive splicing in the exon 6 cluster is governed by an intricate combination of alternative RNA structures and a globally acting splicing repressor.
Project description:A central challenge in cancer research is to create models that bridge the gap between the molecular level on which interventions can be designed and the cellular and tissue levels on which the disease phenotypes are manifested. This study was undertaken to construct such a model from functional annotations and explore its use when integrated with large-scale cancer genomics data. We created a map that connects genes to cancer hallmarks via signaling pathways. We projected gene mutation and focal copy number data from various cancer types onto this map. We performed statistical analyses to uncover mutually exclusive and co-occurring oncogenic aberrations within this topology. Our analysis showed that although the genetic fingerprint of tumor types could be very different, there was less variation at the level of hallmarks, consistent with the idea that different genetic alterations have similar functional outcomes. Additionally, we showed how the multilevel map could help to clarify the role of infrequently mutated genes, and we demonstrated that mutually exclusive gene mutations were more prevalent in pathways, whereas many co-occurring gene mutations were associated with hallmark characteristics. Overlaying this map with gene mutation and focal copy number data from various cancer types makes it possible to investigate the similarities and differences between tumor samples systematically at the levels of not only genes but also pathways and hallmarks.
Project description:BACKGROUND: Diffuse large B-cell lymphoma (DLBCL) is a spectrum of disease comprising more than 30% of non-Hodgkin lymphomas. Although studies have identified several molecular subgroups, the heterogeneous genetic background of DLBCL remains ambiguous. In this study we aimed to develop a novel approach and to provide a distinctive classification system to unravel its molecular features. METHOD: A cohort of 342 patient samples diagnosed with DLBCL in our hospital was retrospectively enrolled in this study. A total of 46 genes were included in the next-generation sequencing panel. Non-mutually exclusive genetic signatures for the factorization of complex genomic patterns were generated by a random forest algorithm. RESULTS: A total of four non-mutually exclusive signatures were generated, including those with MYC-translocation (MYC-trans) (n = 62), with BCL2-translocation (BCL2-trans) (n = 69), with BCL6-translocation (BCL6-trans) (n = 108), and those with MYD88 and/or CD79B mutations (MC) signatures (n = 115). Comparison between our model and the traditional mutually exclusive Schmitz model demonstrated a consistent classification pattern. Prognostic heterogeneity also existed within the EZB subgroup of de novo DLBCL patients. As for prognostic impact, the MYC-trans signature was an independent unfavorable prognostic factor. Furthermore, tumors carrying three different signature markers exhibited significantly inferior prognoses compared with their counterparts with no genetic signature. CONCLUSION: Compared with traditional mutually exclusive molecular sub-classification, the non-mutually exclusive genetic fingerprint model generated from our study provided novel insight into not only the complex genetic features, but also the prognostic heterogeneity of DLBCL patients.
Project description:Women with gestational diabetes mellitus (GDM) and their infants are at increased risk of developing metabolic disease; however, longer breastfeeding is associated with a reduction in these risks. We tested an intervention to increase breastfeeding duration among women with GDM. We conducted a cluster randomized trial to determine the efficacy of a breastfeeding education and support program for women with GDM. Women were enrolled between 22 and 36 weeks of pregnancy and cluster randomized to an experimental lifestyle intervention or wait-list control group. Breastfeeding duration and intensity were prespecified secondary outcomes of the trial. Duration of exclusive and any breastfeeding was assessed at 6 weeks and at 4, 7, and 10 months postpartum. We quantified differences in breastfeeding rates using Kaplan-Meier estimates, log-rank tests, and Cox regression models. We enrolled 100 women, of whom 52% were African American, 31% non-Hispanic white, 11% Hispanic, 9% American Indian or Alaskan Native, 2% Asian, 2% other, and 4% more than one race. In models accounting for within-cluster correlation and adjusted for study site, breastfeeding intention, and African American race, women allocated to the intervention group were less likely to stop breastfeeding (adjusted hazard ratio [HR] 0.40, 95% confidence interval [CI] 0.21-0.74) or to introduce formula (adjusted HR 0.50, 95% CI 0.34-0.72). Our results suggest that targeted breastfeeding education for women with GDM is feasible and efficacious. http://clinicaltrials.gov/ct2/show/NCT01809431.
Project description:Mutually exclusive splicing of exons is a mechanism of functional gene and protein diversification with pivotal roles in organismal development and diseases such as Timothy syndrome, cardiomyopathy and cancer in humans. In order to obtain a first genome-wide estimate of the extent and biological role of mutually exclusive splicing in humans, we predicted and subsequently validated mutually exclusive exons (MXEs) using 515 publicly available RNA-Seq datasets. Here, we provide evidence for the expression of over 855 MXEs, 42% of which represent novel exons, increasing the annotated human mutually exclusive exome more than fivefold. The data provide strong evidence for the existence of large and multi-cluster MXEs in higher vertebrates and offer new insights into MXE evolution. More than 82% of the MXE clusters are conserved in mammals, and five clusters have homologous clusters in Drosophila. Finally, MXEs are significantly enriched in pathogenic mutations and their spatio-temporal expression might predict human disease pathology.
Project description:OBJECTIVES: The main objective of this study is to assess the Breastfeeding Duration, Exclusive Breastfeeding Duration and other related factors among children aged less than 5 years old in rural areas of Northern Iran. METHODS: This is a descriptive cross-sectional study conducted on 2520 children aged 6-60 months (male: 1309, female: 1211) chosen by cluster random sampling from 20 out of 118 villages. Data were collected from mothers using a questionnaire. The duration of breastfeeding was computed only for children aged over 24 months. Breastfeeding Duration and Exclusive Breastfeeding Duration were classified based on the WHO definition. SPSS Version 16 was used for data analysis. RESULTS: The mean Exclusive Breastfeeding Duration was 5.59 months, while 66.4% of children had exclusive breastfeeding for at least 6 months. The lowest Exclusive Breastfeeding Duration and the highest Breastfeeding Duration were observed among the Turkman ethnic group. Exclusive Breastfeeding duration of at least 5 months was 14.6%, thus the results were significantly higher than in the Turkman ethnic group (p=0.001). Meanwhile, the results showed that exclusive breastfeeding duration significantly increased with maternal education level (p=0.004). The study found that the mean breastfeeding duration was 20.6 months, and 89.3% and 74.7% of children were breastfed for at least 18 and 24 months, respectively. A positive correlation was reported between breastfeeding duration and family size, birth order, maternal age and children's nutritional status (p<0.05). Additionally, the lactation period in underweight children was significantly longer than in obese children (p=0.023). CONCLUSION: The study found that two-thirds of children were exclusively breastfed during the first six months of life and the mean breastfeeding duration was 20.6 months. Both exclusive breastfeeding duration and breastfeeding duration were influenced by socio-demographic factors in the rural areas of Northern Iran.
Project description:The gene Down syndrome cell adhesion molecule (Dscam) potentially encodes 38 016 distinct isoforms in Drosophila melanogaster via mutually exclusive splicing. Here we reveal a combinatorial mechanism of regulation of Dscam exon 17 mutually exclusive splicing through steric hindrance in combination with RNA secondary structure. This mutually exclusive behavior is enforced by steric hindrance, due to the close proximity of the exon 17.2 branch point to exon 17.1 in Diptera, and the interval size constraint in non-Dipteran species. Moreover, intron-exon RNA structures are evolutionarily conserved in 36 non-Drosophila species of six distantly related orders (Diptera, Lepidoptera, Coleoptera, Hymenoptera, Hemiptera, and Phthiraptera), and regulate the selection of exon 17 variants by masking the splice site. By contrast, a previously uncharacterized RNA structure specifically activated exon 17.1 by bringing splice sites closer together in Drosophila, while the other moderately suppressed exon 17.1 selection by hindering the accessibility of polypyrimidine sequences. Taken together, these data suggest a phylogeny of increased complexity in regulating alternative splicing of Dscam exon 17 spanning more than 300 million years of insect evolution. These results also provide models of the regulation of alternative splicing through steric hindrance in combination with dynamic structural codes.
Project description:Tumors acquire somatic DNA copy number aberrations, leading to activation of oncogenes and inactivation of tumor suppressors. Many studies have focused on the analysis of single copy number aberrations and associated driver genes, but few studies have performed combinatorial analyses. We propose a genome-wide scoring framework to find mutually exclusive gains and losses. Mutually exclusive copy number aberrations can identify genes whose oncogenic function is redundant, either by functioning in the same pathway or in a parallel pathway. As one gene is aberrated, the selective pressure for its partner is alleviated, which leads to a mutually exclusive perturbation pattern. In a dataset of mouse models for invasive lobular carcinoma we found three mutually exclusive DNA amplifications, containing several well-known oncogenes: the Met proto-oncogene on chromosome 6, the cluster of Birc2, Birc3 and Yap1 genes on chromosome 9, and Nras on chromosome 3. Furthermore, gene expression or protein expression of these genes correlates very well with copy number data, indicating that they are the target of the amplification. Although homologous amplifications in human tumors are rare, the mutual exclusivity of MET, BIRC/YAP1 and NRAS is maintained in a variety of cancer types. This suggests a novel function for YAP1 in the mitogen-activated signaling pathway by association with MET and NRAS, known players in this pathway. This function is independent of the propensity of YAP1 to cause Epithelial-to-Mesenchymal transition. aCGH data of 67 mouse mammary tumors from K14-Cre and WAP-Cre driven P53-F/F;Cdh1-F/F animals - tumor DNA hybridized against same-animal splenic DNA
Project description:Data used for modelling the household transmission of infectious diseases, such as influenza, have inherent multilevel structures and correlated properties, which make the widely used conventional infectious disease transmission models (including the Greenwood model and the Reed-Frost model) not directly applicable within the context of a household (due to the crowded domestic condition or socioeconomic status of the household). Thus, at the household level, the effects resulting from individual-level factors, such as vaccination, may be confounded or modified in some way. We proposed a Bayesian hierarchical random-effects (random intercepts and random slopes) model in the context of the generalised linear model to capture heterogeneity and variation at the individual, generation, and household levels. It was applied to empirical surveillance data on the influenza epidemic in Taiwan. The parameters of interest were estimated by using the Markov chain Monte Carlo method in conjunction with Bayesian directed acyclic graphical models. Comparisons between models were made using the deviance information criterion. Based on the result of the random-slope Bayesian hierarchical method in the context of the Reed-Frost transmission model, the regression coefficient for the protective effect of vaccination varied statistically significantly from household to household. This heterogeneity was robust to the use of different prior distributions (including non-informative, sceptical, and enthusiastic ones). By integrating out the uncertainty of the parameters of the posterior distribution, the predictive distribution was computed to forecast the number of influenza cases, allowing for a random household effect.