ABSTRACT: We provide insights into a new methodology for the analysis of multilevel binary data observed longitudinally, when the repeated longitudinal measurements are correlated. The proposed model is a logistic functional regression conditioned on three latent processes that describe the within- and between-variability and the cross-dependence of the repeated longitudinal measurements. We estimate the model components without employing mixed-effects modeling, instead assuming an approximation to the logistic link function. The primary objectives of this article are to highlight the challenges in estimating the model components, to compare two approximations to the logistic regression function, linear and exponential, and to discuss their advantages and limitations. The linear approximation is computationally efficient, whereas the exponential approximation applies to rare-event functional data. Our methods are inspired by and applied to a scientific experiment on spectral backscatter from long-range infrared light detection and ranging (LIDAR) data. The models are general and relevant to many new binary functional data sets, with or without dependence between repeated functional measurements.
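To make the trade-off between the two approximations concrete, here is a minimal sketch. It assumes that the linear approximation is the first-order Taylor expansion of the inverse logit around zero and that the exponential approximation is expit(x) ≈ exp(x) for strongly negative linear predictors. The article's exact forms may differ, and the function names are hypothetical.

```python
import math

def expit(x: float) -> float:
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))

def linear_approx(x: float) -> float:
    """First-order Taylor expansion of expit around x = 0 (accurate for small |x|)."""
    return 0.5 + x / 4.0

def exp_approx(x: float) -> float:
    """Rare-event approximation: expit(x) = e^x / (1 + e^x) ≈ e^x when x << 0."""
    return math.exp(x)

if __name__ == "__main__":
    for x in (-8.0, -4.0, -0.5, 0.0, 0.5):
        print(f"x={x:5.1f}  expit={expit(x):.5f}  "
              f"linear={linear_approx(x):.5f}  exponential={exp_approx(x):.5f}")
```

The ratio exp_approx(x) / expit(x) equals 1 + e^x exactly, so the relative error of the exponential approximation is e^x, which vanishes as events become rare. The linear approximation instead degrades away from zero and can even leave the interval [0, 1].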
Project description: In the context of observational longitudinal studies, we explored the combinations of the number of participants and the number of repeated measurements that maximize the power to detect the hypothesized effect, given the total cost of the study. We considered two models: one that assumes a transient effect of exposure and one that assumes a cumulative effect. Results were derived for a continuous response variable, whose covariance structure was assumed to be damped exponential, and a binary time-varying exposure. Under certain assumptions, we derived simple formulas for the approximate solution in the particular case in which the response covariance structure is compound symmetry. Results showed the importance of the exposure intraclass correlation in determining the optimal combination of the number of participants and the number of repeated measurements, and therefore the optimized power. Thus, incorrectly assuming a time-invariant exposure leads to inefficient designs. We also analyzed the sensitivity of the results to dropout, to misspecification of the response correlation structure, to a time-varying exposure prevalence, and to potential confounding. We illustrated some of these results in a real study. In addition, we provide software to perform all the calculations required to explore combinations of the number of participants and the number of repeated measurements.
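The sketch below shows why the exposure intraclass correlation matters. It is our own large-sample generalized least squares approximation for a transient effect, not the formulas derived in the study. It assumes a compound-symmetric response (correlation rho_y), an exchangeable binary exposure with prevalence p and intraclass correlation rho_x, and a hypothetical cost model budget = N * (cost_subject + n * cost_measure). All function names are illustrative.

```python
import math
from statistics import NormalDist

def info_per_subject(n, p, rho_y, rho_x, sigma2=1.0):
    """Approximate Fisher information for a transient exposure effect, per subject.

    Assumes a compound-symmetric response (correlation rho_y, variance sigma2) and a
    binary exposure with prevalence p and intraclass correlation rho_x.
    """
    v = p * (1.0 - p)  # variance of the binary exposure
    # within-subject contrast: exposure changes within a participant
    within = (n - 1) * v * (1.0 - rho_x) / (sigma2 * (1.0 - rho_y))
    # between-subject contrast: differences in participants' mean exposure
    between = v * (1.0 + (n - 1) * rho_x) / (sigma2 * (1.0 + (n - 1) * rho_y))
    return within + between

def power(N, n, beta, p, rho_y, rho_x, sigma2=1.0, alpha=0.05):
    """Two-sided Wald-test power for effect beta with N participants, n measurements each."""
    se = 1.0 / math.sqrt(N * info_per_subject(n, p, rho_y, rho_x, sigma2))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(abs(beta) / se - z)

def best_design(budget, cost_subject, cost_measure, beta, p, rho_y, rho_x,
                max_n=20, **kwargs):
    """Grid search over n; N is the largest count the (assumed) cost model allows."""
    best = None
    for n in range(1, max_n + 1):
        N = int(budget // (cost_subject + n * cost_measure))
        if N < 2:
            continue
        pw = power(N, n, beta, p, rho_y, rho_x, **kwargs)
        if best is None or pw > best[2]:
            best = (N, n, pw)
    return best
```

With rho_x = 1 (a time-invariant exposure), the within-subject term vanishes. The information then reduces to the familiar between-subject design effect v n / (sigma2 (1 + (n - 1) rho_y)). As rho_x falls, the within-subject term dominates, so misjudging the exposure intraclass correlation shifts the optimal (N, n).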
Project description: For the analysis of the longitudinal hypertension family data, we focused on modeling binary hypertension traits measured repeatedly over time. Our primary objective was to examine the predictive ability of longitudinal models for genetic associations. We first identified single-nucleotide polymorphisms (SNPs) associated with any occurrence of hypertension over the study period to set up covariates for the longitudinal analysis. We then analyzed the repeated measures of binary hypertension with covariates including these SNPs, accounting for correlations arising from repeated outcomes and among family members. We examined two popular models for longitudinal binary outcomes: (a) a marginal model based on generalized estimating equations, and (b) a conditional model based on the logistic random-effects model. The effects of risk factors associated with repeated hypertension were compared between these two models, and their predictive abilities were assessed with and without genetic information. Both approaches showed a significant interaction between age and gender: men were at higher risk of hypertension before age 35 years, but after age 35 years, women were at higher risk. Moreover, the SNPs were significantly associated with hypertension after adjusting for age, gender, and smoking status. The SNPs contributed more to predicting hypertension in the marginal model than in the conditional model. There was substantial correlation among repeated measures of hypertension, implying that hypertension status was considerably correlated with previous experience of hypertension. The conditional model performed better for predicting the future hypertension status of individuals.
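The coefficients of the two models live on different scales. Random-effects logistic coefficients are conditional on the random effect, while GEE coefficients are population-averaged. For a random-intercept logistic model with variance sigma², a well-known approximation (Zeger, Liang and Albert, 1988) gives β_marginal ≈ β_conditional / sqrt(1 + c² sigma²), with c = 16√3/(15π) ≈ 0.588. The sketch below checks this numerically for a single linear predictor eta. It assumes one normal random intercept, which is simpler than a model that also accounts for family structure. The function names are illustrative.

```python
import math

# Zeger-Liang-Albert constant: logistic(x) is approximated by Phi(C * x)
C = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def marginal_prob(eta, sigma, n_grid=4001, width=8.0):
    """E_b[expit(eta + b)] for b ~ N(0, sigma^2), via the midpoint rule on +/- width*sigma."""
    h = 2.0 * width * sigma / n_grid
    total = 0.0
    for k in range(n_grid):
        b = -width * sigma + (k + 0.5) * h
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += expit(eta + b) * density * h
    return total

def attenuated_prob(eta, sigma):
    """Closed-form approximation: marginal logit ≈ eta / sqrt(1 + C^2 sigma^2)."""
    return expit(eta / math.sqrt(1.0 + C ** 2 * sigma ** 2))
```

For a nonzero eta, the population-averaged probability is pulled toward 0.5 relative to expit(eta). This is one reason marginal and conditional effect estimates for the same risk factor are not directly comparable.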
Project description: PURPOSE: To propose and validate an efficient method, based on a biophysically motivated signal model, for removing the orientation-dependent part of R2* using a single gradient-recalled echo (GRE) measurement. METHODS: The proposed method used a second-order temporal approximation of the hollow-cylinder-fiber model, in which the parameter describing the linear signal decay corresponds to the orientation-independent part of R2*. The estimated parameters were compared with those of the classical mono-exponential decay model for R2* in an ex vivo human optic chiasm (OC) sample. The OC was measured at 16 distinct orientations relative to the external magnetic field using GRE at 7T. To show that the proposed signal model can remove the orientation dependence of R2*, it was compared with the established phenomenological method for separating R2* into orientation-dependent and orientation-independent parts. RESULTS: Applying the phenomenological method to the classical signal model verified the well-known separation of R2* into orientation-dependent and orientation-independent parts. For the proposed model, no significant orientation dependence was observed in the linear signal decay parameter. CONCLUSIONS: Because the proposed second-order model features orientation-dependent and orientation-independent components at distinct temporal orders, it can be used to remove the orientation dependence of R2* using only a single GRE measurement.
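The following is a minimal fitting sketch for the two signal models. It assumes that the second-order model can be written as ln S(t) = ln S0 - R2_lin·t - q·t², with echo times in ms. This parametrization is ours; the hollow-cylinder-fiber derivation fixes the physical meaning of the quadratic coefficient. The names R2_lin, q, and the functions are hypothetical. Both fits are ordinary least squares on the log signal from one multi-echo GRE acquisition.

```python
import math

def _solve3(M):
    """Gauss-Jordan elimination with partial pivoting on a 3x4 augmented matrix."""
    M = [row[:] for row in M]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]

def fit_second_order(te, signal):
    """Least-squares fit of ln S(t) = ln S0 - R2_lin * t - q * t**2."""
    y = [math.log(s) for s in signal]
    s = [sum(t ** k for t in te) for k in range(5)]          # sums of t^0..t^4
    rhs = [sum(yi * t ** k for t, yi in zip(te, y)) for k in range(3)]
    M = [[s[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]  # normal equations
    a, b, c = _solve3(M)
    return {"S0": math.exp(a), "R2_lin": -b, "q": -c}

def fit_mono_exponential(te, signal):
    """Classical model ln S(t) = ln S0 - R2* t, fitted by ordinary least squares."""
    y = [math.log(s) for s in signal]
    n = len(te)
    t_bar, y_bar = sum(te) / n, sum(y) / n
    slope = (sum((t - t_bar) * (yi - y_bar) for t, yi in zip(te, y))
             / sum((t - t_bar) ** 2 for t in te))
    return {"S0": math.exp(y_bar - slope * t_bar), "R2star": -slope}
```

When the true decay contains a positive quadratic term, the mono-exponential R2* absorbs it and exceeds R2_lin. Separating the two temporal orders is what lets the linear coefficient serve as the orientation-independent rate.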
Project description:<h4>Background</h4>It is critically important to assess the prognostic value of repeated measures of NT-proBNP among children undergoing surgery for congenital heart defects (CHD). The aim of the present study was to assess the prognostic value of repeated perioperative NT-proBNP measurements, through both their time-dependent levels and their temporal trajectory, during the perioperative period in a large series of children with CHD.<h4>Methods</h4>Repeated measures of NT-proBNP from 329 consecutive children with CHD were obtained before surgery and 1, 12, and 36 h after surgery. To make full use of the longitudinal data, we employed parallel cross-sectional logistic regression, a two-stage mixed-effects model, and trajectory analysis over time to assess the predictive value of perioperative NT-proBNP for the binary outcome of prolonged intensive care unit (ICU) stay.<h4>Results</h4>The two-stage mixed-effects model confirmed that both the mean NT-proBNP level (aOR = 1.46, P = 0.001) and the time trends had prognostic value for predicting prolonged ICU stay. In the fully adjusted logistic regression analyses based on Gaussian distributions, the "rapidly rising NT-proBNP" group had 5.4-times higher odds of prolonged ICU stay than the "slowly rising" group (aOR = 5.40, P = 0.003).<h4>Conclusions</h4>Comprehensive assessment of the time-dependent levels and temporal trajectory of perioperative NT-proBNP, captured by repeated measurements, can more accurately identify children at higher risk of prolonged ICU stay after CHD surgery.
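In a two-stage approach, each child's trajectory is first reduced to a few summaries. Those summaries then enter a logistic regression for prolonged ICU stay. The sketch below covers stage 1 only and simplifies it. It uses a per-child ordinary least squares fit on the log scale rather than mixed-model predictions. It also assumes a log transform and codes "before surgery" as 0 h; both are our assumptions. The resulting mean level and slope would then be passed to any standard logistic regression routine in stage 2. The function names are hypothetical.

```python
import math

TIMES_H = (0.0, 1.0, 12.0, 36.0)  # assumed: "before surgery" coded as 0 h

def stage1_summary(values, times=TIMES_H):
    """Per-child OLS summary of log NT-proBNP: (mean log level, slope per hour)."""
    y = [math.log(v) for v in values]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(times, y))
    return y_bar, sxy / sxx

def stage1_table(children):
    """Map child id -> (mean log level, slope); these become stage-2 covariates."""
    return {cid: stage1_summary(vals) for cid, vals in children.items()}
```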
Project description: This paper deals with the detection and identification of changepoints among covariances of high-dimensional longitudinal data, where the number of features is greater than both the sample size and the number of repeated measurements. The proposed methods are applicable under general temporal-spatial dependence. A new test statistic is introduced for changepoint detection, and its asymptotic distribution is established. If a changepoint is detected, an estimate of its location is provided. The rate of convergence of this estimator is shown to depend on the data dimension, sample size, and signal-to-noise ratio. Binary segmentation is used to estimate the locations of possibly multiple changepoints, and the corresponding estimator is shown to be consistent under mild conditions. Simulation studies assess the empirical size and power of the proposed test and the accuracy of the changepoint estimator. An application to a time-course microarray dataset identifies gene sets with significant changes in gene interactions over time.
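Binary segmentation can be sketched independently of the test statistic. The version below uses a simple CUSUM-type statistic: the weighted squared Frobenius distance between the mean sample covariances before and after a candidate split. This statistic is our illustrative choice, not the paper's. The sketch ignores the temporal-spatial dependence the paper accommodates. It also uses a user-chosen threshold instead of a calibrated null distribution. Function names are hypothetical.

```python
def sample_cov(X):
    """Flattened (row-major) sample covariance of an n x p data matrix given as lists."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / (n - 1)
            for i in range(p) for j in range(p)]

def cusum_stat(covs, start, end, tau):
    """Weighted squared Frobenius distance between mean covariances of [start, tau) and [tau, end)."""
    n1, n2 = tau - start, end - tau
    left = [sum(c) / n1 for c in zip(*covs[start:tau])]
    right = [sum(c) / n2 for c in zip(*covs[tau:end])]
    return n1 * n2 / (end - start) * sum((a - b) ** 2 for a, b in zip(left, right))

def binary_segmentation(covs, threshold, start=0, end=None, min_size=1):
    """Recursively split the time axis wherever the statistic exceeds the threshold."""
    if end is None:
        end = len(covs)
    if end - start < 2 * min_size:
        return []
    best_tau, best_stat = None, float("-inf")
    for tau in range(start + min_size, end - min_size + 1):
        stat = cusum_stat(covs, start, end, tau)
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    if best_stat <= threshold:
        return []
    return (binary_segmentation(covs, threshold, start, best_tau, min_size)
            + [best_tau]
            + binary_segmentation(covs, threshold, best_tau, end, min_size))
```

Here covs holds one flattened covariance per time point, for example sample_cov applied to the subjects' feature vectors at that time. The returned indices are the first time points of each new segment.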
Project description: Most deaths from colorectal cancer (CRC) result from metastases, which are often still undetectable at the time of diagnosis. Even so, in many cases, shedding is assumed to have taken place before that time. The dynamics of metastasis formation and growth are not well established. This work aims to explore the growth rate and dynamics of CRC lung metastases. We analyzed a test case of a patient with metastatic CRC and four lung metastases, using four serial computed tomography (CT) scans that measured metastasis sizes while the metastases were untreated. We fitted three mathematical growth models (exponential, logistic, and Gompertzian) to the CT measurements. For each metastasis, the best-fitting model was determined, the tumor doubling time (TDT) was assessed, and the metastasis inception time was extrapolated. Three of the metastases showed exponential growth, while the fourth showed logistic restraint of growth. TDT was around 93 days. The predicted metastasis inception time was at least 4–5 years before the primary tumor diagnosis date, although the metastases did not reach detectable sizes until at least 1 year after primary tumor resection. Our results support the exponential growth approximation for most of the metastases, at least over the clinically observed time period. Our analysis shows that metastases can be initiated before the primary tumor is detectable and implies that surgery accelerates metastasis growth.
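The following sketch covers the exponential case only. It fits log volume linearly in time, computes TDT = ln 2 / r, and extrapolates back to the time at which the fitted curve equals a starting volume. The starting volume is a user-supplied assumption, for example roughly one cell at about 1e-6 mm³; the sketch does not reproduce the study's inception criterion. The logistic and Gompertzian fits are not shown, and the function names are hypothetical.

```python
import math

def fit_exponential(times_days, volumes):
    """OLS fit of log V = log V0 + r t; returns (V0, r) with r in 1/day."""
    y = [math.log(v) for v in volumes]
    n = len(times_days)
    t_bar = sum(times_days) / n
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in times_days)
    r = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(times_days, y)) / sxx
    return math.exp(y_bar - r * t_bar), r

def doubling_time(r):
    """Tumor doubling time under exponential growth: TDT = ln 2 / r."""
    return math.log(2) / r

def inception_time(v0, r, v_start):
    """Time (days, relative to t = 0) at which the fitted curve V0 * exp(r t) equals v_start."""
    return math.log(v_start / v0) / r
```

A negative inception time means the metastasis started before the first scan. With a TDT near 93 days, each factor of 1000 in volume corresponds to about 10 doublings, or roughly 2.5 years.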
Project description: In disease screening, combining multiple biomarkers often substantially improves diagnostic accuracy over a single marker. This is particularly true for longitudinal biomarkers, where individual trajectories may improve the diagnosis. We propose a pattern mixture model (PMM) framework to predict a binary disease status from a longitudinal sequence of biomarkers. The marker distribution given the disease status is estimated from a linear mixed-effects model. A likelihood ratio statistic is computed as the combination rule; it is optimal in the sense of maximizing the receiver operating characteristic (ROC) curve under a correctly specified mixed-effects model. The individual disease risk score is then estimated by Bayes' theorem, and we derive the analytical form of its 95% confidence interval. We show that this PMM is an approximation to the shared random effects (SRE) model proposed by Albert (2012, "A linear mixed model for predicting a binary event from longitudinal data under random effects mis-specification," Statistics in Medicine 31(2), 143-154). Further, extensive simulation studies showed that the PMM is more robust than the SRE model under wide classes of models. This new PMM approach for combining biomarkers is motivated by and applied to a fetal growth study, where the interest is in predicting macrosomia from longitudinal ultrasound measurements.
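The likelihood-ratio combination rule and the Bayes step can be sketched as follows. We assume that, within each disease group, markers follow a linear mixed model with a random intercept and slope. Under that model, a subject's marker vector is multivariate normal with mean β0 + β1·t and covariance Z G Zᵀ + σ² I. In practice, the group-specific parameters would come from fitted mixed models. The confidence interval is not shown, and all names are hypothetical.

```python
import math

def cholesky(A):
    """Lower-triangular L with L L^T = A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def mvn_logpdf(y, mean, cov):
    """Multivariate normal log density via a Cholesky factor and forward substitution."""
    L = cholesky(cov)
    r = [yi - mi for yi, mi in zip(y, mean)]
    z = []
    for i in range(len(r)):
        z.append((r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(len(r)))
    return -0.5 * (len(r) * math.log(2.0 * math.pi) + logdet + sum(v * v for v in z))

def lmm_moments(times, beta, G, sigma2):
    """Marginal mean and covariance of y_j = (b0 + u0) + (b1 + u1) t_j + e_j."""
    mean = [beta[0] + beta[1] * t for t in times]
    cov = [[G[0][0] + G[0][1] * (s + t) + G[1][1] * s * t + (sigma2 if i == j else 0.0)
            for j, t in enumerate(times)] for i, s in enumerate(times)]
    return mean, cov

def disease_risk(y, times, params1, params0, prior):
    """Log likelihood ratio (diseased vs. healthy) and posterior risk by Bayes' theorem."""
    m1, c1 = lmm_moments(times, *params1)
    m0, c0 = lmm_moments(times, *params0)
    log_lr = mvn_logpdf(y, m1, c1) - mvn_logpdf(y, m0, c0)
    logit = math.log(prior / (1.0 - prior)) + log_lr  # posterior log-odds
    return log_lr, 1.0 / (1.0 + math.exp(-logit))
```

Each params tuple is (beta, G, sigma2) for one disease group, and prior is the disease prevalence. Ranking subjects by log_lr or by the posterior risk gives the same ordering, and hence the same ROC curve.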
Project description: Longitudinal phenotypes are increasingly available in genome-wide association studies (GWAS) and electronic health record-based studies for identifying genetic variants that influence complex traits over time. For longitudinal binary data, significant challenges remain in gene mapping, including misspecification of the phenotype model due to ascertainment. Here, we propose L-BRAT (Longitudinal Binary-trait Retrospective Association Test), a retrospective, generalized estimating equation-based method for genetic association analysis of longitudinal binary outcomes. We also develop RGMMAT, a retrospective, generalized linear mixed model-based association test. Both tests are retrospective score approaches in which genotypes are treated as random, conditional on phenotype and covariates. They allow both static and time-varying covariates to be included in the analysis. Through simulations, we show that retrospective association tests are robust to ascertainment and to other types of phenotype model misspecification, and that they gain power over previous association methods. We applied L-BRAT and RGMMAT to a genome-wide association analysis of repeated measures of cocaine use in a longitudinal cohort. Pathway analysis implicated the opioid signaling and axonal guidance signaling pathways. Lastly, we replicated important pathways in an independent cocaine dependence case-control GWAS. Our results illustrate that L-BRAT is able to detect important loci and pathways in a genome scan and to provide insights into the genetic architecture of cocaine use.
Project description: AIMS: To examine whether DSM-IV symptoms of substance dependence are psychometrically equivalent between existing community-sampled and clinically overselected studies. PARTICIPANTS: A total of 2476 adult twins born in Minnesota and 4121 unrelated adult participants from a case-control study of alcohol dependence. MEASUREMENTS: Lifetime DSM-IV alcohol, marijuana and cocaine dependence symptoms and ever use of each substance. DESIGN: We fitted a hierarchical model to the data, in which ever use and dependence symptoms for each substance were indicators of alcohol, marijuana or cocaine dependence, which were, in turn, indicators of a multi-substance dependence factor. We then tested the model for measurement invariance across participant groups, defined by study source and participant sex. FINDINGS: The hierarchical model fitted well among males and females within each sample [comparative fit index (CFI) > 0.96, Tucker-Lewis index (TLI) > 0.95 and root mean square error of approximation (RMSEA) < 0.04 for all], and a multi-group model demonstrated that model parameters were equivalent across sample- and sex-defined groups (ΔCFI = 0.002 between constrained and unconstrained models). Differences between groups in symptom endorsement rates could be expressed solely as mean differences in the multi-substance dependence factor. CONCLUSIONS: Lifetime substance dependence symptoms fitted a dimensional model well. Although clinically overselected participants endorsed more dependence symptoms, on average, than community-sampled participants, the pattern of symptom endorsement was similar across groups. From a measurement perspective, DSM-IV criteria are equally appropriate for describing substance dependence across different sampling methods.
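For reference, the fit indices quoted above can be computed from the model and baseline chi-square statistics with their standard formulas. The sketch below uses the RMSEA convention with N - 1; some software uses N instead. Its inputs are illustrative and are not this study's values. A commonly used rule of thumb (Cheung and Rensvold, 2002) treats a ΔCFI of no more than 0.01 as support for invariance.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (some software uses n instead of n - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index relative to a baseline (independence) model."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_b - df_b, 0.0)
    return 1.0 if den == 0.0 else 1.0 - num / den

def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index (non-normed fit index)."""
    rb, rm = chi2_b / df_b, chi2_m / df_m
    return (rb - rm) / (rb - 1.0)

def delta_cfi(cfi_unconstrained, cfi_constrained):
    """Drop in CFI when equality constraints are imposed across groups."""
    return cfi_unconstrained - cfi_constrained
```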
Project description: Often a binary variable is generated by dichotomizing an underlying continuous variable, measured at a specific time point, according to a prespecified threshold value. When the underlying continuous measurements come from a longitudinal study, one can use a repeated-measures model to impute missing responder status resulting from subject dropout and then apply a logistic regression model to the observed or imputed responder status. Standard Bayesian multiple imputation techniques (Rubin, 1987, in Multiple Imputation for Nonresponse in Surveys) are found to be conservative. These techniques draw the parameters of the imputation model from the posterior distribution and construct the variance of parameter estimates for the analysis model as a combination of within- and between-imputation variances. The frequentist multiple imputation approach is shown to be more efficient. It fixes the parameters of the imputation model at their maximum likelihood estimates and constructs the variance of parameter estimates for the analysis model using the results of Robins and Wang (2000, Biometrika 87, 113-124). We propose to apply the Kenward-Roger degrees of freedom (Kenward and Roger, 1997, Biometrics 53, 983-997) to account for the uncertainty associated with the variance-covariance parameter estimates of the repeated-measures model.
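The Bayesian combination step described above follows Rubin's rules, sketched below for a scalar parameter. These rules pool the m point estimates and add the within-imputation variance W̄ to an inflated between-imputation variance (1 + 1/m)B. This is the variance the abstract reports as conservative. The sketch does not implement the Robins-Wang variance or the Kenward-Roger adjustment, and the function name is hypothetical.

```python
import math

def rubin_combine(estimates, variances):
    """Rubin's rules for m imputed-data estimates of a scalar parameter."""
    m = len(estimates)
    q_bar = sum(estimates) / m                                 # pooled point estimate
    w_bar = sum(variances) / m                                 # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)     # between-imputation variance
    t = w_bar + (1.0 + 1.0 / m) * b                            # total variance
    # Rubin's (1987) large-sample degrees of freedom; infinite when b == 0
    df = math.inf if b == 0 else (m - 1) * (1.0 + w_bar / ((1.0 + 1.0 / m) * b)) ** 2
    return q_bar, t, df
```

For example, estimates (1, 2, 3) with equal variances 0.5 give a pooled estimate of 2, a total variance of 0.5 + (4/3)·1, and about 3.78 degrees of freedom.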