Determining directional dependency in causal associations.
ABSTRACT: Directional dependency is a method to determine the likely causal direction of effect between two variables. This article aims to critique and improve upon the use of directional dependency as a technique to infer causal associations. We comment on several issues raised by von Eye and DeShon (2012), including: encouraging the use of the signs of skewness and excessive kurtosis of both variables, discouraging the use of D'Agostino's K(2), and encouraging the use of directional dependency to compare variables only within time points. We offer improved steps for determining directional dependency that fix the problems we note. Next, we discuss how to integrate directional dependency into longitudinal data analysis with two variables. We also examine the accuracy of directional dependency evaluations when several regression assumptions are violated. Directional dependency can suggest the direction of a relation if (a) the regression error in population is normal, (b) an unobserved explanatory variable correlates with any variables equal to or less than .2, (c) a curvilinear relation between both variables is not strong (standardized regression coefficient ? .2), (d) there are no bivariate outliers, and (e) both variables are continuous.
Project description:Directional association measured by functional dependency can answer important questions on relationships between variables, for example, in discovery of molecular interactions in biological systems. However, when one has no prior information about the functional form of a directional association, there is not a widely established statistical procedure to detect such an association. To address this issue, here we introduce an exact functional test for directional association by examining the strength of functional dependency. It is effective in promoting functional patterns by reducing statistical power on non-functional patterns. We designed an algorithm to carry out the test using a fast branch-and-bound strategy, which achieved a substantial speedup over brute-force enumeration. On data from an epidemiological study of liver cancer, the test identified the hepatitis status of a subject as the most influential risk factor among others for the cancer phenotype. On human lung cancer transcriptome data, the test selected 1049 transcription start sites of putative noncoding RNAs directionally associated with lung cancers, stronger than 95% of 589 curated cancer genes. These predictions include non-monotonic interaction patterns, to which other routine tests were insensitive. Complementing symmetric (non-directional) association methods such as Fisher's exact test, the exact functional test is a unique exact statistical test for evaluating evidence for causal relationships.
Project description:Mendelian randomization (MR) implemented through instrumental variables analysis is an increasingly popular causal inference tool used in genetic epidemiology. But it can have limitations for evaluating simultaneous causal relationships in complex data sets that include, for example, multiple genetic predictors and multiple potential risk factors associated with the same genetic variant. Here we use real and simulated data to investigate Bayesian network analysis (BN) with the incorporation of directed arcs, representing genetic anchors, as an alternative approach. A Bayesian network describes the conditional dependencies/independencies of variables using a graphical model (a directed acyclic graph) with an accompanying joint probability. In real data, we found BN could be used to infer simultaneous causal relationships that confirmed the individual causal relationships suggested by bi-directional MR, while allowing for the existence of potential horizontal pleiotropy (that would violate MR assumptions). In simulated data, BN with two directional anchors (mimicking genetic instruments) had greater power for a fixed type 1 error than bi-directional MR, while BN with a single directional anchor performed better than or as well as bi-directional MR. Both BN and MR could be adversely affected by violations of their underlying assumptions (such as genetic confounding due to unmeasured horizontal pleiotropy). BN with no directional anchor generated inference that was no better than by chance, emphasizing the importance of directional anchors in BN (as in MR). Under highly pleiotropic simulated scenarios, BN outperformed both MR (and its recent extensions) and two recently-proposed alternative approaches: a multi-SNP mediation intersection-union test (SMUT) and a latent causal variable (LCV) test. We conclude that BN incorporating genetic anchors is a useful complementary method to conventional MR for exploring causal relationships in complex data sets such as those generated from modern "omics" technologies.
Project description:Orienting the causal relationship between pairs of traits is a fundamental task in scientific research with significant implications in practice, such as in prioritizing molecular targets and modifiable risk factors for developing therapeutic and interventional strategies for complex diseases. A recent method, called Steiger's method, using a single SNP as an instrument variable (IV) in the framework of Mendelian randomization (MR), has since been widely applied. We report the following new contributions. First, we propose a single SNP-based alternative, overcoming a severe limitation of Steiger's method in simply assuming, instead of inferring, the existence of a causal relationship. We also clarify a condition necessary for the validity of the methods in the presence of hidden confounding. Second, to improve statistical power, we propose combining the results from multiple, and possibly correlated, SNPs as multiple instruments. Third, we develop three goodness-of-fit tests to check modeling assumptions, including those required for valid IVs. Fourth, by relaxing one of the three IV assumptions in MR, we propose several methods, including an Egger regression-like approach and its multivariable version (analogous to multivariable MR), to account for horizontal pleiotropy of the SNPs/IVs, which is often unavoidable in practice. All our methods can simultaneously infer both the existence and (if so) the direction of a causal relationship, largely expanding their applicability over that of Steiger's method. Although we focus on uni-directional causal relationships, we also briefly discuss an extension to bi-directional relationships. Through extensive simulations and an application to infer the causal directions between low density lipoprotein (LDL) cholesterol, or high density lipoprotein (HDL) cholesterol, and coronary artery disease (CAD), we demonstrate the superior performance and advantage of our proposed methods over Steiger's method and bi-directional MR. In particular, after accounting for horizontal pleiotropy, our method confirmed the well known causal direction from LDL to CAD, while other methods, including bi-directional MR, might fail.
Project description:Hip fractures commonly result in permanent disability, institutionalization or death in elderly. Existing hip-fracture predicting tools are underused in clinical practice, partly due to their lack of intuitive interpretation. By use of a graphical layer, Bayesian network models could increase the attractiveness of fracture prediction tools. Our aim was to study the potential contribution of a causal Bayesian network in this clinical setting. A logistic regression was performed as a standard control approach to check the robustness of the causal Bayesian network approach.EPIDOS is a multicenter study, conducted in an ambulatory care setting in five French cities between 1992 and 1996 and updated in 2010. The study included 7598 women aged 75 years or older, in which fractures were assessed quarterly during 4 years. A causal Bayesian network and a logistic regression were performed on EPIDOS data to describe major variables involved in hip fractures occurrences.Both models had similar association estimations and predictive performances. They detected gait speed and mineral bone density as variables the most involved in the fracture process. The causal Bayesian network showed that gait speed and bone mineral density were directly connected to fracture and seem to mediate the influence of all the other variables included in our model. The logistic regression approach detected multiple interactions involving psychotropic drug use, age and bone mineral density.Both approaches retrieved similar variables as predictors of hip fractures. However, Bayesian network highlighted the whole web of relation between the variables involved in the analysis, suggesting a possible mechanism leading to hip fracture. According to the latter results, intervention focusing concomitantly on gait speed and bone mineral density may be necessary for an optimal prevention of hip fracture occurrence in elderly people.
Project description:The use of genetic variants as instrumental variables - an approach known as Mendelian randomization - is a popular epidemiological method for estimating the causal effect of an exposure (phenotype, biomarker, risk factor) on a disease or health-related outcome from observational data. Instrumental variables must satisfy strong, often untestable assumptions, which means that finding good genetic instruments among a large list of potential candidates is challenging. This difficulty is compounded by the fact that many genetic variants influence more than one phenotype through different causal pathways, a phenomenon called horizontal pleiotropy. This leads to errors not only in estimating the magnitude of the causal effect but also in inferring the direction of the putative causal link. In this paper, we propose a Bayesian approach called BayesMR that is a generalization of the Mendelian randomization technique in which we allow for pleiotropic effects and, crucially, for the possibility of reverse causation. The output of the method is a posterior distribution over the target causal effect, which provides an immediate and easily interpretable measure of the uncertainty in the estimation. More importantly, we use Bayesian model averaging to determine how much more likely the inferred direction is relative to the reverse direction.
Project description:Sufficient cause interactions concern cases in which there is a particular causal mechanism for some outcome that requires the presence of 2 or more specific causes to operate. Empirical conditions have been derived to test for sufficient cause interactions. However, when regression outcome models are used to control for confounding variables in tests for sufficient cause interactions, the outcome models impose restrictions on the relation between the confounding variables and certain unidentified background causes within the sufficient cause framework; often, these assumptions are implausible. By using marginal structural models, rather than outcome regression models, to test for sufficient cause interactions, modeling assumptions are instead made on the relation between the causes of interest and the confounding variables; these assumptions will often be more plausible. The use of marginal structural models also allows for testing for sufficient cause interactions in the presence of time-dependent confounding. Such time-dependent confounding may arise in cases in which one factor of interest affects both the second factor of interest and the outcome. It is furthermore shown that marginal structural models can be used not only to test for sufficient cause interactions but also to give lower bounds on the prevalence of such sufficient cause interactions.
Project description:We study the problem of causal structure learning in linear systems from observational data given in multiple domains, across which the causal coefficients and/or the distribution of the exogenous noises may vary. The main tool used in our approach is the principle that in a causally sufficient system, the causal modules, as well as their included parameters, change independently across domains. We first introduce our approach for finding causal direction in a system comprising two variables and propose efficient methods for identifying causal direction. Then we generalize our methods to causal structure learning in networks of variables. Most of previous work in structure learning from multi-domain data assume that certain types of invariance are held in causal modules across domains. Our approach unifies the idea in those works and generalizes to the case that there is no such invariance across the domains. Our proposed methods are generally capable of identifying causal direction from fewer than ten domains. When the invariance property holds, two domains are generally sufficient.
Project description:Information on motivations for research participation, may enable professionals to better tailor the process of recruitment and informed consent to the perspective of parents and children. Therefore, this systematic review assesses motivating and discouraging factors for children and their parents to decide to participate in clinical drug research. Studies were identified from searches in 6 databases. Two independent reviewers screened and selected relevant articles. Results were aggregated and presented by use of qualitative metasummary. 38 studies fulfilled the selection criteria and were of sufficient quality for inclusion in the qualitative metasummary. Most mentioned motivating factors for parents were: health benefit for child, altruism, trust in research, and relation to researcher. Most mentioned motivating factors for children were: personal health benefit, altruism and increasing comfort. Fear of risks, distrust in research, logistical aspects and disruption of daily life were mentioned most by parents as discouraging factors. Burden and disruption of daily life, feeling like a "guinea pig" and fear of risks were most mentioned as discouraging factors by children.Paying attention to these motivating and discouraging factors of children and their parents during the recruitment and informed consent process in drug research increases the moral and instrumental value of informed consent.• This systematic review pools the existing empirical literature on motivations of minors and their parents to consent or dissent to participation in clinical drug research. • The most mentioned motivating and discouraging factors for children and their parents to consent to participation in clinical drug research are identified aggregated and presented by use of qualitative metasummary. What is new: • This information can be used to adapt the research protocol, recruitment, and informed consent/assent process to the needs of children and their parents.
Project description:BACKGROUND:Most statistical methods used to identify cancer driver genes are either biased due to choice of assumed parametric models or insensitive to directional relationships important for causal inference. To overcome modeling biases and directional insensitivity, a recent statistical functional chi-squared test (FunChisq) detects directional association via model-free functional dependency. FunChisq examines patterns pointing from independent to dependent variables arising from linear, non-linear, or many-to-one functional relationships. Meanwhile, the Functional Annotation of Mammalian Genome 5 (FANTOM5) project surveyed gene expression at over 200,000 transcription start sites (TSSs) in nearly all human tissue types, primary cell types, and cancer cell lines. The data cover TSSs originated from both coding and noncoding genes. For the vast uncharacterized human TSSs that may exhibit complex patterns in cancer versus normal tissues, the model-free property of FunChisq provides us an unprecedented opportunity to assess the evidence for a gene's directional effect on human cancer. RESULTS:We first evaluated FunChisq and six other methods using 719 curated cancer genes on the FANTOM5 data. FunChisq performed best in detecting known cancer driver genes from non-cancer genes. We also show the capacity of FunChisq to reveal non-monotonic patterns of functional association, to which typical differential analysis methods such as t-test are insensitive. Further applying FunChisq to screen unannotated TSSs in FANTOM5, we predicted 1108 putative cancer driver noncoding RNAs, stronger than 90% of curated cancer driver genes. Next, we compared leukemia samples against other samples in FANTOM5 and FunChisq predicted 332/79 potential biomarkers for lymphoid/myeloid leukemia, stronger than the TSSs of all 87/100 known driver genes in lymphoid/myeloid leukemia. CONCLUSIONS:This study demonstrated the advantage of FunChisq in revealing directional association, especially in detecting non-monotonic patterns. Here, we also provide the most comprehensive catalog of high-quality biomarkers that may play a causative role in human cancers, including putative cancer driver noncoding RNAs and lymphoid/myeloid leukemia specific biomarkers.
Project description:Directed acyclic graphs (DAGs) are nonparametric graphical tools used to depict causal relations in the epidemiologic assessment of exposure-outcome associations. Although their use in dental research was first advocated in 2002, DAGs have yet to be widely adopted in this field. DAGs help identify threats to causal inference such as confounders, bias due to subject selection, and inappropriate handling of missing data. DAGs can also inform the data analysis strategy based on relations among variables depicted on it. This article uses the example of a study of temporomandibular disorders (TMDs), investigating causal effects of facial injury on subsequent risk of TMD. We illustrate how DAGs can be used to identify 1) potential confounders, 2) mediators and the consequences of attempt to estimate direct causal effects, 3) colliders and the consequences of conditioning on colliders, and 4) variables that are simultaneously mediators and confounders and the consequences of adjustment for such variables. For example, one DAG shows that statistical adjustment for the pressure pain threshold would necessarily bias the causal relation between facial injury and TMD. Finally, we discuss the usefulness of DAGs during study design, subject selection, and choosing variables to be measured in a study.