Project description: Background: In observational studies, how the magnitude of potential selection bias can be quantified in a sensitivity analysis is rarely discussed. The purpose of this study was to develop a sensitivity analysis strategy based on a bias-correction index (BCI) for quantifying the influence and direction of selection bias. Methods: We used a BCI, a function of selection probabilities conditional on outcome and covariates, under different selection bias scenarios in a logistic regression setting. A bias-correction sensitivity plot was illustrated by analyzing the associations between proctoscopy examination and sociodemographic variables, using data from the Taiwan National Health Interview Survey (NHIS) and from the subset of individuals who consented to having their health insurance data further linked. Results: We included 15,247 people aged ≥20 years, of whom 87.74% signed the informed consent. When the entire sample was considered, smokers were less likely than nonsmokers to undergo proctoscopic examination (odds ratio (OR): 0.69, 95% CI [0.57-0.84]). When only the data of people who provided consent were considered, the OR was 0.76 (95% CI [0.62-0.94]). The bias-correction sensitivity plot indicated varying ORs under different degrees of selection bias. Conclusions: When data are available for only a subsample of a population, a bias-correction sensitivity plot can be used to easily visualize varying ORs under different selection bias scenarios. A similar strategy can be applied to models other than logistic regression if an appropriate BCI is derived.
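The correction behind such a plot can be sketched in a few lines. The snippet below is a minimal illustration, assuming the BCI reduces to the cross-ratio of selection probabilities conditional on exposure and outcome in a simple 2x2 setting; the probability values, the grid of scenarios, and the function name are hypothetical, not the study's actual BCI implementation.

```python
import numpy as np

def corrected_or(observed_or, s11, s10, s01, s00):
    """Correct an observed odds ratio for selection, assuming the
    bias-correction index (BCI) is the cross-ratio of the selection
    probabilities s_xy = P(selected | exposure = x, outcome = y)."""
    bci = (s11 * s00) / (s10 * s01)
    return observed_or / bci

# Hypothetical sensitivity grid: vary how strongly exposed people with the
# outcome are selected relative to the other groups (all values assumed).
observed = 0.76   # OR among consenters, taken from the abstract
for ratio in np.linspace(0.7, 1.3, 7):
    adj = corrected_or(observed, s11=0.85 * ratio, s10=0.85, s01=0.90, s00=0.90)
    print(f"relative selection of exposed cases {ratio:.1f}: corrected OR = {adj:.2f}")
```

Plotting the corrected OR against the assumed selection ratios reproduces the idea of a bias-correction sensitivity plot.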
Project description: Complete-case analyses can be biased if missing data are not missing completely at random. We propose simple sensitivity analyses that apply to complete-case estimates of treatment effects; these analyses require only simple summary data and avoid the need to specify the precise mechanism of missingness or to make distributional assumptions. Bias arises when treatment effects differ between retained and nonretained participants, or when, among retained participants, the estimate is biased because conditioning on retention has induced a noncausal path between the treatment and the outcome. We thus bound the overall treatment effect on the difference scale by specifying: 1) the unobserved treatment effect among nonretained participants; and 2) the strengths of association that unobserved variables have with the exposure and with the outcome among retained participants ("induced confounding associations"). Working with the former sensitivity parameter subsumes certain existing methods of worst-case imputation while also accommodating less-conservative assumptions (e.g., that the treatment is not detrimental on average even among nonretained participants). As an analog to the E-value for confounding, we propose the M-value, which represents, for a specified treatment effect among nonretained participants, the strength of induced confounding associations required to reduce the treatment effect to the null or to any other value. These methods could help characterize the robustness of complete-case analyses to potential bias due to missing data.
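As a rough illustration of the two sensitivity parameters, the sketch below bounds the overall effect on the difference scale by combining the complete-case estimate (shifted by an assumed maximum induced-confounding bias) with an assumed effect among nonretained participants, weighted by the retention proportion. All numbers and names are hypothetical, and the sketch does not compute the paper's M-value.

```python
def bounded_overall_effect(complete_case_estimate, p_retained,
                           effect_nonretained, induced_bias_bound):
    """Conservative lower bound on the overall treatment effect
    (difference scale), using the decomposition
    overall = p * effect_retained + (1 - p) * effect_nonretained.

    effect_nonretained : assumed effect among nonretained participants
    induced_bias_bound : assumed maximum bias of the complete-case
                         estimate from conditioning on retention
    """
    effect_retained_lower = complete_case_estimate - induced_bias_bound
    return p_retained * effect_retained_lower + (1 - p_retained) * effect_nonretained

# Example: complete-case estimate of 5.0, 80% retention, treatment assumed
# at least not harmful among those lost, induced bias of at most 2.0.
print(bounded_overall_effect(5.0, 0.80, effect_nonretained=0.0, induced_bias_bound=2.0))
```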
Project description: Confounding, selection bias, and measurement error are well-known sources of bias in epidemiologic research. Methods for assessing these biases have their own limitations. Many quantitative sensitivity analysis approaches consider each type of bias individually, while more complex approaches that address multiple biases are harder to implement or require numerous assumptions. By failing to consider multiple biases at once, researchers can underestimate or overestimate their joint impact. We show that it is possible to bound the total composite bias owing to these three sources and to use that bound to assess the sensitivity of a risk ratio to any combination of these biases. We derive bounds for the total composite bias under a variety of scenarios, providing researchers with tools to assess their total potential impact. We apply this technique to a study in which unmeasured confounding and selection bias are both concerns, and to another study in which possible differential exposure misclassification and confounding are concerns. The approach we describe, though conservative, is easier to implement and makes simpler assumptions than quantitative bias analysis. We provide R functions to aid implementation.
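A hedged sketch of the multiplicative form such a bound can take is shown below: the observed risk ratio is divided by the product of assumed bounds for each bias. The individual bound values are placeholders; the paper derives the specific bounding factors for each scenario.

```python
def composite_bias_bound(observed_rr, confounding_bound,
                         selection_bound, misclassification_bound):
    """Conservative lower bound on the true risk ratio, assuming each bias
    can inflate the observed RR by at most its stated factor and that the
    three factors compose multiplicatively (illustrative only)."""
    return observed_rr / (confounding_bound * selection_bound * misclassification_bound)

# Hypothetical inputs: an observed RR of 2.5 with modest bounds on each bias.
print(round(composite_bias_bound(2.5, confounding_bound=1.3,
                                 selection_bound=1.1,
                                 misclassification_bound=1.2), 2))
```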
Project description: To make informed policy recommendations from observational panel data, researchers must consider the effects of confounding and of temporal variability in outcome variables. Difference-in-differences methods allow estimation of treatment effects under the parallel trends assumption. To justify this assumption, methods for matching based on covariates, outcome levels, and outcome trends, such as the synthetic control approach, have been proposed. While these tools can reduce bias and variability in some settings, we show that certain applications can introduce regression to the mean (RTM) bias into estimates of the treatment effect. Through simulations, we show that RTM bias can lead to inflated type I error rates and to bias toward the null in typical policy evaluation settings. We develop a novel correction for RTM bias that allows for valid inference and show how this correction can be used in a sensitivity analysis. We apply the proposed sensitivity analysis to reanalyze data on the effects of California's Proposition 99, a large-scale tobacco control program, on statewide smoking rates.
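The mechanism can be seen in a small simulation: when treated and comparison units come from populations with different underlying means and controls are matched on a noisy pre-period outcome, the matched controls regress toward their own mean while the treated units do not, biasing the difference-in-differences contrast even under a null effect. The sketch below only illustrates that mechanism with arbitrary parameters; it does not implement the paper's correction.

```python
import numpy as np

rng = np.random.default_rng(0)

def did_with_baseline_matching(n_treated=100, n_control_pool=1000,
                               mean_gap=1.0, noise_sd=1.0, n_sims=500):
    """Treated units come from a population with a higher underlying mean
    than the control pool.  Controls are nearest-neighbour matched on the
    noisy pre-period outcome; there is no true treatment effect.  Returns
    the mean DiD estimate across simulations (nonzero values reflect RTM bias)."""
    estimates = []
    for _ in range(n_sims):
        # stable unit means plus independent pre- and post-period noise
        mu_t = rng.normal(mean_gap, 1.0, n_treated)
        mu_c = rng.normal(0.0, 1.0, n_control_pool)
        pre_t = mu_t + rng.normal(0, noise_sd, n_treated)
        post_t = mu_t + rng.normal(0, noise_sd, n_treated)
        pre_c = mu_c + rng.normal(0, noise_sd, n_control_pool)
        post_c = mu_c + rng.normal(0, noise_sd, n_control_pool)
        # match each treated unit to the control with the closest pre-period value
        matches = np.abs(pre_c[None, :] - pre_t[:, None]).argmin(axis=1)
        did = (post_t - pre_t).mean() - (post_c[matches] - pre_c[matches]).mean()
        estimates.append(did)
    return float(np.mean(estimates))

print(did_with_baseline_matching())   # clearly nonzero despite a null treatment effect
```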
Project description: The early evaluation of prognostic tumour markers is commonly performed by comparing the survival of two groups of patients identified on the basis of a cut-off value. The corresponding hazard ratio (HR) is usually estimated, representing a measure of the relative risk between patients with marker values above and below the cut-off. A posteriori methods that identify an optimal cut-off are appropriate when the functional form of the relation between the marker distribution and patient survival is unknown, but they are prone to an overestimation bias. With the small sample sizes typical of rare diseases, external validation sets are rarely available and internal cross-validation may be unfeasible. We describe a new method to obtain an unbiased estimate of the HR at an optimal cut-off, exploiting the simple relation between the HR and the associated p-value estimated by a random permutation analysis. We validate the method on both simulated data and sets of gene expression profiles from two large, publicly available data sets. Furthermore, a reanalysis of a previously published study of 134 Stage 4S neuroblastoma patients allowed for the identification of E2F1 as a new gene with potential oncogenic activity. This finding was confirmed by an immunofluorescence analysis on an independent cohort.
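To make the cut-off optimisation and the permutation step concrete, the sketch below scans candidate cut-offs on simulated null data, records the largest hazard ratio across cut-offs, and repeats the scan on permuted marker values to gauge how large an "optimal" HR arises by chance. It assumes the lifelines package, uses arbitrary simulated data, and is not the bias-corrected estimator described in the study.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)

def hr_at_optimal_cutoff(marker, time, event, quantiles=np.linspace(0.2, 0.8, 13)):
    """Scan candidate cut-offs and return the largest estimated HR
    (high- vs. low-marker group) -- a simple stand-in for cut-off optimisation."""
    best_hr = 1.0
    for q in quantiles:
        high = (marker > np.quantile(marker, q)).astype(int)
        df = pd.DataFrame({"T": time, "E": event, "high": high})
        cph = CoxPHFitter().fit(df, duration_col="T", event_col="E")
        best_hr = max(best_hr, cph.hazard_ratios_["high"])
    return best_hr

# Simulated null data: the marker is unrelated to survival.
n = 150
marker = rng.normal(size=n)
time = rng.exponential(1.0, size=n)
event = rng.binomial(1, 0.7, size=n)

observed = hr_at_optimal_cutoff(marker, time, event)
# Permutation analysis: re-optimise the cut-off on permuted markers to see
# how inflated the "optimal" HR is purely by chance.
null_hrs = [hr_at_optimal_cutoff(rng.permutation(marker), time, event) for _ in range(30)]
print(f"optimal-cutoff HR on null data: {observed:.2f}; "
      f"median over permutations: {np.median(null_hrs):.2f}")
```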
Project description: Sensitivity analysis results are given for differential measurement error of either the exposure or the outcome. In the case of differential measurement error of the outcome, it is shown that the true effect of the exposure on the outcome on the risk ratio scale must be at least as large as the observed association between the exposure and the mismeasured outcome divided by the maximum strength of differential measurement error. This maximum strength is itself assessed as the risk ratio of the controlled direct effect of the exposure on the mismeasured outcome not through the true outcome. In the case of differential measurement error of the exposure, under certain assumptions concerning classification probabilities, the true effect of the exposure on the outcome on the odds ratio scale must be at least as large as the observed odds ratio between the mismeasured exposure and the outcome divided by the maximum odds ratio of the effect of the outcome on the mismeasured exposure conditional on the true exposure. The results can be used immediately to indicate the minimum strength of differential measurement error that would be needed for an observed association between an exposure measurement and an outcome measurement to be solely due to measurement error.
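A minimal numerical reading of the outcome-misclassification bound, with hypothetical numbers, is given below.

```python
def minimum_true_risk_ratio(observed_rr, max_differential_me_rr):
    """Lower bound on the true exposure-outcome risk ratio: the observed
    association divided by the maximum risk ratio by which differential
    outcome misclassification could operate."""
    return observed_rr / max_differential_me_rr

# E.g., an observed RR of 2.0 remains above 1 unless differential
# measurement error alone could produce an RR of at least 2.0.
print(minimum_true_risk_ratio(observed_rr=2.0, max_differential_me_rr=1.5))  # ~1.33
```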
Project description: We propose sensitivity analyses for publication bias in meta-analyses. We consider a publication process in which "statistically significant" results are more likely to be published than negative or "non-significant" results by an unknown ratio, η. Our proposed methods also accommodate some plausible forms of selection based on a study's standard error. Using inverse probability weighting and robust estimation that accommodates non-normal population effects, small meta-analyses, and clustering, we develop sensitivity analyses that enable statements such as "For publication bias to shift the observed point estimate to the null, 'significant' results would need to be at least 30-fold more likely to be published than negative or 'non-significant' results." Comparable statements can be made about shifting the estimate to a chosen non-null value or shifting the confidence interval. To aid interpretation, we describe empirical benchmarks for plausible values of η across disciplines. We show that a worst-case meta-analytic point estimate under maximal publication bias in this selection model can be obtained simply by conducting a standard meta-analysis of only the negative and "non-significant" studies; this method sometimes indicates that no amount of such publication bias could explain away the results. We illustrate the proposed methods using real meta-analyses and provide an R package, PublicationBias.
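A simplified, fixed-effect style sketch of the weighting idea follows: negative or non-significant studies are up-weighted by η, the assumed publication ratio, so the adjusted estimate can be traced as η grows (in the limit, only those studies drive the estimate, matching the worst-case analysis described above). It is not the robust, clustered estimator in the PublicationBias package, and the study data are hypothetical.

```python
import numpy as np
from scipy import stats

def eta_adjusted_estimate(yi, vi, eta):
    """Fixed-effect style estimate that up-weights negative or
    'non-significant' studies by eta, the assumed ratio by which
    significant results are more likely to be published (sketch only)."""
    yi, vi = np.asarray(yi, float), np.asarray(vi, float)
    significant_positive = yi / np.sqrt(vi) > stats.norm.ppf(0.975)
    w = 1.0 / vi                                       # inverse-variance weights
    w = np.where(significant_positive, w, w * eta)     # up-weight suppressed studies
    return np.sum(w * yi) / np.sum(w)

# Hypothetical meta-analysis data (point estimates and variances):
yi = [0.40, 0.35, 0.10, 0.05, 0.50, -0.05]
vi = [0.02, 0.03, 0.04, 0.05, 0.02, 0.04]
for eta in (1, 3, 10, 30):
    print(f"eta = {eta:>2}: adjusted estimate = {eta_adjusted_estimate(yi, vi, eta):.3f}")
```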
Project description: Epidemiological studies often use stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when classifiers trained on such data are applied to nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear, especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods that resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits only from the parametric inverse-probability bagging we propose. For other classifiers, correction is mostly advantageous, and the correction methods perform similarly. We discuss the consequences of inappropriate distributional assumptions and the reasons for the different behavior of the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if the distributional assumptions are roughly fulfilled. We provide our implementation in the R package sambia.
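To illustrate the general inverse-probability resampling idea on which such corrections rest, the sketch below resamples a case-enriched data set with probabilities proportional to inverse selection probabilities before fitting a random forest. It is a plain nonparametric resampling sketch with made-up data, not the stochastic or parametric variants implemented in sambia.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

def inverse_probability_resample(X, y, selection_prob, n_samples=None):
    """Resample observations with probabilities proportional to the inverse
    of their (case-control) selection probabilities, approximating the
    source-population distribution."""
    X, y = np.asarray(X), np.asarray(y)
    w = 1.0 / np.asarray(selection_prob)
    idx = rng.choice(len(y), size=n_samples or len(y), replace=True, p=w / w.sum())
    return X[idx], y[idx]

# Hypothetical stratified sample: cases were enriched to 50% although the
# population prevalence is about 5%, so cases had ~10x the selection probability.
X = rng.normal(size=(400, 5))
y = rng.binomial(1, 0.5, size=400)
sel = np.where(y == 1, 0.9, 0.09)            # assumed selection probabilities
X_rs, y_rs = inverse_probability_resample(X, y, sel)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_rs, y_rs)
print("case fraction after resampling:", y_rs.mean())
```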
Project description: Objectives: Selection bias is a well-known concern in research on older adults. We discuss two common forms of selection bias in aging research: (1) survivor bias and (2) bias due to loss to follow-up. Our objective was to review these two forms of selection bias in geriatrics research. Selection bias is a particular concern in clinical aging research because all participants must have survived to old age, and must be healthy enough, to take part in a research study in geriatrics. Design: We demonstrate the key issues related to selection bias using three case studies focused on obesity, a common clinical risk factor in older adults. We also created a Selection Bias Toolkit that includes strategies to prevent selection bias when designing a research study in older adults and analytic techniques that can be used to examine, and correct for, the influence of selection bias in geriatrics research. Results: Survivor bias and bias due to loss to follow-up can distort study results in geriatric populations. Key steps to avoid selection bias at the study design stage include creating causal diagrams, minimizing barriers to participation, and measuring variables that predict loss to follow-up. The Selection Bias Toolkit details several analytic strategies available to geriatrics researchers to examine and correct for selection bias (e.g., regression modeling and sensitivity analysis). Conclusion: The toolkit is designed to provide a broad overview of methods available to examine and correct for selection bias and is specifically intended for use in the context of aging research. J Am Geriatr Soc 67:1970-1976, 2019.
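As one example of the kind of analytic strategy such a toolkit covers, the sketch below applies inverse-probability-of-attrition weighting: retention is modeled from baseline covariates, and retained participants are weighted by the inverse of their predicted retention probability. The data, covariates, and model are entirely hypothetical and do not come from the toolkit itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Hypothetical cohort: baseline covariates, an exposure, follow-up status,
# and a continuous outcome observed only in those retained.
n = 2000
age = rng.normal(75, 6, n)
frail = rng.binomial(1, 0.3, n)
exposure = rng.binomial(1, 0.4, n)
p_stay = 1 / (1 + np.exp(-(3 - 0.03 * (age - 75) - 1.0 * frail)))
retained = rng.binomial(1, p_stay, n).astype(bool)
outcome = 0.5 * exposure + 0.05 * (age - 75) + 0.8 * frail + rng.normal(0, 1, n)

# Model the probability of retention from baseline predictors of drop-out,
# then weight retained participants by the inverse of that probability.
Z = np.column_stack([age, frail, exposure])
pr_retained = LogisticRegression().fit(Z, retained).predict_proba(Z)[:, 1]
w = 1.0 / pr_retained[retained]

def weighted_mean_difference(y, treat, w):
    """IPW-weighted exposed-vs-unexposed mean difference among the retained."""
    return (np.average(y[treat == 1], weights=w[treat == 1])
            - np.average(y[treat == 0], weights=w[treat == 0]))

print(weighted_mean_difference(outcome[retained], exposure[retained], w))
```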