Combining information from two data sources with misreporting and incompleteness to assess hospice-use among cancer patients: a multiple imputation approach.
ABSTRACT: Combining information from multiple data sources can enhance estimates of health-related measures by using one source to supply information that is lacking in another, assuming the former has accurate and complete data. However, little research has been conducted on combining methods when each source may be imperfect, for example, subject to measurement error and/or missing data. In a multisite study of hospice use by late-stage cancer patients, this variable was available from patients' abstracted medical records, in which it may be considerably underreported because of incomplete acquisition of these records. Therefore, data for Medicare-eligible patients were supplemented with their Medicare claims, which also contained information on hospice use and may also be subject to underreporting, though to a lesser degree. In addition, both sources suffered from missing data because of unit nonresponse in medical record abstraction and sample undercoverage in Medicare claims. We treat the true hospice-use status of these patients as a latent variable and propose to multiply impute it using information from both data sources, borrowing strength from each. We characterize the complete-data model as the product of an 'outcome' model for the probability of hospice use and a 'reporting' model for the probability of underreporting from both sources, adjusting for other covariates. Assuming that the reports of hospice use from both sources are missing at random and that underreporting in the two sources is conditionally independent, we develop a Bayesian multiple imputation algorithm and conduct multiple imputation analyses of patient hospice use in demographic and clinical subgroups. The proposed approach yields more sensible results than alternative methods in our example. Our model is also related to dual system estimation in population censuses and dual exposure assessment in epidemiology.
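The core imputation step for the latent hospice-use status can be illustrated with a minimal sketch. It rests on simplifying assumptions that are ours, not the paper's:

- The prevalence `p` and the per-source recording probabilities `s1` (medical records) and `s2` (Medicare claims) are fixed constants. The paper instead uses covariate-dependent models with posterior draws.
- Neither source produces false positives.
- Underreporting in the two sources is conditionally independent given true status.
- A missing report (`None`) is missing at random.

The function names `prob_true_use` and `impute_status` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def prob_true_use(p: float, s1: float, s2: float,
                  r1: Optional[int], r2: Optional[int]) -> float:
    """Posterior P(true use = 1 | reports r1, r2); None means the report is missing.

    Assumed (hypothetical) model: prior prevalence p, source k records a true
    use with probability s_k, no false positives, independent underreporting.
    """
    # Under the no-false-positive assumption, any positive report settles it.
    if r1 == 1 or r2 == 1:
        return 1.0
    # Joint probability of "true use AND every observed report says 0".
    used_but_unreported = p
    if r1 == 0:
        used_but_unreported *= 1 - s1  # source 1 missed a true use
    if r2 == 0:
        used_but_unreported *= 1 - s2  # source 2 missed a true use
    # A missing report contributes no factor (missing at random).
    # Non-users always yield 0 reports, so their joint probability is 1 - p.
    return used_but_unreported / (used_but_unreported + (1 - p))


def impute_status(records: Sequence[Tuple[Optional[int], Optional[int]]],
                  p: float, s1: float, s2: float,
                  m: int = 5, seed: int = 0) -> List[List[int]]:
    """Return m imputed 0/1 vectors of latent hospice-use status."""
    rng = random.Random(seed)
    return [[1 if rng.random() < prob_true_use(p, s1, s2, r1, r2) else 0
             for r1, r2 in records]
            for _ in range(m)]
```

For example, `impute_status([(1, None), (0, 0), (None, 0)], p=0.4, s1=0.6, s2=0.85)` yields five completed status vectors. A patient with a positive report in either source is always imputed as a user. A full Bayesian multiple imputation would redraw `p`, `s1`, and `s2` from their posterior, given covariates, before each of the `m` imputations, so that parameter uncertainty carries into the imputed datasets.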
Project description: Record linkage is a valuable and efficient tool for connecting information from different data sources. The National Center for Health Statistics (NCHS) has linked its population-based health surveys with administrative data, including Medicare enrollment and claims records. However, the linked NCHS-Medicare files are subject to missing data: first, not all survey participants agree to record linkage, and second, Medicare claims data are consistently available only for beneficiaries enrolled in the Fee-for-Service (FFS) program, not for those in Medicare Advantage (MA) plans. In this research, we examine the usefulness of multiple imputation for handling missing data in linked National Health Interview Survey (NHIS)-Medicare files. The motivating example is a study of mammography status from 1999 to 2004 among women aged 65 years and older enrolled in the FFS program. In our example, mammography screening status and FFS/MA plan type are missing for NHIS survey participants who were not linkage-eligible. Mammography status is also missing for linked participants in an MA plan. We explore three imputation approaches: (i) imputing screening status first, (ii) imputing FFS/MA plan type first, and (iii) imputing the two longitudinal processes simultaneously. We conduct simulation studies to evaluate these methods and compare them using the linked NHIS-Medicare files. The imputation procedures described in our paper would also be applicable to other public health research using linked data files with missing data arising from program characteristics reflected in administrative data (e.g., intermittent enrollment or data collection) and from survey participants' linkage eligibility.
Project description: In comparative effectiveness research, we are often interested in estimating an average causal effect from large observational data (the main study). Often these data do not measure all the necessary confounders. In many cases, an extensive set of additional covariates is measured for a smaller and non-representative population (the validation study). In this setting, standard approaches to missing data imputation might not be adequate because of the large number of covariates missing in the main data relative to the smaller sample size of the validation data. We propose a Bayesian approach to estimate the average causal effect in the main study that borrows information from the validation study to improve confounding adjustment. Our approach combines ideas of Bayesian model averaging, confounder selection, and missing data imputation into a single framework. It allows for different treatment effects in the main study and in the validation study, and propagates the uncertainty due to missing data imputation and confounder selection when estimating the average causal effect (ACE) in the main study. We compare our method to several existing approaches via simulation. We apply our method to a study examining the effect of surgical resection on survival among 10 396 Medicare beneficiaries with a brain tumor when additional covariate information is available on 2220 patients in SEER-Medicare. We find that the estimated ACE decreases by 30% when incorporating additional information from SEER-Medicare.
Project description: In this paper, we propose a latent class-based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and compare it with complete case analysis, multiple imputation, saturated log-linear multiple imputation, and the Expectation-Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random, and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probability of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when these criteria are considered jointly. A data example from a matched case-control study of the association between multiple myeloma and polymorphisms of the interleukin-6 genes is considered.
Project description:<h4>Objective</h4>To examine the effect of the Medicare hospice benefit on Medicare and Medicaid expenditures by dual-eligible Medicare-Medicaid nursing home (NH) residents.<h4>Data sources/study setting</h4>Secondary data for NH residents for 1998-1999.<h4>Study design</h4>Retrospective cohort study of NH residents in the state of Florida who died between July and December 1999 (N=5,774). Medicare claims identified hospice enrollment, and Medicare and Medicaid claims identified expenditures by categories of care. Nursing home resident assessments were used to control for case-mix differences. Geocoding of nursing homes, hospice providers and hospitals was used to identify and characterize local health care markets.<h4>Data collection/extraction methods</h4>A file was constructed linking Medicare and Medicaid claims to Minimum Data Set assessments of NH residents, and NH provider (Online Survey and Certification Automated Record) and hospice provider files.<h4>Principal findings</h4>Hospice enrollment results in substantial savings in government expenditures (22 percent) among all short-stay (≤90 days) dying NH residents. For long-stay (>90 days) dying NH residents, hospice provides some savings (8 percent) among cancer residents while it is cost-neutral among dementia residents and adds some cost (10 percent) for residents with a diagnosis other than cancer or dementia. There is evidence of selection bias, particularly among residents with cancer (19 percent savings unadjusted versus 8 percent adjusted). Among short-stay NH residents, hospice greatly reduces Medicare expenditures but increases Medicaid expenditures.<h4>Conclusions</h4>Hospice enrollment results in lower combined Medicare/Medicaid expenditures in the last month of life, particularly among short-stay NH residents. This effect, however, varies by diagnosis and NH length of stay. 
In addition, for short-stay NH residents, current payment policy creates a Medicare incentive and Medicaid disincentive for promoting residents' referral to hospice.
Project description: Missing data are a pervasive problem in health investigations. We describe some background of missing data analysis and criticize ad hoc methods that are prone to serious problems. We then focus on multiple imputation, in which missing values are first filled in with several sets of plausible values to create multiple completed datasets, then standard complete-data procedures are applied to each completed dataset, and finally the multiple sets of results are combined to yield a single inference. We introduce the basic concepts and general methodology and provide some guidance for application. For illustration, we use a study assessing the effect of cardiovascular diseases on hospice discussion for late-stage lung cancer patients.
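The final combining step of multiple imputation is commonly carried out with Rubin's rules. Below is a minimal sketch for a scalar estimand, assuming each completed-data analysis returns a point estimate and its squared standard error. `rubin_combine` is a hypothetical helper name. The degrees of freedom use the original large-sample formula, not the Barnard-Rubin small-sample adjustment.

```python
import math
import statistics
from typing import Sequence, Tuple


def rubin_combine(estimates: Sequence[float],
                  variances: Sequence[float]) -> Tuple[float, float, float]:
    """Pool m completed-data results into (estimate, total variance, df)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # mean within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b           # total variance T = U_bar + (1 + 1/m) B
    if b == 0:
        df = math.inf                     # imputations agree exactly
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df
```

The pooled interval is then `q_bar ± t_{df, 0.975} * sqrt(T)`, where `t_{df, 0.975}` is the Student-t quantile. The `(1 + 1/m)` factor inflates the between-imputation variance to account for using a finite number of imputations.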
Project description:<h4>Objective</h4>To identify the most appropriate imputation method for missing data in the HCUP State Inpatient Databases (SID) and assess the impact of different missing data methods on racial disparities research.<h4>Data sources/study setting</h4>HCUP SID.<h4>Study design</h4>A novel simulation study compared four imputation methods (random draw, hot deck, joint multiple imputation [MI], conditional MI) for missing values for multiple variables, including race, gender, admission source, median household income, and total charges. The simulation was built on real data from the SID to retain their hierarchical data structures and missing data patterns. Additional predictive information from the U.S. Census and American Hospital Association (AHA) database was incorporated into the imputation.<h4>Principal findings</h4>Conditional MI prediction was equivalent or superior to the best performing alternatives for all missing data structures and substantially outperformed each of the alternatives in various scenarios.<h4>Conclusions</h4>Conditional MI substantially improved statistical inferences for racial health disparities research with the SID.
Project description: Missing data are a common issue in cost-effectiveness analysis (CEA) alongside randomised trials and are often addressed by assuming the data are 'missing at random'. However, this assumption is often questionable, and sensitivity analyses are required to assess the implications of departures from missing at random. Reference-based multiple imputation provides an attractive approach for conducting such sensitivity analyses, because missing data assumptions are framed in an intuitive way by making reference to other trial arms. For example, a plausible missing-not-at-random mechanism in a placebo-controlled trial would be to assume that participants in the experimental arm who dropped out stop taking their treatment and have outcomes similar to those in the placebo arm. Drawing on the increasing use of this approach in other areas, this paper aims to extend and illustrate the reference-based multiple imputation approach in CEA. It introduces the principles of reference-based imputation and proposes an extension to the CEA context. The method is illustrated in the CEA of the CoBalT trial evaluating cognitive behavioural therapy for treatment-resistant depression. Stata code is provided. We find that reference-based multiple imputation provides a relevant and accessible framework for assessing the robustness of CEA conclusions to different missing data assumptions.
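The placebo-referenced assumption described above, often called jump-to-reference, can be sketched as follows. This is a simplified illustration, not the paper's CEA extension or its Stata code. It uses a single continuous outcome and no baseline covariates. Each missing value is drawn from a normal model fitted to a bootstrap resample of placebo-arm outcomes, which roughly propagates parameter uncertainty. `jump_to_reference_impute` is a hypothetical name.

```python
import random
import statistics
from typing import List, Optional, Sequence


def jump_to_reference_impute(active: Sequence[Optional[float]],
                             placebo_observed: Sequence[float],
                             m: int = 5, seed: int = 1) -> List[List[float]]:
    """Complete the experimental arm m times; None marks a dropout.

    Simplified assumption: dropouts' outcomes follow the placebo-arm
    distribution, approximated here by a normal model.
    """
    if len(placebo_observed) < 2:
        raise ValueError("need at least two observed placebo outcomes")
    rng = random.Random(seed)
    completed = []
    for _ in range(m):
        # Bootstrap the reference arm so the mean and SD vary across imputations.
        boot = [rng.choice(placebo_observed) for _ in placebo_observed]
        mu = statistics.fmean(boot)
        sd = statistics.stdev(boot)
        # Keep observed values; draw dropouts from the placebo-based model.
        completed.append([y if y is not None else rng.gauss(mu, sd)
                          for y in active])
    return completed
```

Each completed copy would then be analysed and the results pooled with Rubin's rules. A realistic CEA would impute costs and QALYs jointly, conditional on baseline data.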
Project description:<h4>Background</h4>Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders.<h4>Methods</h4>We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms.<h4>Results</h4>We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. In particular, the approach: 1) does not introduce bias in missing (completely) at random scenarios; 2) reduces bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself; and 3) may reduce or increase bias when unmeasured confounding is present.<h4>Conclusion</h4>In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.
Project description: Multiple imputation (MI) has become popular for analyses with missing data in medical research. The standard implementation of MI is based on the assumption of data being missing at random (MAR). However, for missing data generated by missing not at random mechanisms, MI performed assuming MAR might not be satisfactory. For an incomplete variable in a given data set, its corresponding population marginal distribution might also be available in an external data source. We show how this information can be readily utilised in the imputation model to calibrate inference to the population by incorporating an appropriately calculated offset termed the "calibrated-δ adjustment." We describe the derivation of this offset from the population distribution of the incomplete variable and show how, in applications, it can be used to closely (and often exactly) match the post-imputation distribution to the population level. Through analytic and simulation studies, we show that our proposed calibrated-δ adjustment MI method can give the same inference as standard MI when data are MAR, and can produce more accurate inference under two general missing not at random missingness mechanisms. The method is used to impute missing ethnicity data in a type 2 diabetes prevalence case study using UK primary care electronic health records, where it results in scientifically relevant changes in inference for non-White ethnic groups compared with standard MI. Calibrated-δ adjustment MI represents a pragmatic approach for utilising available population-level information in a sensitivity analysis to explore potential departures from the MAR assumption.
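The idea of an intercept offset that matches the post-imputation distribution to a known population proportion can be sketched for a binary incomplete variable. This is a simplified numerical stand-in, not the closed-form derivation described in the paper. Given the logistic imputation model's linear predictors for the missing cases, it solves for δ by bisection so that the expected post-imputation proportion of ones equals the population proportion. `calibrated_delta` and `expit` are hypothetical helper names.

```python
import math
from typing import Sequence


def expit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def calibrated_delta(n_obs_ones: int, n_total: int,
                     missing_logits: Sequence[float], pop_prop: float,
                     lo: float = -30.0, hi: float = 30.0,
                     tol: float = 1e-10) -> float:
    """Find delta so the expected post-imputation share of ones equals pop_prop."""
    def expected_prop(delta: float) -> float:
        # Observed ones plus expected imputed ones, over the full sample size.
        imputed = sum(expit(eta + delta) for eta in missing_logits)
        return (n_obs_ones + imputed) / n_total

    if not expected_prop(lo) <= pop_prop <= expected_prop(hi):
        raise ValueError("target cannot be reached by shifting the intercept")
    # expected_prop is increasing in delta, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_prop(mid) < pop_prop:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Missing values would then be imputed as Bernoulli draws with probability `expit(eta_i + delta)`. In a full implementation, δ would be recomputed in each imputation using that imputation's drawn model parameters.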
Project description:<h4>Background</h4>Obesity complicates medical, nursing, and informal care in severe illness, but its effect on hospice use and Medicare expenditures is unknown.<h4>Objective</h4>To describe the associations between body mass index (BMI) and hospice use and Medicare expenditures in the last 6 months of life.<h4>Design</h4>Retrospective cohort.<h4>Setting</h4>The HRS (Health and Retirement Study).<h4>Participants</h4>5677 community-dwelling Medicare fee-for-service beneficiaries who died between 1998 and 2012.<h4>Measurements</h4>Hospice enrollment, days enrolled in hospice, in-home death, and total Medicare expenditures in the 6 months before death. Body mass index was modeled as a continuous variable with a quadratic functional form.<h4>Results</h4>For decedents with a BMI of 20 kg/m², the predicted probability of hospice enrollment was 38.3% (95% CI, 36.5% to 40.2%), hospice duration was 42.8 days (CI, 42.3 to 43.2 days), the probability of in-home death was 61.3% (CI, 59.4% to 63.2%), and total Medicare expenditures were $42 803 (CI, $41 085 to $44 521). When BMI increased to 30 kg/m², the predicted probability of hospice enrollment decreased by 6.7 percentage points (CI, -9.3 to -4.0 percentage points), hospice duration decreased by 3.8 days (CI, -4.4 to -3.1 days), the probability of in-home death decreased by 3.2 percentage points (CI, -6.0 to -0.4 percentage points), and total Medicare expenditures increased by $3471 (CI, $955 to $5988). 
For morbidly obese decedents (BMI ≥40 kg/m²), the predicted probability of hospice enrollment decreased by 15.2 percentage points (CI, -19.6 to -10.9 percentage points), hospice duration decreased by 4.3 days (CI, -5.7 to -2.9 days), and in-home death decreased by 6.3 percentage points (CI, -11.2 to -1.5 percentage points) versus decedents with a BMI of 20 kg/m².<h4>Limitation</h4>Baseline data were self-reported, and the interval between reported BMI and time of death varied.<h4>Conclusion</h4>Among community-dwelling decedents in the HRS, increasing obesity was associated with reduced hospice use and in-home death and with higher Medicare expenditures in the last 6 months of life.<h4>Primary funding source</h4>Robert Wood Johnson Foundation Clinical Scholars Program.