Project description:The unit level model is one of the classical models in small area estimation and plays an important role when unit-level data are available. Empirical Bayes (EB) estimation, which is optimal under the normality assumption, is the most commonly used parameter estimation method for the unit level model. However, it is sensitive to outliers: when the responses y_ij contain outliers, EB estimation can considerably inflate the mean squared error (MSE). In this study, we propose a robust estimation method for the unit level model with outliers, based on the minimum density power divergence. First, by introducing the density power divergence function, we derive the estimating equations for the parameters of the unit level model and obtain the asymptotic distribution of the robust parameter estimators. Because the robust estimator depends on a tuning parameter, we also propose an algorithm for selecting its optimal value. Second, we give empirical Bayes predictors of unit values and area means in finite populations, and estimate the MSE of the proposed robust small area mean estimators by a bootstrap method. Finally, we verify the superior performance of the proposed method on simulated and real data. The comparisons show that the proposed method handles outliers better.
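The project's estimating equations for the nested-error unit level model are not given in the abstract. As a hedged illustration of the mechanism only, the sketch below applies minimum density power divergence (DPD) estimation to the much simpler i.i.d. normal model N(mu, sigma^2). In this case the estimating equations reduce to a weighted mean and a bias-corrected weighted variance, with weights exp(-alpha (y - mu)^2 / (2 sigma^2)) that shrink towards zero for outlying responses. The function name `dpd_normal`, the median/MAD starting values and the value alpha = 0.5 are assumptions. The project's tuning-parameter selection algorithm is not reproduced.

```python
import math
import random
import statistics


def dpd_normal(y, alpha, iters=500, tol=1e-10):
    """Minimum density power divergence (DPD) estimates of (mu, sigma)
    for an i.i.d. normal sample, by fixed-point iteration on the DPD
    estimating equations. alpha > 0 trades efficiency for robustness;
    as alpha -> 0 the estimates approach the maximum likelihood ones.
    (Hypothetical helper; a simplification of the unit level setting.)
    """
    n = len(y)
    # Robust starting values: median and normal-consistent MAD.
    mu = statistics.median(y)
    mad = statistics.median([abs(v - mu) for v in y])
    sigma2 = (1.4826 * mad) ** 2 or statistics.pvariance(y)
    # Correction term coming from the integral of f^(1 + alpha) for the normal.
    c = n * alpha / (1.0 + alpha) ** 1.5
    for _ in range(iters):
        # w_i is proportional to f(y_i)^alpha, so outlying points get tiny weight.
        w = [math.exp(-alpha * (v - mu) ** 2 / (2.0 * sigma2)) for v in y]
        sw = sum(w)
        mu_new = sum(wi * v for wi, v in zip(w, y)) / sw
        denom = sw - c
        if denom <= 0.0:
            raise ValueError("alpha is too large for this sample")
        sigma2_new = sum(wi * (v - mu_new) ** 2 for wi, v in zip(w, y)) / denom
        done = abs(mu_new - mu) < tol and abs(sigma2_new - sigma2) < tol
        mu, sigma2 = mu_new, sigma2_new
        if done:
            break
    return mu, math.sqrt(sigma2)


if __name__ == "__main__":
    rng = random.Random(1)
    clean = [rng.gauss(0.0, 1.0) for _ in range(200)]
    y = clean + [10.0] * 20  # 20 gross outliers in the responses
    print(statistics.mean(y))        # sample mean, dragged upward by the outliers
    print(dpd_normal(y, alpha=0.5))  # DPD estimates of (mu, sigma)
```

Setting alpha close to 0 essentially reproduces the non-robust maximum likelihood fit. Larger alpha values discount outliers more strongly but lose efficiency on clean data, which is why a data-driven choice of the tuning parameter matters.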
Project description:Many people living in low- and middle-income countries are not covered by civil registration and vital statistics systems. Consequently, a wide variety of other data sources, including many household sample surveys, are used to estimate health and population indicators. In this paper we combine data from sample surveys and demographic surveillance systems to produce small area estimates of child mortality through time. Small area estimates are necessary to understand geographical heterogeneity in health indicators when full-coverage vital statistics are not available. For this endeavor, spatio-temporal smoothing is beneficial to alleviate problems of data sparsity. Conventional hierarchical models must be used with care, since the survey weights may need to be taken into account to reduce bias due to non-random sampling and non-response. The application that motivated this work is estimation of child mortality rates in five-year time intervals in regions of Tanzania. Data come from Demographic and Health Surveys conducted over the period 1991-2010 and from two demographic surveillance system sites. We derive a variance estimator of under-five child mortality that accounts for the complex survey weighting. For our application, the hierarchical models we consider include random effects for area, time and survey, and we compare models using a variety of measures, including the conditional predictive ordinate (CPO). The proposed method is implemented via the fast and accurate integrated nested Laplace approximation (INLA).
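The abstract mentions a variance estimator for under-five mortality that accounts for the survey weighting but does not state its form. The sketch below is a generic, swapped-in stand-in, not the estimator derived in the paper. It computes under-five mortality as 1 minus the product of weighted survival probabilities across age bands, and then estimates its variance with a delete-one-cluster (JK1) jackknife that ignores stratification. The record layout `(cluster, weight, age_band, died)` and the function names `u5mr` and `jackknife_variance` are hypothetical.

```python
from collections import defaultdict


def u5mr(records, bands=None):
    """Weighted under-five mortality 1 - prod_a (1 - q_a).

    records : iterable of (cluster, weight, age_band, died) tuples, one per
              child exposed in an age band, with died in {0, 1}
    bands   : age bands to include (defaults to those present in records)
    q_a is the weighted (Hajek) death probability in band a.
    """
    deaths = defaultdict(float)
    exposed = defaultdict(float)
    for _cluster, weight, band, died in records:
        exposed[band] += weight
        deaths[band] += weight * died
    survival = 1.0
    for band in (sorted(exposed) if bands is None else bands):
        if exposed[band] <= 0.0:
            raise ValueError(f"no exposure in age band {band!r}")
        survival *= 1.0 - deaths[band] / exposed[band]
    return 1.0 - survival


def jackknife_variance(records):
    """Delete-one-cluster (JK1) variance of u5mr, ignoring stratification.

    Dropping a cluster would normally be paired with rescaling the remaining
    weights by C / (C - 1); u5mr is built from ratios, so that factor cancels.
    """
    records = list(records)
    bands = sorted({r[2] for r in records})
    clusters = sorted({r[0] for r in records})
    n_c = len(clusters)
    reps = [u5mr([r for r in records if r[0] != c], bands) for c in clusters]
    mean_rep = sum(reps) / n_c
    return (n_c - 1) / n_c * sum((t - mean_rep) ** 2 for t in reps)
```

Every age band must be observed in at least two clusters. Otherwise some replicate has no exposure in that band, and `u5mr` raises an error instead of silently dropping the band.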
Project description:Introduction: Small-area estimation methods are an alternative to direct survey-based estimates in cases where a survey's sample size does not suffice to ensure representativeness. Nevertheless, the information yielded by small-area estimation methods must be validated. The objective of this study was thus to validate a small-area model. Methods: The prevalence of smokers, ex-smokers, and never smokers by sex and age group (15-34, 35-54, 55-64, 65-74, ≥75 years) was calculated in two Spanish Autonomous Regions (ARs) by applying a weighted ratio estimator (direct estimator) to data from representative surveys. These estimates were compared against those obtained with a small-area model applied to another survey, specifically the Spanish National Health Survey, which did not guarantee representativeness for these two ARs by sex and age. To evaluate the concordance of the estimates, we calculated the intraclass correlation coefficient (ICC) and the 95% confidence intervals of the differences between estimates. To assess the precision of the estimates, the coefficients of variation were obtained. Results: In all cases, the ICC was ≥0.87, indicating good concordance between the direct and small-area model estimates. Slightly more than eight in ten 95% confidence intervals for the differences between estimates included zero. In all cases, the coefficient of variation of the small-area model was <30%, indicating a good degree of precision in the estimates. Conclusions: The small-area model applied to national survey data yields valid estimates of smoking prevalence by sex and age group at the AR level. These models could thus be applied to a single year's data from a national survey, which does not guarantee regional representativeness, to characterize various risk factors in a population at a subnational level.
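The concordance and precision checks described in this study can be sketched as follows. The abstract does not say which ICC form was used, so the sketch assumes the two-way, absolute-agreement, single-measure ICC, usually written ICC(A,1), with the direct and small-area estimates treated as two "raters". It also assumes that the 95% interval for each difference is a normal interval for two independent estimates, which is plausible because they come from different surveys but is still an assumption. The CV is reported as SE / estimate in percent. All function names are hypothetical.

```python
import math


def icc_a1(x, y):
    """Two-way absolute-agreement ICC for single measurements, ICC(A,1),
    with k = 2 'raters' (e.g. direct and small-area estimates per cell)."""
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / 2 for a, b in rows]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_tot = sum((v - grand) ** 2 for row in rows for v in row)
    ss_err = ss_tot - ss_rows - ss_cols
    msr = ss_rows / (n - 1)             # between-cell mean square
    msc = ss_cols / (k - 1)             # between-method mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k / n * (msc - mse))


def cv_percent(estimate, se):
    """Coefficient of variation in percent."""
    return 100.0 * se / estimate


def diff_ci_contains_zero(est1, se1, est2, se2, z=1.96):
    """Does the 95% CI for est1 - est2 include zero? Assumes independence."""
    d = est1 - est2
    half = z * math.sqrt(se1 ** 2 + se2 ** 2)
    return d - half <= 0.0 <= d + half
```

Unlike a consistency ICC, the absolute-agreement form penalizes a systematic shift between the two sets of estimates. Two identical series give exactly 1, while a constant offset pulls the value below 1.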
Project description:Forest inventories require estimates and measures of uncertainty for subpopulations such as management units. These units often have small sample sizes, so they should be regarded as small areas. When auxiliary information is available, various small area estimation methods have been proposed to obtain reliable estimates for small areas. Unit level empirical best linear unbiased predictors (EBLUPs), based on plot-level or grid-unit-level models, have been studied more thoroughly than area level EBLUPs, in which the modelling occurs at the management unit scale. Area level EBLUPs do not require precise plot positioning and allow the use of variable radius plots, thus reducing fieldwork costs. However, their performance has not been examined thoroughly. We compared unit level and area level EBLUPs, using LiDAR auxiliary information collected for inventorying 98,104 ha of coastal coniferous forest. Unit level EBLUPs were consistently more accurate than area level EBLUPs, and area level EBLUPs were consistently more accurate than direct field estimates, except for large management units that held a large sample. For stand density, volume, basal area, quadratic mean diameter, mean height and Lorey's height, the root mean squared errors (RMSEs) of area level EBLUPs were, on average, 1.43, 2.83, 2.09, 1.40, 1.32 and 1.64 times larger than those of unit level EBLUPs, respectively. Similarly, direct field estimates had RMSEs that were, on average, 1.37, 1.45, 1.17, 1.17, 1.26 and 1.38 times larger than those of area level EBLUPs. Therefore, area level models can lead to substantial gains in accuracy compared with direct estimates, and unit level models lead to large further gains compared with area level models, potentially justifying the additional cost of obtaining accurate field plot coordinates.
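The area level EBLUP compared in this project is of the Fay-Herriot type. The following is a minimal sketch that assumes a single area-level auxiliary variable (for example a LiDAR metric averaged over the management unit) plus an intercept, known sampling variances psi_i for the direct estimates, and the Prasad-Rao moment estimator of the area random-effect variance. The abstract does not say which variance estimator was used, and the function name is hypothetical.

```python
def fay_herriot_eblup(y, x, psi):
    """Area-level (Fay-Herriot) EBLUP with one covariate plus intercept.

    y   : direct estimates per area (e.g. per management unit)
    x   : area-level covariate
    psi : known sampling variances of the direct estimates (all > 0)
    Returns (eblup, gamma, sigma2_u).
    """
    m, p = len(y), 2
    # OLS fit, used only for the Prasad-Rao moment estimator of sigma2_u.
    xbar = sum(x) / m
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
    b0 = sum(y) / m - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    hat = [1.0 / m + (xi - xbar) ** 2 / sxx for xi in x]  # OLS leverages
    s2u = (sum(r * r for r in resid)
           - sum(pi * (1.0 - hi) for pi, hi in zip(psi, hat))) / (m - p)
    s2u = max(s2u, 0.0)  # truncate a negative moment estimate at zero
    # Weighted least squares for beta with weights 1 / (sigma2_u + psi_i).
    w = [1.0 / (s2u + pi) for pi in psi]
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x)))
    b0 = yw - b1 * xw
    # Shrink each direct estimate towards its regression-synthetic estimate.
    gamma = [s2u / (s2u + pi) for pi in psi]
    eblup = [g * yi + (1.0 - g) * (b0 + b1 * xi)
             for g, xi, yi in zip(gamma, x, y)]
    return eblup, gamma, s2u
```

Units with a large sample have small psi_i, so gamma_i approaches 1 and the EBLUP stays close to the direct field estimate. This is consistent with the abstract's observation that direct estimates remain competitive for large, well-sampled management units.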
Project description:Under a unit-level bivariate linear mixed model, this paper introduces small area predictors of expenditure means and ratios, and derives approximations and estimators of the corresponding mean squared errors. For the considered model, the REML estimation method is implemented. Several simulation experiments, designed to analyze the behavior of the introduced fitting algorithm, predictors and mean squared error estimators, are carried out. An application to real data from the Spanish household budget survey illustrates the behavior of the proposed statistical methodology. The target is the estimation of means of food and non-food household annual expenditures and of ratios of food household expenditures by Spanish provinces.
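The project fits a bivariate model by REML; this sketch does not attempt that. It assumes instead that beta, sigma_u^2 and sigma_e^2 have already been estimated for a single expenditure variable. It then shows how the unit-level (nested-error) EBLUP of an area mean shrinks the area's mean sample residual by gamma_d = sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_d). This is a univariate simplification that also ignores the finite-population correction, and the function name is hypothetical.

```python
def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def nested_error_mean(y_s, x_s, xbar_pop, beta, s2u, s2e):
    """EBLUP of an area mean under y_ij = x_ij' beta + u_i + e_ij,
    using the large-population approximation mu_i = Xbar_i' beta + u_i.

    y_s, x_s       : sample responses and covariate vectors in the area
    xbar_pop       : population mean of the covariates in the area
    beta, s2u, s2e : previously estimated (e.g. by REML) model parameters
    """
    n = len(y_s)
    gamma = s2u / (s2u + s2e / n)  # shrinkage factor, grows with n
    ybar = sum(y_s) / n
    xbar_s = [sum(col) / n for col in zip(*x_s)]
    # Predicted area effect: shrunken mean of the sample residuals.
    u_hat = gamma * (ybar - _dot(xbar_s, beta))
    return _dot(xbar_pop, beta) + u_hat
```

A ratio can then be predicted naively as the quotient of two such area-mean predictions. That plug-in ignores the correlation between the two expenditure variables, which the bivariate model is designed to exploit.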
Project description:Background: Local governments and other public health entities often need population health measures at the county or subcounty level for activities such as resource allocation and targeting public health interventions. Information collected via national surveys alone cannot fill these needs. We propose a novel, two-step method for rescaling health survey data and creating small area estimates (SAEs) of smoking rates using a Behavioral Risk Factor Surveillance System survey administered in 2015 to participants living in Allegheny County, Pennsylvania, USA. Methods: The first step consisted of a spatial microsimulation to rescale the location of survey respondents from zip codes to census tracts based on census population distributions by age, sex, race, and education. The rescaling allowed us, in the second step, to use available tract-specific ancillary data on social vulnerability for small area estimation of local health risk with an area-level version of a logistic linear mixed model. To demonstrate this new two-step algorithm, we estimated the ever-smoking rate for the census tracts of Allegheny County. Results: The ever-smoking rate was above 70% for two census tracts to the southeast of the city of Pittsburgh. Several tracts in the southern and eastern sections of Pittsburgh also had relatively high (> 65%) ever-smoking rates. Conclusions: These SAEs may be used in local public health efforts to target interventions and educational resources aimed at reducing cigarette smoking. Further, our new two-step methodology may be extended to small area estimation for other locations and health outcomes.
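The abstract does not name the algorithm behind the spatial microsimulation step. Iterative proportional fitting (raking) is a common choice in spatial microsimulation, so the sketch below uses it as an assumed stand-in. It reweights survey respondents so that their weighted counts match one census tract's margins. Only two constraint variables are shown, whereas the project also used race and education. The function name and data layout are hypothetical.

```python
def ipf_weights(respondents, margins, iters=100, tol=1e-9):
    """Iterative proportional fitting (raking) of respondent weights to
    tract-level census margins.

    respondents : list of dicts, e.g. {"age": "18-44", "sex": "F"}
    margins     : {variable: {category: census count}}; all margins must
                  share the same total and every category must be observed
    Returns one weight per respondent.
    """
    w = [1.0] * len(respondents)
    for _ in range(iters):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total in each category of this variable.
            totals = {cat: 0.0 for cat in targets}
            for wi, r in zip(w, respondents):
                totals[r[var]] += wi
            # Scale weights so this variable's margin is matched exactly.
            for i, r in enumerate(respondents):
                factor = targets[r[var]] / totals[r[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:
            break
    return w
```

Each pass matches one margin exactly and slightly disturbs the others. Cycling until the adjustment factors are all close to 1 yields weights that satisfy every margin at once, provided a consistent solution exists.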
Project description:Regular health surveys can produce reliable estimates at higher geographic levels but not for small areas. Alternatives are to aggregate data over several years or use model-based methods. We created and evaluated model-based estimates for four health-related outcomes by gender, for 153 Local Government Areas using data from the New South Wales Population Health Survey. The evaluation examined evidence on bias and determined the covariates available and appropriate for each outcome variable. The evaluation considered the likely precision of the resulting estimates. The bias and precision of results for single years (2006-2008) for each outcome variable using six covariate specifications were compared with direct survey estimates based on a single year's data and those obtained by aggregating over seven years. A practical issue is how to choose covariates to include in the models as the best covariate specification varies between outcome variables. Model-based results had median root mean squared errors between 3.3% and 5.5% (max 5.2% and 11.3% respectively) and median relative root mean squared errors between 6.8% and 24.5% (max 11.7% and 41.5% respectively). The model-based estimates were unbiased compared with direct estimates based on one or seven years of data and when aggregated to a point where direct estimates were reliable. The bias and reliability assessment process provides a way for policymakers to have confidence in model-based estimates.
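Both precision summaries quoted in this evaluation are simple functions of each area's estimate and its estimated mean squared error. The following is a short sketch, assuming per-area MSE estimates are already available (the abstract does not say how they were obtained) and that the relative RMSE is the RMSE divided by the estimate. The function name is hypothetical.

```python
import math
import statistics


def precision_summary(estimates, mses):
    """Median RMSE and median relative RMSE (in percent) across areas.

    estimates : model-based prevalence estimates (in percent) per area
    mses      : estimated mean squared errors of those estimates
    """
    rmse = [math.sqrt(m) for m in mses]
    rrmse = [100.0 * r / e for r, e in zip(rmse, estimates)]
    return statistics.median(rmse), statistics.median(rrmse)
```

When the estimates are prevalences in percent, the RMSE is in percentage points. The relative RMSE is larger for outcomes with low prevalence, which helps explain why the relative figures span a wider range than the absolute ones.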
Project description:Small area estimation (SAE) entails estimating characteristics of interest for domains, often geographical areas, in which there may be few or no samples available. SAE has a long history, and a wide variety of methods have been suggested from a bewildering range of philosophical standpoints. We describe design-based and model-based approaches, and models specified at the area level and at the unit level, focusing on health applications and fully Bayesian spatial models. The use of auxiliary information is a key ingredient for successful inference when response data are sparse, and we discuss a number of approaches that allow the inclusion of covariate data. SAE of HIV prevalence, using data collected from a Demographic and Health Survey in Malawi in 2015-2016, is used to illustrate a number of techniques. The potential use of SAE techniques for outcomes related to COVID-19 is discussed.
Project description:Creating local population health measures from administrative data would be useful for health policy and public health monitoring. While a wide range of options exists for estimating such rates, from simple spatial smoothers to model-based methods, there are relatively few side-by-side comparisons, especially with real-world data. In this paper, we compare methods for creating local estimates of acute myocardial infarction rates from Medicare claims data. A Bayesian Markov chain Monte Carlo (MCMC) estimator that incorporated spatial and local random effects performed best, followed by a method-of-moments spatial Empirical Bayes estimator. Because the former is more complicated and time-consuming, spatial linear Empirical Bayes methods may be a good alternative for non-specialist investigators.
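A method-of-moments spatial Empirical Bayes smoother can be sketched in the style of Marshall's estimator, with a local prior mean and variance computed from each area's neighborhood. This is an assumed variant: the abstract does not give the exact estimator or neighborhood definition used in the comparison. The function name and the `neighbors` structure are hypothetical.

```python
def eb_rates(cases, pops, neighbors):
    """Method-of-moments empirical Bayes smoothing of rates (Marshall-type).

    cases, pops : event counts and populations at risk per area
    neighbors   : neighbors[i] lists the areas (including i itself) whose
                  data define the prior for area i; listing every area for
                  every i gives the global (non-spatial) version
    """
    raw = [c / p for c, p in zip(cases, pops)]
    smoothed = []
    for i, nb in enumerate(neighbors):
        n_tot = sum(pops[j] for j in nb)
        b = sum(cases[j] for j in nb) / n_tot  # local prior mean
        s2 = sum(pops[j] * (raw[j] - b) ** 2 for j in nb) / n_tot
        # Local prior variance: weighted spread minus expected Poisson noise.
        a = max(s2 - b / (n_tot / len(nb)), 0.0)
        shrink = a / (a + b / pops[i]) if a > 0 else 0.0
        smoothed.append(b + shrink * (raw[i] - b))
    return smoothed
```

Areas with small populations have a large sampling variance b / n_i, so they are pulled strongly towards the local mean. When the between-area variance estimate a is truncated at zero, every area receives the local mean itself.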