The effects of spatial population dataset choice on estimates of population at risk of disease.
ABSTRACT: The spatial modeling of infectious disease distributions and dynamics is increasingly being undertaken for health services planning and disease control monitoring, implementation, and evaluation. Where risks are heterogeneous in space or dependent on person-to-person transmission, spatial data on human population distributions are required to estimate infectious disease risks, burdens, and dynamics. Several different modeled human population distribution datasets are available and widely used, but the disparities among them and the implications for enumerating disease burdens and populations at risk have not been considered systematically. Here, we quantify some of these effects using global estimates of populations at risk (PAR) of P. falciparum malaria as an example. The recent construction of a global map of P. falciparum malaria endemicity enabled the testing of different gridded population datasets for providing estimates of PAR by endemicity class. The estimated population numbers within each class were calculated for each country using four different global gridded human population datasets: GRUMP (~1 km spatial resolution), LandScan (~1 km), UNEP Global Population Databases (~5 km), and GPW3 (~5 km). More detailed assessments of PAR variation and accuracy were conducted for three African countries where census data were available at a higher administrative-unit level than used by any of the four gridded population datasets. The estimates of PAR based on the datasets varied by more than 10 million people for some countries, even accounting for the fact that estimates of population totals made by different agencies are used to correct national totals in these datasets and can vary by more than 5% for many low-income countries. In many cases, these variations in PAR estimates comprised more than 10% of the total national population.
The detailed country-level assessments suggested that none of the datasets was consistently more accurate than the others in estimating PAR. The sizes of such differences among modeled human populations were related to variations in the methods, input resolution, and date of the census data underlying each dataset. Data quality varied from country to country within the spatial population datasets. Detailed, highly spatially resolved human population data are an essential resource for planning health service delivery for disease control, for the spatial modeling of epidemics, and for decision-making processes related to public health. However, our results highlight that for the low-income regions of the world where disease burden is greatest, existing datasets display substantial variations in estimated population distributions, resulting in uncertainty in disease assessments that utilize them. Increased efforts are required to gather contemporary and spatially detailed demographic data to reduce this uncertainty, particularly in Africa, and to develop population distribution modeling methods that match the rigor, sophistication, and ability to handle uncertainty of contemporary disease mapping and spread modeling. In the meantime, studies that utilize a particular spatial population dataset need to acknowledge the uncertainties inherent within it and consider how the methods and data that comprise it will affect conclusions.
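The overlay step described above (summing a gridded population surface within each endemicity class) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `par_by_class`, the class labels, and both small grids are hypothetical, and real datasets would first need to be resampled to a common grid.

```python
from collections import defaultdict


def par_by_class(population, endemicity):
    """Sum gridded population counts within each endemicity class.

    population: 2D list of people per grid cell (None marks a no-data cell).
    endemicity: 2D list of class labels on the same grid.
    """
    totals = defaultdict(float)
    for pop_row, cls_row in zip(population, endemicity):
        for pop, cls in zip(pop_row, cls_row):
            if pop is not None:
                totals[cls] += pop
    return dict(totals)


# Hypothetical endemicity classes and two hypothetical population grids
# that share the same total (1000 people) but distribute it differently.
endemicity = [["none", "low"], ["low", "high"]]
grid_a = [[100.0, 300.0], [200.0, 400.0]]
grid_b = [[250.0, 250.0], [250.0, 250.0]]

par_a = par_by_class(grid_a, endemicity)
par_b = par_by_class(grid_b, endemicity)
print(par_a)
print(par_b)
```

Although both grids agree on the total population, they disagree on how many people fall in the "high" class, which is the kind of discrepancy the abstract reports between datasets.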
Project description:Understanding the fine scale spatial distribution of births and pregnancies is crucial for informing planning decisions related to public health. This is especially important in lower income countries where infectious disease is a major concern for pregnant women and newborns, as highlighted by the recent Zika virus epidemic. Despite this, the spatial detail of basic data on the numbers and distribution of births and pregnancies is often of a coarse resolution and difficult to obtain, with no co-ordination between countries and organisations to create one consistent set of subnational estimates. To begin to address this issue, under the framework of the WorldPop program, an open access archive of high resolution gridded birth and pregnancy distribution datasets for all African, Latin American and Caribbean countries has been created. Datasets were produced using the most recent and finest level census and official population estimate data available and are at a resolution of 30 arc seconds (approximately 1 km at the equator). All products are available through WorldPop.
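The stated equivalence between 30 arc seconds and roughly 1 km at the equator can be checked with a short calculation. The sketch below assumes a spherical Earth with the WGS84 equatorial radius, so values away from the equator are approximate; the function name `cell_width_km` is illustrative only.

```python
import math

EQUATORIAL_RADIUS_KM = 6378.137  # WGS84 equatorial radius (assumed here)


def cell_width_km(arc_seconds, latitude_deg=0.0):
    """Approximate east-west width of a cell spanning `arc_seconds` of longitude.

    Uses a spherical approximation: the width shrinks with cos(latitude).
    """
    km_per_degree = 2 * math.pi * EQUATORIAL_RADIUS_KM / 360.0
    degrees = arc_seconds / 3600.0
    return degrees * km_per_degree * math.cos(math.radians(latitude_deg))


print(round(cell_width_km(30), 3))        # width at the equator
print(round(cell_width_km(30, 60.0), 3))  # width at 60 degrees latitude
```

Under these assumptions a 30 arc-second cell is about 0.93 km wide at the equator and about half that at 60 degrees latitude, so "approximately 1 km" holds only near the equator.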
Project description:Recent years have seen substantial growth in openly available satellite and other geospatial data layers, which represent a range of metrics relevant to global human population mapping at fine spatial scales. The specifications of such data differ widely and therefore the harmonisation of data layers is a prerequisite to constructing detailed and contemporary spatial datasets which accurately describe population distributions. Such datasets are vital to measure impacts of population growth, monitor change, and plan interventions. To this end the WorldPop Project has produced an open access archive of 3 and 30 arc-second resolution gridded data. Four tiled raster datasets form the basis of the archive: (i) Viewfinder Panoramas topography clipped to Global ADMinistrative area (GADM) coastlines; (ii) a matching ISO 3166 country identification grid; (iii) country area; and (iv) a slope layer. Further layers include transport networks, landcover, nightlights, precipitation, travel time to major cities, and waterways. The datasets and production methodology are described here. The archive can be downloaded both from the WorldPop Dataverse Repository and the WorldPop Project website.
Project description:With sea level predicted to rise and the frequency and intensity of coastal flooding expected to increase due to climate change, high-resolution gridded population datasets have been extensively used to estimate the size of vulnerable populations in low-elevation coastal zones (LECZ). China is the most populous country, and populations in its LECZ grew rapidly due to urbanization and remarkable economic growth in coastal areas. In assessing the potential impacts of coastal hazards, the spatial distribution of population exposure in China's LECZ should be examined. In this study, we propose a combination of multisource remote sensing images, point-of-interest data, and machine learning methods to improve the performance of population disaggregation in coastal China. The resulting population grid map of coastal China for the reference year 2010, with a spatial resolution of 100 × 100 m, is presented and validated. Then, we analyze the distribution of population in LECZ by overlaying the new gridded population data and LECZ footprints. Results showed that the total population exposed in China's LECZ in 2010 was 158.2 million (random forest prediction) and 160.6 million (Cubist prediction), which account for 12.17% and 12.36% of the national population, respectively. This study also showed the considerable potential in combining geospatial big data for high-resolution population estimation.
Project description:INTRODUCTION: In low- and middle-income countries (LMICs), household survey data are a main source of information for planning, evaluation, and decision-making. Standard surveys are based on censuses; however, for many LMICs it has been more than 10 years since their last census and they face high urban growth rates. Over the last decade, survey designers have begun to use modelled gridded population estimates as sample frames. We summarize the state of the emerging field of gridded population survey sampling, focussing on LMICs. METHODS: We performed a systematic scoping review in Scopus of specific gridded population datasets and "population" or "household" "survey" reports, and solicited additional published and unpublished sources from colleagues. RESULTS: We identified 43 national and sub-national gridded population-based household surveys implemented across 29 LMICs. Gridded population surveys used automated and manual approaches to derive clusters from WorldPop and LandScan gridded population estimates. After sampling, some survey teams interviewed all households in each cluster or segment, and others sampled households from larger clusters. Tools to select gridded population survey clusters include the GridSample R package, Geo-sampling tool, and GridSample.org. In the field, gridded population surveys generally relied on geographically accurate maps based on satellite imagery or OpenStreetMap, and a tablet or GPS technology for navigation. CONCLUSIONS: For gridded population survey sampling to be adopted more widely, several strategic questions need answering regarding cell-level accuracy and uncertainty of gridded population estimates, the methods used to group/split cells into sample frame units, design effects of new sample designs, and feasibility of tools and methods to implement surveys across diverse settings.
Project description:The WHO has established the disability-adjusted life year (DALY) as a metric for measuring the burden of human disease and injury globally. However, most DALY estimates have been calculated as national totals. We mapped spatial variation in the burden of human African trypanosomiasis (HAT) in Uganda for the years 2000-2009. This represents the first geographically delimited estimation of HAT disease burden at the sub-country scale. Disability-adjusted life-year (DALY) totals for HAT were estimated based on modelled age and mortality distributions, mapped using Geographic Information Systems (GIS) software, and summarised by parish and district. While the national total burden of HAT is low relative to other conditions, high-impact districts in Uganda had DALY rates comparable to the national burden rates for major infectious diseases. The calculated average national DALY rate for 2000-2009 was 486.3 DALYs/100 000 persons/year, whereas three districts afflicted by rhodesiense HAT in southeastern Uganda had burden rates above 5000 DALYs/100 000 persons/year, comparable to national GBD 2004 average burden rates for malaria and HIV/AIDS. These results provide updated and improved estimates of HAT burden across Uganda, taking into account sensitivity to under-reporting. Our results highlight the critical importance of spatial scale in disease burden analyses. National aggregations of disease burden have resulted in an implied bias against highly focal diseases for which geographically targeted interventions may be feasible and cost-effective. This has significant implications for the use of DALY estimates to prioritize disease interventions and inform cost-benefit analyses.
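The burden rates quoted above are expressed in DALYs per 100 000 persons per year. A minimal sketch of that unit conversion is given below; it assumes a constant population over the period (so person-years equal population times years), and the district figures in the example are hypothetical rather than taken from the study.

```python
def daly_rate_per_100k(total_dalys, population, years):
    """Average DALY rate per 100 000 persons per year.

    Assumes the population is constant over the period, so the
    denominator is simply population * years (person-years).
    """
    if population <= 0 or years <= 0:
        raise ValueError("population and years must be positive")
    return total_dalys / (population * years) * 100_000


# Hypothetical district: 100 000 DALYs accrued by 200 000 people over 10 years.
rate = daly_rate_per_100k(total_dalys=100_000, population=200_000, years=10)
print(rate)
```

Because the denominator is local person-years, a small district with a concentrated burden can show a far higher rate than the national average even when its contribution to the national DALY total is modest.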
Project description:Gridded population distribution data are finding increasing use in a wide range of fields, including resource allocation, disease burden estimation and climate change impact assessment. Land cover information can be used in combination with detailed settlement extents to redistribute aggregated census counts to improve the accuracy of national-scale gridded population data. In East Africa, such analyses have been done using regional land cover data, thus restricting application of the approach to this region. If gridded population data are to be improved across Africa, an alternative, consistent and comparable source of land cover data is required. Here these analyses were repeated for Kenya using four continent-wide land cover datasets combined with detailed settlement extents and accuracies were assessed against detailed census data. The aim was to identify the large area land cover dataset that, combined with detailed settlement extents, produces the most accurate population distribution data. The effectiveness of the population distribution modelling procedures in the absence of high resolution census data was evaluated, as was the extrapolation ability of population densities between different regions. Results showed that the use of the GlobCover dataset refined with detailed settlement extents provided significantly more accurate gridded population data compared to the use of refined AVHRR-derived, MODIS-derived and GLC2000 land cover datasets. This study supports the hypothesis that land cover information is important for improving population distribution model accuracies, particularly in countries where only coarse resolution census data are available. Obtaining high resolution census data must however remain the priority.
With its higher spatial resolution and its more recent data acquisition, the GlobCover dataset was found to be the most valuable resource to use in combination with detailed settlement extents for the production of gridded population datasets across large areas.
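The idea of redistributing an aggregated census count across grid cells according to land cover can be sketched as a simple weighted allocation. This is an illustrative assumption about the general technique, not the procedure used in the study: the function `redistribute`, the class names, and the per-class weights are all hypothetical.

```python
def redistribute(census_count, cell_classes, class_weights):
    """Allocate one administrative unit's census count to its grid cells.

    Each cell receives a share proportional to the weight of its land
    cover class, so the cell values always sum to the census count.
    """
    weights = [class_weights[c] for c in cell_classes]
    total_weight = sum(weights)
    if total_weight == 0:
        # No informative weights: fall back to an even spread.
        return [census_count / len(cell_classes)] * len(cell_classes)
    return [census_count * w / total_weight for w in weights]


# Hypothetical relative population densities per land cover class.
class_weights = {"settlement": 8.0, "cropland": 1.0, "forest": 0.0}
cells = ["settlement", "cropland", "cropland", "forest"]

gridded = redistribute(1000.0, cells, class_weights)
print(gridded)
```

The allocation preserves the census total for the unit while concentrating people in the classes with higher weights, which is why the choice of land cover dataset and weights affects the accuracy of the resulting grid.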
Project description:A research priority for Plasmodium vivax malaria is to improve our understanding of the spatial distribution of risk and its relationship with the burden of P. vivax disease in human populations. The aim of the research outlined in this article is to provide a contemporary evidence-based map of the global spatial extent of P. vivax malaria, together with estimates of the human population at risk (PAR) of any level of transmission in 2009. The most recent P. vivax case-reporting data that could be obtained for all malaria endemic countries were used to classify risk into three classes: malaria free, unstable (<0.1 case per 1,000 people per annum (p.a.)) and stable (≥0.1 case per 1,000 p.a.) P. vivax malaria transmission. Risk areas were further constrained using temperature and aridity data based upon their relationship with parasite and vector bionomics. Medical intelligence was used to refine the spatial extent of risk in specific areas where transmission was reported to be absent (e.g., large urban areas and malaria-free islands). The PAR under each level of transmission was then derived by combining the categorical risk map with a high resolution population surface adjusted to 2009. The exclusion of large Duffy negative populations in Africa from the PAR totals was achieved using independent modelling of the gene frequency of this genetic trait. It was estimated that 2.85 billion people were exposed to some risk of P. vivax transmission in 2009, with 57.1% of them living in areas of unstable transmission. The vast majority (2.59 billion, 91.0%) were located in Central and South East (CSE) Asia, whilst the remainder were located in America (0.16 billion, 5.5%) and in the Africa+ region (0.10 billion, 3.5%). Despite evidence of ubiquitous risk of P. vivax infection in Africa, the very high prevalence of Duffy negativity throughout Central and West Africa reduced the PAR estimates substantially. After more than a century of development and control, P. vivax remains more widely distributed than P. falciparum and is a potential cause of morbidity and mortality amongst the 2.85 billion people living at risk of infection, the majority of whom are in the tropical belt of CSE Asia. The probability of infection is reduced massively across Africa by the frequency of the Duffy negative trait, but transmission does occur on the continent and is a concern for Duffy positive locals and travellers. The final map provides the spatial limits on which the endemicity of P. vivax transmission can be mapped to support future cartographic-based burden estimations.
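The three-class risk scheme can be written as a small threshold function. The 0.1 case per 1,000 people per annum cut-off comes from the abstract; treating "malaria free" as zero reported cases is an assumption made here for illustration, and the later refinements (temperature, aridity, medical intelligence, Duffy negativity) are not modelled.

```python
def classify_transmission(cases_per_1000_pa):
    """Classify P. vivax risk from reported cases per 1,000 people per annum.

    Assumption: an area with zero reported cases is labelled malaria free.
    The unstable/stable boundary at 0.1 follows the abstract.
    """
    if cases_per_1000_pa < 0:
        raise ValueError("case rate cannot be negative")
    if cases_per_1000_pa == 0:
        return "malaria free"
    if cases_per_1000_pa < 0.1:
        return "unstable"
    return "stable"


for rate in (0.0, 0.05, 0.1, 2.0):
    print(rate, classify_transmission(rate))
```

A rate exactly equal to 0.1 falls in the stable class, matching the "≥0.1" boundary stated in the text.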
Project description:For its fifth assessment report, the Intergovernmental Panel on Climate Change divided future scenario projections (2005-2100) into two groups: Shared Socio-Economic Pathways (SSPs) and Representative Concentration Pathways (RCPs). Each SSP has country-level urban and rural population projections, while the RCPs are based on radiative forcing caused by greenhouse gases, aerosols and associated land-use change. In order for these projections to be applicable in earth system models, SSP and RCP population projections must be at the same spatial scale. Thus, a gridded population dataset that takes into account both RCP-based urban fractions and SSP-based population projections is needed. To support this need, an annual (2000-2100) high resolution (approximately 1 km at the equator) gridded population dataset conforming to both RCP (urban land use) and SSP (population) country level scenario data was created.
Project description:Where do people live, and how has this changed over timescales of centuries? High-resolution spatial information on historical human population distribution is of great significance to understand human-environment interactions and their temporal dynamics. However, the complex relationship between population distribution and various influencing factors, coupled with limited data availability, makes it a challenge to reconstruct human population distribution over timescales of centuries. This study generated 1-km decadal population maps for the conterminous US from 1790 to 2010 using parsimonious models based on natural suitability, socioeconomic desirability, and inhabitability. Five models of increasing complexity were evaluated. The models were validated with census tract and county subdivision population data in 2000 and were applied to generate five sets of 22 historical population maps from 1790 to 2010. Separating urban and rural areas and excluding non-inhabitable areas were the most important factors for improving the overall accuracy. The generated gridded population datasets and the production and validation methods are described here.
Project description:Improved understanding of geographical variation and inequity in health status, wealth and access to resources within countries is increasingly being recognized as central to meeting development goals. Development and health indicators assessed at national or subnational scale can often conceal important inequities, with the rural poor often least well represented. The ability to target limited resources is fundamental, especially in an international context where funding for health and development comes under pressure. This has recently prompted the exploration of the potential of spatial interpolation methods based on geolocated clusters from national household survey data for the high-resolution mapping of features such as population age structures, vaccination coverage and access to sanitation. It remains unclear, however, how predictable these different factors are across different settings, variables and between demographic groups. Here we test the accuracy of spatial interpolation methods in producing gender-disaggregated high-resolution maps of the rates of literacy, stunting and the use of modern contraceptive methods from a combination of geolocated demographic and health surveys cluster data and geospatial covariates. Bayesian geostatistical and machine learning modelling methods were tested across four low-income countries and varying gridded environmental and socio-economic covariate datasets to build 1×1 km spatial resolution maps with uncertainty estimates. Results show the potential of the approach in producing high-resolution maps of key gender-disaggregated socio-economic indicators, with explained variance through cross-validation being as high as 74-75% for female literacy in Nigeria and Kenya, and in the 50-70% range for many other variables. 
However, substantial variations by both country and variable were seen, with many variables showing poor mapping accuracies in the range of 2-30% explained variance using both geostatistical and machine learning approaches. The analyses offer a robust basis for the construction of timely maps with levels of detail that support geographically stratified decision-making and the monitoring of progress towards development goals. However, the great variability in results between countries and variables highlights the challenges in applying these interpolation methods universally across multiple countries, and the importance of validation and quantifying uncertainty if this is undertaken.
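The "explained variance through cross-validation" figures can be read as a goodness-of-fit score computed on held-out observations. The sketch below uses the coefficient of determination (R squared) as one common way to express this; whether the study used exactly this formula is an assumption, and the observed and predicted values are hypothetical.

```python
def explained_variance(observed, predicted):
    """Coefficient of determination (R^2) for held-out predictions.

    Returns 1 - SS_res / SS_tot, where SS_tot is measured around the
    mean of the observed values.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out cluster values (e.g. literacy rates) and predictions.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.3, 0.4, 0.5, 0.8]

score = explained_variance(observed, predicted)
print(round(score, 3))
```

In a k-fold setting the same calculation would be applied to predictions pooled across the held-out folds, so that every observation is scored by a model that did not see it during fitting.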