Dataset Information

Assessing the health estimation capacity of air pollution exposure prediction models.

ABSTRACT:

Background

The era of big data has enabled sophisticated models to predict air pollution concentrations over space and time. Historically these models have been evaluated using overall metrics that measure how close predictions are to monitoring data. However, overall methods are not designed to distinguish error at timescales most relevant for epidemiologic studies, such as day-to-day errors that impact studies of short-term health associations.

Methods

We introduce frequency band model performance, which quantifies health estimation capacity of air quality prediction models for time series studies of air pollution and health. Frequency band model performance uses a discrete Fourier transform to evaluate prediction models at timescales of interest. We simulated fine particulate matter (PM_2.5), with errors at timescales varying from acute to seasonal, and health time series data. To compare evaluation approaches, we use correlations and root mean squared error (RMSE). Additionally, we assess health estimation capacity through bias and RMSE in estimated health associations. We apply frequency band model performance to PM_2.5 predictions at 17 monitors in 8 US cities.
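As an illustration of the idea described in the Methods, the following is a minimal sketch, not the authors' implementation. It assumes daily data, uses NumPy's real FFT to keep only the Fourier components within a chosen band of periods, and then computes RMSE and correlation on the filtered series. The function names (`band_filter`, `band_performance`), the 2 to 7 day "acute" band, and the simulated series are all hypothetical choices made for this example.

```python
import numpy as np


def band_filter(x, min_period, max_period):
    """Keep only Fourier components with period (days) in [min_period, max_period]."""
    x = np.asarray(x, dtype=float)
    coefs = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0)  # cycles per day (daily sampling assumed)
    periods = np.full_like(freqs, np.inf)   # frequency 0 (the mean) has infinite period
    periods[freqs > 0] = 1.0 / freqs[freqs > 0]
    coefs[(periods < min_period) | (periods > max_period)] = 0.0
    return np.fft.irfft(coefs, n=x.size)


def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


def band_performance(obs, pred, min_period, max_period):
    """RMSE and correlation between observations and predictions within one band."""
    o = band_filter(obs, min_period, max_period)
    p = band_filter(pred, min_period, max_period)
    return rmse(o, p), float(np.corrcoef(o, p)[0, 1])


# Simulated daily series (hypothetical values, two years of data).
rng = np.random.default_rng(0)
n = 730
t = np.arange(n)
obs = 10 + 4 * np.cos(2 * np.pi * t / 365) + rng.normal(0, 3, n)

predictions = {
    # Error only at the seasonal timescale (one cycle per year).
    "seasonal error": obs + 3 * np.sin(2 * np.pi * t / 365),
    # Error only at the day-to-day timescale (white noise).
    "acute error": obs + rng.normal(0, 2, n),
}

results = {}
for name, pred in predictions.items():
    band_rmse, band_corr = band_performance(obs, pred, min_period=2, max_period=7)
    results[name] = {
        "overall_rmse": rmse(obs, pred),
        "band_rmse": band_rmse,
        "band_corr": band_corr,
    }
    print(name, results[name])
```

In this toy setup the two predictions have similar overall RMSE, but only the one with day-to-day error is penalized in the acute band, which is the kind of distinction the abstract attributes to frequency band model performance.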

Results

In simulations, frequency band model performance rates predictions better (lower RMSE, higher correlation) when there is no error at a particular timescale (e.g., acute) and worse when error is added to that timescale, compared to overall approaches. Further, frequency band model performance is more strongly associated (R² = 0.95) with health association bias compared to overall approaches (R² = 0.57). For PM_2.5 predictions in Salt Lake City, UT, frequency band model performance better identifies acute error that may impact estimated short-term health associations.

Conclusions

For epidemiologic studies, frequency band model performance provides an improvement over existing approaches because it evaluates models at the timescale of interest and is more strongly associated with bias in estimated health associations. Evaluating prediction models at timescales relevant for health studies is critical to determining whether model error will impact estimated health associations.

SUBMITTER: Krall JR

PROVIDER: S-EPMC8928613 | biostudies-literature | 2022 Mar

REPOSITORIES: biostudies-literature

Publications

Assessing the health estimation capacity of air pollution exposure prediction models.

Krall JR, Keller JP, Peng RD

Environmental health : a global access science source, 2022 Mar 17, issue 1

PMID: 35300698

Similar Datasets

Project description: Background: Population exposure assessment methods that capture local-scale pollutant variability are needed for large-scale epidemiological studies and surveillance, policy, and regulatory purposes. Currently, such exposure methods are limited. Methods: We created 2006 national pollutant models for fine particulate matter [PM with aerodynamic diameter ≤ 2.5 μm (PM2.5)], nitrogen dioxide (NO2), benzene, ethylbenzene, and 1,3-butadiene from routinely collected fixed-site monitoring data in Canada. In multiple regression models, we incorporated satellite estimates and geographic predictor variables to capture background and regional pollutant variation and used deterministic gradients to capture local-scale variation. The national NO2 and benzene models are evaluated with independent measurements from previous land use regression models that were conducted in seven Canadian cities. National models are applied to census block-face points, each of which represents the location of approximately 89 individuals, to produce estimates of population exposure. Results: The national NO2 model explained 73% of the variability in fixed-site monitor concentrations, PM2.5 46%, benzene 62%, ethylbenzene 67%, and 1,3-butadiene 68%. The NO2 model predicted, on average, 43% of the within-city variability in the independent NO2 data compared with 18% when using inverse distance weighting of fixed-site monitoring data. Benzene models performed poorly in predicting within-city benzene variability. Based on our national models, we estimated Canadian ambient annual average population-weighted exposures (in micrograms per cubic meter) of 8.39 for PM2.5, 23.37 for NO2, 1.04 for benzene, 0.63 for ethylbenzene, and 0.09 for 1,3-butadiene. Conclusions: The national pollutant models created here improve exposure assessment compared with traditional monitor-based approaches by capturing both regional and local-scale pollution variation. Applying national models to routinely collected population location data can extend land use modeling techniques to population exposure assessment and to informing surveillance, policy, and regulation.

Project description: Recent cohort studies have relied on exposure prediction models to estimate individual-level air pollution concentrations because individual air pollution measurements are not available for cohort locations. For such prediction models, geographic variables related to pollution sources are important inputs. We demonstrated the computation process of geographic variables mostly recorded in 2010 at regulatory air pollution monitoring sites in South Korea. On the basis of previous studies, we finalized a list of 313 geographic variables related to air pollution sources in eight categories including traffic, demographic characteristics, land use, transportation facilities, physical geography, emissions, vegetation, and altitude. We then obtained data from different sources such as the Statistics Geographic Information Service and Korean Transport Database. After integrating all available data to a single database by matching coordinate systems and converting non-spatial data to spatial data, we computed geographic variables at 294 regulatory monitoring sites in South Korea. The data integration and variable computation were performed by using ArcGIS version 10.2 (ESRI Inc., Redlands, CA, USA). For traffic, we computed the distances to the nearest roads and the sums of road lengths within different sizes of circular buffers. In addition, we calculated the numbers of residents, households, housing buildings, companies, and employees within the buffers. The percentages of areas for different types of land use compared to total areas were calculated within the buffers. For transportation facilities and physical geography, we computed the distances to the closest public transportation depots and the boundary lines. The vegetation index and altitude were estimated at a given location by using satellite data. The summary statistics of geographic variables in Seoul across monitoring sites showed different patterns between urban background and urban roadside sites. This study provided practical knowledge on the computation process of geographic variables in South Korea, which will improve air pollution prediction models and contribute to subsequent health analyses.
