Concentrations of criteria pollutants in the contiguous U.S., 1979 – 2015: Role of prediction model parsimony in integrated empirical geographic regression

dc.contributor.author: Kim, Sun-Young (en)
dc.contributor.author: Bechle, Matthew J. (en)
dc.contributor.author: Hankey, Steven C. (en)
dc.contributor.author: Sheppard, Lianne (en)
dc.contributor.author: Szpiro, Adam A. (en)
dc.contributor.author: Marshall, Julian D. (en)
dc.coverage.country: United States (en)
dc.date.accessioned: 2021-10-04T19:22:50Z (en)
dc.date.available: 2021-10-04T19:22:50Z (en)
dc.date.issued: 2020-02-01 (en)
dc.date.updated: 2021-10-04T19:22:46Z (en)
dc.description.abstract: National-scale empirical models for air pollution can include hundreds of geographic variables. The impact of model parsimony (i.e., how model performance differs for a large versus small number of covariates) has not been systematically explored. We aim to (1) build annual-average integrated empirical geographic (IEG) regression models for the contiguous U.S. for six criteria pollutants during 1979–2015; (2) explore systematically the impact on model performance of the number of variables selected for inclusion in a model; and (3) provide publicly available model predictions. We compute annual-average concentrations from regulatory monitoring data for PM10, PM2.5, NO2, SO2, CO, and ozone at all monitoring sites for 1979–2015. We also use ~350 geographic characteristics at each location including measures of traffic, land use, land cover, and satellite-based estimates of air pollution. We then develop IEG models, employing universal kriging and summary factors estimated by partial least squares (PLS) of geographic variables. For all pollutants and years, we compare three approaches for choosing variables to include in the PLS model: (1) no variables, (2) a limited number of variables selected from the full set by forward selection, and (3) all variables. We evaluate model performance using 10-fold cross-validation (CV) using conventional and spatially-clustered test data. Models using 3 to 30 variables selected from the full set generally have the best performance across all pollutants and years (median R2 conventional [clustered] CV: 0.66 [0.47]) compared to models with no (0.37 [0]) or all variables (0.64 [0.27]). Concentration estimates for all Census Blocks reveal generally decreasing concentrations over several decades with local heterogeneity. Our findings suggest that national prediction models can be built by empirically selecting only a small number of important variables to provide robust concentration estimates. Model estimates are freely available online. (en)
dc.description.version: Published version (en)
dc.format.extent: Pages e0228535 (en)
dc.format.mimetype: application/pdf (en)
dc.identifier.doi: https://doi.org/10.1371/journal.pone.0228535 (en)
dc.identifier.eissn: 1932-6203 (en)
dc.identifier.issn: 1932-6203 (en)
dc.identifier.issue: 2 (en)
dc.identifier.other: PONE-D-19-06161 (PII) (en)
dc.identifier.pmid: 32069301 (en)
dc.identifier.uri: http://hdl.handle.net/10919/105161 (en)
dc.identifier.volume: 15 (en)
dc.language.iso: en (en)
dc.publisher: PLOS (en)
dc.relation.uri: https://www.ncbi.nlm.nih.gov/pubmed/32069301 (en)
dc.rights: Creative Commons Attribution 4.0 International (en)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ (en)
dc.subject.mesh: Humans (en)
dc.subject.mesh: Carbon Monoxide (en)
dc.subject.mesh: Nitrogen Dioxide (en)
dc.subject.mesh: Ozone (en)
dc.subject.mesh: Sulfur Dioxide (en)
dc.subject.mesh: Air Pollutants (en)
dc.subject.mesh: Models, Statistical (en)
dc.subject.mesh: Regression Analysis (en)
dc.subject.mesh: Air Pollution (en)
dc.subject.mesh: Environmental Exposure (en)
dc.subject.mesh: Environmental Monitoring (en)
dc.subject.mesh: Geography (en)
dc.subject.mesh: Time Factors (en)
dc.subject.mesh: History, 20th Century (en)
dc.subject.mesh: History, 21st Century (en)
dc.subject.mesh: United States (en)
dc.subject.mesh: Particulate Matter (en)
dc.subject.mesh: Spatial Analysis (en)
dc.title: Concentrations of criteria pollutants in the contiguous U.S., 1979 – 2015: Role of prediction model parsimony in integrated empirical geographic regression (en)
dc.title.serial: PLoS ONE (en)
dc.type: Article - Refereed (en)
dc.type.dcmitype: Text (en)
dc.type.other: Journal Article (en)
dcterms.dateAccepted: 2020-01-17 (en)
pubs.organisational-group: /Virginia Tech (en)
pubs.organisational-group: /Virginia Tech/Architecture and Urban Studies (en)
pubs.organisational-group: /Virginia Tech/Architecture and Urban Studies/School of Public and International Affairs (en)
pubs.organisational-group: /Virginia Tech/All T&R Faculty (en)
pubs.organisational-group: /Virginia Tech/Architecture and Urban Studies/CAUS T&R Faculty (en)

Files

Original bundle
Name: Concentrations of criteria pollutants in the contiguous U.S., 1979 - 2015 Role of prediction model parsimony in integrated e.pdf
Size: 2.87 MB
Format: Adobe Portable Document Format
Description: Published version