Browsing by Author "Bechle, Matthew J."
- Concentrations of criteria pollutants in the contiguous U.S., 1979 – 2015: Role of prediction model parsimony in integrated empirical geographic regression. Kim, Sun-Young; Bechle, Matthew J.; Hankey, Steven C.; Sheppard, Lianne; Szpiro, Adam A.; Marshall, Julian D. (PLOS, 2020-02-01). National-scale empirical models for air pollution can include hundreds of geographic variables. The impact of model parsimony (i.e., how model performance differs for a large versus small number of covariates) has not been systematically explored. We aim to (1) build annual-average integrated empirical geographic (IEG) regression models for the contiguous U.S. for six criteria pollutants during 1979–2015; (2) systematically explore how the number of variables selected for inclusion in a model affects model performance; and (3) provide publicly available model predictions. We compute annual-average concentrations from regulatory monitoring data for PM10, PM2.5, NO2, SO2, CO, and ozone at all monitoring sites for 1979–2015. We also use ~350 geographic characteristics at each location, including measures of traffic, land use, land cover, and satellite-based estimates of air pollution. We then develop IEG models, employing universal kriging and summary factors estimated by partial least squares (PLS) of geographic variables. For all pollutants and years, we compare three approaches for choosing variables to include in the PLS model: (1) no variables, (2) a limited number of variables selected from the full set by forward selection, and (3) all variables. We evaluate model performance by 10-fold cross-validation (CV) with conventional and spatially clustered test data. Models using 3 to 30 variables selected from the full set generally have the best performance across all pollutants and years (median R² for conventional [clustered] CV: 0.66 [0.47]) compared to models with no variables (0.37 [0]) or all variables (0.64 [0.27]).
Concentration estimates for all Census Blocks reveal generally decreasing concentrations over several decades with local heterogeneity. Our findings suggest that national prediction models can be built by empirically selecting only a small number of important variables to provide robust concentration estimates. Model estimates are freely available online.
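The abstract contrasts conventional and spatially clustered 10-fold CV. The sketch below illustrates that contrast under stated assumptions; it is not the authors' pipeline. A one-covariate least-squares line stands in for the PLS + universal kriging model; CV R² is assumed to be the MSE-based form 1 − SSE/SST; and the fold-assignment helpers (`random_folds`, `clustered_folds`), the cluster ids, and the synthetic data are all hypothetical.

```python
import random

def cv_r2(observed, predicted):
    """MSE-based R^2 (assumed definition): 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def random_folds(n_sites, k=10, seed=0):
    """Conventional CV (hypothetical helper): spread sites evenly over k folds at random."""
    folds = [i % k for i in range(n_sites)]
    random.Random(seed).shuffle(folds)
    return folds

def clustered_folds(cluster_ids, k=10, seed=0):
    """Spatially clustered CV (hypothetical helper): sites sharing a cluster id share a fold."""
    clusters = sorted(set(cluster_ids))
    random.Random(seed).shuffle(clusters)
    fold_of = {c: i % k for i, c in enumerate(clusters)}
    return [fold_of[c] for c in cluster_ids]

def fit_line(xs, ys):
    """One-covariate least squares; a stand-in for PLS + universal kriging."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def cross_validate(x, y, folds):
    """Fit on the other folds, predict each held-out fold, then score all predictions."""
    preds = [0.0] * len(y)
    for f in set(folds):
        train = [i for i in range(len(y)) if folds[i] != f]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in range(len(y)):
            if folds[i] == f:
                preds[i] = intercept + slope * x[i]
    return cv_r2(y, preds)

# Hypothetical demo: 200 monitors in 20 spatial clusters, one covariate.
rng = random.Random(1)
clusters = [i // 10 for i in range(200)]
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]
print("conventional CV R2:", round(cross_validate(x, y, random_folds(200)), 3))
print("clustered CV R2:", round(cross_validate(x, y, clustered_folds(clusters)), 3))
```

In clustered CV a whole cluster is held out together, so nearby monitors cannot inform the predictions for one another. That is one plausible reason clustered scores sit below conventional ones in the results quoted above, especially for a model with no geographic covariates.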
- Land Use Regression models for 60 volatile organic compounds: Comparing Google Point of Interest (POI) and city permit data. Lu, Tianjun; Lansing, Jennifer; Zhang, Wenwen; Bechle, Matthew J.; Hankey, Steven C. (2019-08-10). Land Use Regression (LUR) models of Volatile Organic Compounds (VOC) normally focus on land use (e.g., industrial areas) or transportation facilities (e.g., roadways); here, we incorporate area sources (e.g., gas stations) from city permitting data and Google Point of Interest (POI) data to compare model performance. We used measurements from 50 community-based sampling locations (2013–2015) in Minneapolis, MN, USA to develop LUR models for 60 VOCs. We used three sets of independent variables: (1) base-case models with land use and transportation variables, (2) models that add area source variables from local business permit data, and (3) models that use Google POI data for area sources. The models with Google POI data performed best; for example, the total VOC (TVOC) model has better goodness-of-fit (adj-R²: 0.56; Root Mean Square Error [RMSE]: 0.32 µg/m³) than the permit data model (0.42; 0.37) and the base-case model (0.26; 0.41). Area source variables were selected in over two-thirds of the models for the 60 VOCs, at small buffer sizes (e.g., 25–500 m). Our work suggests that VOC LUR models can be developed using community-based sampling and that models improve by including area sources as measured by business permit and Google POI data. © 2019 The Authors. Published by Elsevier B.V.
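In the second study, area-source predictors enter the LUR models as counts of sources within buffers around each sampling site, and the models are compared by adjusted R² and RMSE. The sketch below shows how such buffer features and fit metrics might be computed. It is a minimal illustration under assumptions, not the authors' code. It assumes projected (x, y) coordinates in metres, so that plain Euclidean distance is adequate at 25–500 m scales. The function names and the sample coordinates are hypothetical, and the adjusted R² uses the standard formula 1 − (1 − R²)(n − 1)/(n − p − 1).

```python
import math

def buffer_count(site, sources, radius_m):
    """Count area sources (e.g., gas stations) within radius_m of a site.

    Assumes projected (x, y) coordinates in metres (hypothetical setup)."""
    sx, sy = site
    return sum(1 for x, y in sources if math.hypot(x - sx, y - sy) <= radius_m)

def rmse(observed, predicted):
    """Root mean square error between observed and predicted concentrations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def adjusted_r2(observed, predicted, n_predictors):
    """R^2 penalised for p predictors: 1 - (1 - R^2)(n - 1) / (n - p - 1)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / sst
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Hypothetical sampling site and three point sources, in metres.
site = (0.0, 0.0)
sources = [(10.0, 0.0), (0.0, 30.0), (300.0, 400.0)]
features = {r: buffer_count(site, sources, r) for r in (25, 100, 500)}
print(features)  # one candidate predictor per buffer size
```

Computing one count per buffer size yields a family of candidate predictors. Variable selection can then keep whichever radius best explains each VOC, which matches the abstract's finding that small buffers were often selected.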