Using Mobile Monitoring and Vehicle Emissions to Develop and Validate Machine Learning Empirical Models of Particulate Air Pollution

Date

2021-08-18

Publisher

Virginia Tech

Abstract

Increasing levels of air pollution are prompting researchers to develop more reliable air pollution modeling approaches to protect the public and the environment from toxic contaminants and airborne pathogens. Although land use regression has long been used to assess exposure to air pollution, researchers are increasingly using machine learning algorithms to quantify the concentrations of harmful pollutants; in this study, these are black carbon (BC) and particle number (PN). Researchers are also moving away from fixed-site data in favor of mobile monitoring data collected across a variety of locations to develop hourly empirical models of particulate air pollution.
This study uses secondary data on BC and PN levels collected along roads shared with cyclists in Blacksburg, Virginia, a more rural setting. Machine learning (ML) algorithms are then built to develop accurate and reliable short-term empirical prediction models. Different preprocessing methods for the mobile monitoring data and various input variables are tested to assess how ML can be used effectively in this process. Three types of time-average models are developed: daytime, hourly-average, and one-second models. Various combinations of spatial and temporal input variables are used in the short-term models, which are also used to assess whether adding further spatiotemporal variables (e.g., emissions) improves model performance. Spatial and temporal autocorrelation are incorporated into the validation approach to identify patterns in ML performance, with the goal of predicting concentration levels more accurately than models built on raw data without preprocessing. The results show that the model developed from refined, disaggregated data captures the spatial distribution of pollutant concentrations as well as the models built on smoothed data, although the smoothed-data models have lower errors. The short-term model that includes all variables performs equivalently to the model that omits emissions. A comparison with earlier stepwise regression models indicates that ML has higher predictive capacity and can improve the accuracy of both long-term and short-term models. These results may help improve ML performance through appropriate data preprocessing and clarify how different input variables contribute to the ML modeling process.
The findings from this study could support the development of environmentally friendly (eco-friendly) routes that reduce the risk of exposure to harmful vehicle-related emissions.
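The hourly-average models described above rely on collapsing 1-second mobile readings into hourly values, and the validation step relies on keeping spatially related observations out of the training fold. The Python sketch below illustrates both ideas under stated assumptions; it is not the thesis's actual pipeline. The record layout (ISO timestamp, road-segment id, BC value), the function names `hourly_segment_means` and `leave_one_group_out`, the sample readings, and the use of the segment id as the spatial group are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_segment_means(readings):
    """Average 1-second readings into one value per (segment, hour).

    Each reading is assumed (hypothetically) to be a tuple of
    (ISO-8601 timestamp, road-segment id, BC concentration).
    """
    buckets = defaultdict(list)
    for ts, segment, bc in readings:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        buckets[(segment, hour)].append(bc)
    return {key: mean(values) for key, values in buckets.items()}


def leave_one_group_out(keys, group_of):
    """Yield (group, train_keys, test_keys), holding out one group at a time.

    Keeping every observation from a group in the same fold prevents
    nearby (autocorrelated) samples from appearing in both train and test.
    """
    for group in sorted({group_of(k) for k in keys}):
        test = [k for k in keys if group_of(k) == group]
        train = [k for k in keys if group_of(k) != group]
        yield group, train, test


# Hypothetical 1-second readings: (timestamp, segment id, BC value).
readings = [
    ("2021-06-01T08:00:01", "seg-A", 1200.0),
    ("2021-06-01T08:00:02", "seg-A", 1400.0),
    ("2021-06-01T08:59:59", "seg-A", 1000.0),
    ("2021-06-01T09:00:00", "seg-A", 900.0),
    ("2021-06-01T08:30:00", "seg-B", 600.0),
]
hourly = hourly_segment_means(readings)

# Use the segment id as a (coarse) spatial group for cross-validation.
for group, train, test in leave_one_group_out(list(hourly), lambda k: k[0]):
    print(group, "train:", len(train), "test:", len(test))
```

Grouping by segment is only a coarse stand-in for spatial blocking; a design that groups neighboring segments, or that also holds out whole days, would more fully respect spatial and temporal autocorrelation.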

Keywords

Machine learning, Land use regression, Emission factors, Black carbon, Particle number, Spatial and temporal variation, Air pollution

Citation