Leveraging Street View and Remote Sensing Imagery to Enhance Air Quality Modeling through Computer Vision and Machine Learning

Virginia Tech


Air pollution is associated with a range of adverse health impacts and is one of the leading risk factors for the global burden of disease. It is also one of the pathways through which climate change could negatively affect health. Field studies have shown that air pollution has high spatiotemporal variability and that pollutant concentrations vary substantially within neighborhoods. Characterizing air pollution at a fine-grained level is essential for accurately estimating human exposure, assessing its impact on human health, and informing localized air pollution policy. Air quality models are developed to estimate air pollution at locations and time periods without monitors, and these estimates are commonly used in exposure and health effects studies. Traditional land use regression [LUR] models are one cost-effective type of empirical air quality model. LUR typically relies on fixed-site measurements and GIS-derived variables with limited spatial resolution, and it captures only linear relationships. In recent years, innovative open-source imagery datasets (e.g., street view imagery, remote sensing imagery) and their associated features have emerged and show potential to augment or replace traditional LUR predictors. Such imagery data sources contain abundant information about natural and built environment features, and advanced computer vision techniques make it possible to extract and quantify those features from large image collections. The overarching objective of this dissertation is to investigate the feasibility of leveraging open-source imagery datasets (e.g., Google Street View [GSV] imagery and Landsat imagery) and advanced machine learning algorithms to develop image-based empirical air quality models at both local and national scales. The first study of this work established a pipeline for feature extraction through semantic segmentation of street view imagery.
The resulting street view features were used to predict street-level particulate air pollution for a single city. The results showed that models using only GSV-derived features can achieve fits comparable to those of models using traditional GIS-derived variables. Feature engineering improved model stability and interpretability by removing spurious variables that may arise from misclassifications by the computer vision algorithms. The second study further developed GSV-based models at the national scale across multiple years. Random forest models were developed to capture the nonlinear relationship between air pollution and the factors that influence it. The results showed that, with sufficient street view images, GSV imagery alone may explain the variation in long-term national NO2 concentrations. Adding satellite-derived aerosol estimates (i.e., OMI column density) can significantly boost model performance when GSV images are insufficient, but the improvement narrows as more GSV images become available. Our systematic assessment of the impact of image availability on model performance suggested that a parsimonious image sampling strategy (i.e., one GSV image per 100 m grid cell) may be sufficient and the most cost-effective for model development and application. Our third study explored the feasibility of combining street view and remote sensing derived features for national NO2 and PM2.5 modeling and projection at high spatial resolution. We found that GSV-based models captured both the highest and lowest pollutant concentrations, while remote sensing features tended to smooth the air pollution variations. The results suggested that GSV features may be better able to capture fine-scale air pollution variability. The resulting air pollution prediction product may serve a variety of applications, including providing new insights for environmental justice and epidemiological studies, owing to its high spatial resolution (i.e., street level).
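As an illustration of the parsimonious sampling strategy mentioned above, the following is a minimal sketch of thinning candidate image locations to one per 100 m grid cell. It is not the dissertation's implementation: the function name thin_to_grid, the tuple layout, and the rule of keeping the first image seen in each cell are assumptions made for this example, and coordinates are assumed to be in a projected coordinate system measured in metres.

```python
from math import floor


def thin_to_grid(points, cell_size_m=100.0):
    """Keep one image per square grid cell (hypothetical helper).

    points: iterable of (image_id, x_m, y_m), with x and y in metres
            in a projected coordinate system (an assumption of this sketch).
    Returns a dict mapping (column, row) cell indices to the first
    image_id encountered in that cell.
    """
    kept = {}
    for image_id, x, y in points:
        # floor() keeps cell indices consistent for negative coordinates too.
        cell = (floor(x / cell_size_m), floor(y / cell_size_m))
        # setdefault only stores the id if the cell is still empty.
        kept.setdefault(cell, image_id)
    return kept


candidates = [
    ("img_a", 10.0, 10.0),
    ("img_b", 90.0, 95.0),   # same 100 m cell as img_a, so it is dropped
    ("img_c", 150.0, 20.0),
    ("img_d", -5.0, 10.0),
]
selected = thin_to_grid(candidates)
```

Keeping the first image per cell is only one possible rule; choosing the image nearest the cell centre or the most recent capture would fit the same structure.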
Collectively, the results of this dissertation suggest that GSV imagery, processed with computer vision techniques, is a promising data source for developing empirical air quality models with high spatial resolution and a consistent protocol for processing predictor variables. Image-based features combined with advanced ML approaches have the potential to greatly improve air quality modeling estimates, and the models developed here show performance comparable or even superior to that reported in other modeling studies. Moreover, the ever-growing public imagery data sources are particularly promising for remote or less developed areas where traditional curated geodatabases are sparse or nonexistent.
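The feature extraction step described for the first study can be illustrated with a small sketch that turns a semantic segmentation mask into per-class pixel fractions, a common way to summarize street view images as model predictors. The abstract does not specify the exact features used, so the function name class_fractions, the class list, and the mask layout (a nested list of integer class indices) are assumptions for illustration only.

```python
from collections import Counter


def class_fractions(mask, class_names):
    """Share of pixels assigned to each class in one segmented image.

    mask: nested list of integer class indices, one per pixel
          (assumed output format of some segmentation model).
    class_names: list where position i names class index i.
    """
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("mask contains no pixels")
    # Classes absent from the image get a fraction of 0.0.
    return {name: counts.get(idx, 0) / total
            for idx, name in enumerate(class_names)}


# Toy 2 x 3 mask: 2 road pixels, 1 building pixel, 3 vegetation pixels.
toy_mask = [[0, 0, 1],
            [2, 2, 2]]
features = class_fractions(toy_mask, ["road", "building", "vegetation"])
```

Fractions like these could then be averaged over the images near a monitoring site and passed to a regression or random forest model as predictor variables.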



air pollution, artificial intelligence, image sampling and processing, satellite, land use regression, exposure assessment, open-source, big geodata