Application of training data affects success in broad-scale local climate zone mapping

Xu, Chunxue; Hystad, Perry; Chen, Rui; Van Den Hoek, Jamon; Hutchinson, Rebecca A.; Hankey, Steven C.; Kennedy, Robert

dc.contributor.author	Xu, Chunxue	en
dc.contributor.author	Hystad, Perry	en
dc.contributor.author	Chen, Rui	en
dc.contributor.author	Van Den Hoek, Jamon	en
dc.contributor.author	Hutchinson, Rebecca A.	en
dc.contributor.author	Hankey, Steven C.	en
dc.contributor.author	Kennedy, Robert	en
dc.date.accessioned	2021-11-24T14:12:21Z	en
dc.date.available	2021-11-24T14:12:21Z	en
dc.date.issued	2021-12-01	en
dc.description.abstract	Satellite imagery has been widely used to map urbanization processes. To address the urgent need for urban landscape mapping that goes beyond urban footprint analysis, the local climate zone (LCZ) scheme has been increasingly used to reveal the urban forms and functions important to urban heat islands and micro-climates across the globe. As with most supervised classification strategies, proper application of training data is critical for the success of LCZ classification models. However, the collection and application of LCZ training areas bring two challenges that may affect mapping success. First, because digitizing training areas is a time-consuming task, there is a broad effort in the LCZ mapping community to crowdsource data collection among different experts. However, this strategy likely leads to inconsistencies in labels that could weaken models. Second, the LCZ labeling process typically involves the delineation of large zones from which multiple training samples are drawn, but those samples are likely spatially autocorrelated and lead to overly optimistic estimates of model accuracy. Although both effects (inconsistent labeling and spatial autocorrelation) are theoretically possible, it is unknown whether they substantially affect accuracy. We investigated both issues, specifically asking: (i) how do discrepancies in LCZ labeling by different experts impact broad-scale LCZ mapping? (ii) to what extent does spatial autocorrelation affect model prediction power? We used two classifiers (Random Forests and ResNets) to map eight metropolitan areas in the US into LCZs, comparing training areas drawn by different or consistent interpreters, and data-splitting strategies that allow or reduce spatial autocorrelation.
First, we found large discrepancies among results built from crowdsourced training areas digitized by different experts; improving the consistency of labels can lead to substantial improvements in LCZ classification accuracy. Second, we found that spatial autocorrelation can boost the apparent accuracy of the classifier by 16% to 21%, leading to erroneous interpretation of mapping results. The two effects also interact: spatial autocorrelation in the raw data can lead to an underestimation of the model's predictive error when modeling with crowdsourced training areas of high inconsistency. Due to the uncertainty in the labeling process and spatial autocorrelation in derived training data, broad-scale LCZ mapping results should be interpreted with caution.	en
dc.description.version	Published version	en
dc.format.mimetype	application/pdf	en
dc.identifier.doi	https://doi.org/10.1016/j.jag.2021.102482	en
dc.identifier.eissn	1872-826X	en
dc.identifier.issn	1569-8432	en
dc.identifier.other	102482	en
dc.identifier.uri	http://hdl.handle.net/10919/106732	en
dc.identifier.volume	103	en
dc.language.iso	en	en
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International	en
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	en
dc.subject	Local climate zone	en
dc.subject	Machine learning	en
dc.subject	Training areas	en
dc.subject	Crowdsourced data	en
dc.subject	Spatial autocorrelation	en
dc.title	Application of training data affects success in broad-scale local climate zone mapping	en
dc.title.serial	International Journal of Applied Earth Observation and Geoinformation	en
dc.type	Article - Refereed	en
dc.type.dcmitype	text	en

Files

Original bundle

Name: 1-s2.0-S0303243421001896-main.pdf
Size: 25.83 MB
Format: Adobe Portable Document Format
Description: Published version

Collections

Scholarly Works, School of Public and International Affairs