Comparison of machine learning algorithms for emulation of a gridded hydrological model given spatially explicit inputs

This study compares the performance of several machine learning algorithms in reproducing the spatial and temporal outputs of the process-based hydrological model ParFlow.CLM. Emulators, or surrogate models, are often used to reduce the complexity and simulation times of complex models, and have typically been applied to evaluate parameter sensitivity or to tune model parameters, without explicit treatment of the variation resulting from spatially explicit inputs to the model. Here we present a case study in which we evaluate candidate machine learning algorithms for their suitability in emulating model outputs given spatially explicit inputs. We find that among random forest, Gaussian process, k-nearest neighbors, and deep neural networks, the random forest algorithm performs best on small training sets, is less sensitive to the hyperparameters chosen for the machine learning model, and can be trained quickly. Although deep neural networks were hypothesized to better capture the potential nonlinear interactions in ParFlow.CLM, they required more training data and much more refined hyperparameter tuning to achieve the potential benefits of the algorithm.

04 Earth Sciences, 08 Information and Computing Sciences, 09 Engineering, Geochemistry & Geophysics