An evaluation of a data-driven approach to regional scale surface runoff modelling



Virginia Tech


Modelling surface runoff can benefit operations in many fields, such as agricultural planning, flood and drought risk assessment, and water resource management. In this study, we built a data-driven model that reproduces monthly surface runoff on a 4-km grid covering 13 watersheds in the Chesapeake Bay area. We used a random forest algorithm to build the model, with monthly precipitation, temperature, land cover, and topographic data as predictors and monthly surface runoff generated by the SWAT hydrological model as the response. A separate sub-model was developed independently for each of the 12 monthly surface runoff estimates. Accuracy statistics and variable importance measures from the random forest algorithm reveal that precipitation was the most important variable in the model, and that including climatological data from multiple months as predictors significantly improved model performance. Using 3-month climatological data, land cover, and DEM derivatives from 40% of the 4-km grid cells as the training dataset, our model successfully predicted surface runoff for the remaining 60% of the grid cells (across the 12 monthly models, the mean R² was 0.83 and the mean RMSE was 6.60 mm). The lowest R² was associated with the model for August, when surface runoff is at its lowest for the year. Among the studied watersheds, the highest predictive errors were found in the watershed with the greatest topographic complexity, where the model tended to underestimate surface runoff. For the other 12 watersheds, the data-driven model produced smaller and more spatially consistent predictive errors.
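
The evaluation protocol summarized above (train on 40% of the grid cells, test on the remaining 60%, and report R² and RMSE) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the random assignment of cells, and the reading of R² as the coefficient of determination are all assumptions, since the abstract does not specify them.

```python
import math
import random


def split_grid_cells(cell_ids, train_fraction=0.4, seed=0):
    """Randomly assign grid cells to a training set and a test set.

    Assumption: cells are drawn at random; the abstract does not say
    how the 40% training cells were actually selected.
    """
    shuffled = list(cell_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the inputs (e.g. mm)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: the study may instead report the squared correlation.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

In this sketch, each monthly sub-model would be fitted on the training cells and scored on the test cells with `rmse` and `r_squared`; averaging the 12 scores would give summary figures of the kind reported in the abstract.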



data-driven modelling, surface runoff simulation, random forest, machine learning, Chesapeake Bay