Predicting lake surface water phosphorus dynamics using process-guided machine learning

Abstract

Phosphorus (P) loading to lakes is degrading the quality and usability of water globally. Accurate predictions of lake P dynamics are needed to understand whole-ecosystem P budgets, as well as the consequences of changing lake P concentrations for water quality. However, complex biophysical processes within lakes, along with limited observational data, challenge our capacity to reproduce the short-term lake dynamics needed for water quality predictions, as well as the long-term dynamics needed to understand broad-scale controls over lake P. Here we use an emerging paradigm in modeling, process-guided machine learning (PGML), to produce a phosphorus budget for Lake Mendota (Wisconsin, USA) and to accurately predict epilimnetic phosphorus over time scales of days to decades. In our implementation of PGML, which we term a Process-Guided Recurrent Neural Network (PGRNN), we combine a process-based model for lake P with a recurrent neural network, and then constrain the predictions with ecological principles. We independently test the process-based model, the recurrent neural network, and the PGRNN to evaluate the overall approach. The process-based model accounted for most of the observed pattern in lake P; however, it missed the long-term trend in lake P and had the worst performance in predicting winter and summer P in surface waters. The root mean square error (RMSE) for the process-based model, the recurrent neural network, and the PGRNN was 33.0 μg P L⁻¹, 22.7 μg P L⁻¹, and 20.7 μg P L⁻¹, respectively. All models performed better during summer, with RMSE values for the three models (same order) equal to 14.3 μg P L⁻¹, 10.9 μg P L⁻¹, and 10.7 μg P L⁻¹. Although the PGRNN had only marginally better RMSE during summer, it had lower bias and reproduced long-term decreases in lake P missed by the other two models.
For all seasons and all years, the recurrent neural network had better predictions than the process-based model alone, with RMSE of 23.8 μg P L⁻¹ and 28.0 μg P L⁻¹, respectively. The output of the PGRNN indicated that new processes related to water temperature, thermal stratification, and long-term changes in external loads are needed to improve the process-based model. By using ecological knowledge, as well as the information content of complex data, PGML shows promise as a technique for accurate prediction of messy, real-world ecological dynamics, while providing valuable information that can improve our understanding of process.
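The abstract compares models by both RMSE and bias. The minimal Python sketch below shows how the two metrics differ and why two models with similar RMSE can have very different bias. The function names and the toy concentrations are illustrative assumptions; they are not the study's code or data.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def bias(observed, predicted):
    """Mean error (predicted minus observed); positive means overprediction."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Toy concentrations in ug P per litre, invented for illustration only
# (hypothetical values, not data from the study).
observed = [100.0, 80.0, 60.0, 40.0]
model_a = [110.0, 90.0, 70.0, 50.0]  # every prediction is 10 too high
model_b = [110.0, 70.0, 70.0, 30.0]  # errors alternate between +10 and -10

for name, predicted in (("model_a", model_a), ("model_b", model_b)):
    print(name, "RMSE:", rmse(observed, predicted), "bias:", bias(observed, predicted))
```

In this toy case both models have the same RMSE, but only the first is systematically biased, which is why the two metrics are worth reporting together.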

Description
Keywords
Phosphorus, Lake Mendota, Model, Machine learning, Lake, Long-term
Citation