Browsing by Author "Jia, Xiaowei"
- Predicting lake surface water phosphorus dynamics using process-guided machine learning
  Hanson, Paul C.; Stillman, Aviah B.; Jia, Xiaowei; Karpatne, Anuj; Dugan, Hilary A.; Carey, Cayelan C.; Stachelek, Joseph; Ward, Nicole K.; Zhang, Yu; Read, Jordan S.; Kumar, Vipin (2020-08-15)
  Phosphorus (P) loading to lakes is degrading the quality and usability of water globally. Accurate predictions of lake P dynamics are needed to understand whole-ecosystem P budgets, as well as the consequences of changing lake P concentrations for water quality. However, complex biophysical processes within lakes, along with limited observational data, challenge our capacity to reproduce the short-term lake dynamics needed for water quality predictions, as well as the long-term dynamics needed to understand broad-scale controls over lake P. Here we use an emerging paradigm in modeling, process-guided machine learning (PGML), to produce a phosphorus budget for Lake Mendota (Wisconsin, USA) and to accurately predict epilimnetic phosphorus over a time range of days to decades. In our implementation of PGML, which we term a Process-Guided Recurrent Neural Network (PGRNN), we combine a process-based model for lake P with a recurrent neural network, and then constrain the predictions with ecological principles. We independently test the process-based model, the recurrent neural network, and the PGRNN to evaluate the overall approach. The process-based model accounted for most of the observed pattern in lake P; however, it missed the long-term trend in lake P and had the worst performance in predicting winter and summer P in surface waters. The root mean square error (RMSE) for the process-based model, the recurrent neural network, and the PGRNN was 33.0 µg P L⁻¹, 22.7 µg P L⁻¹, and 20.7 µg P L⁻¹, respectively. All models performed better during summer, with RMSE values for the three models (same order) equal to 14.3 µg P L⁻¹, 10.9 µg P L⁻¹, and 10.7 µg P L⁻¹. Although the PGRNN had only marginally better RMSE during summer, it had lower bias and reproduced long-term decreases in lake P missed by the other two models. For all seasons and all years, the recurrent neural network had better predictions than process alone, with RMSE of 23.8 µg P L⁻¹ and 28.0 µg P L⁻¹, respectively. The output of the PGRNN indicated that new processes related to water temperature, thermal stratification, and long-term changes in external loads are needed to improve the process model. By using ecological knowledge, as well as the information content of complex data, PGML shows promise as a technique for accurate prediction in messy, real-world ecological dynamics, while providing valuable information that can improve our understanding of process.
- Process-Guided Deep Learning Predictions of Lake Water Temperature
  Read, Jordan S.; Jia, Xiaowei; Willard, Jared; Appling, Alison P.; Zwart, Jacob A.; Oliver, Samantha K.; Karpatne, Anuj; Hansen, Gretchen J. A.; Hanson, Paul C.; Watkins, William; Steinbach, Michael; Kumar, Vipin (2019-11-08)
  The rapid growth of data in water resources has created new opportunities to accelerate knowledge discovery with the use of advanced deep learning tools. Hybrid models that integrate theory with state-of-the-art empirical techniques have the potential to improve predictions while remaining true to physical laws. This paper evaluates the Process-Guided Deep Learning (PGDL) hybrid modeling framework with a use case of predicting depth-specific lake water temperatures. The PGDL model has three primary components: a deep learning model with temporal awareness (long short-term memory recurrence), theory-based feedback (model penalties for violating conservation of energy), and model pretraining to initialize the network with synthetic data (water temperature predictions from a process-based model). In situ water temperatures were used to train the PGDL model, a deep learning (DL) model, and a process-based (PB) model. Model performance was evaluated in various conditions, including when training data were sparse and when predictions were made outside of the range in the training data set. The PGDL model performance (as measured by root-mean-square error (RMSE)) was superior to DL and PB for two detailed study lakes, but only when pretraining data included greater variability than the training period. The PGDL model also performed well when extended to 68 lakes, with a median RMSE of 1.65 °C during the test period (DL: 1.78 °C, PB: 2.03 °C; in a small number of lakes PB or DL models were more accurate). This case study demonstrates that integrating scientific knowledge into deep learning tools shows promise for improving predictions of many important environmental variables.
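
Both abstracts report RMSE, and the second describes penalizing a model for violating conservation of energy. The sketch below is one generic way such a process-guided loss could be written, not the authors' implementation: the function names, the `weight` and `tolerance` parameters, and the specific form of the penalty (mean excess of the energy imbalance over a tolerance) are all assumptions made for illustration.

```python
import numpy as np


def rmse(predicted, observed):
    """Root-mean-square error, skipping time steps with no observation (NaN)."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    mask = ~np.isnan(observed)  # sparse in situ data: score only observed steps
    return float(np.sqrt(np.mean((predicted[mask] - observed[mask]) ** 2)))


def energy_penalty(energy_change, net_flux, tolerance=0.0):
    """Hypothetical penalty: mean amount by which |dE - flux| exceeds a tolerance.

    energy_change: change in modeled lake energy per time step (assumed input)
    net_flux:      net energy flux into the lake per time step (assumed input)
    """
    imbalance = np.abs(np.asarray(energy_change, dtype=float)
                       - np.asarray(net_flux, dtype=float))
    return float(np.mean(np.maximum(imbalance - tolerance, 0.0)))


def process_guided_loss(predicted, observed, energy_change, net_flux,
                        weight=0.1, tolerance=0.0):
    """Data misfit plus a weighted physical-consistency penalty (assumed form)."""
    return rmse(predicted, observed) + weight * energy_penalty(
        energy_change, net_flux, tolerance)
```

With a loss of this shape, `weight` trades off fit to observations against physical consistency, and setting it to zero recovers a purely data-driven objective.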