Prediction of Gravel Streambed Embeddedness Using Explainable AI and Machine Learning Techniques
Abstract
Excess fine sediments (<2 mm) in gravel streambeds can negatively impact river habitats by reducing porosity and interparticle flow. Human activities have increased fine sediment inputs into river systems, leading to higher embeddedness in gravel streambeds. Embeddedness, also called colmation or clogging, is a measure of the fine sediment filling interstitial pore spaces. However, measuring embeddedness in the field is time-consuming. A recent model predicted reach-scale embeddedness using bankfull shear velocity derived from remotely sensed data. This study extends that work by developing two boosting machine learning models, one for Virginia (VA) and one for the United States (U.S.), that use remotely sensed data and explainable artificial intelligence (XAI) to uncover important variables related to the physical processes affecting embeddedness. The VA model used 1,125 embeddedness measurements from 906 sites provided by the VA Department of Environmental Quality, while the U.S. Geological Survey StreamStats tool provided 91 watershed features for each station, covering soil, land cover, and flow characteristics. The national model was built using 1,210 data points from the U.S. Environmental Protection Agency with corresponding watershed characteristics. Both models employed gradient boosting regression (GBR) and were optimized using Bayesian hyperparameter tuning via Optuna. For site-level embeddedness predictions, the VA GBR model achieved an R² of 0.69 and a mean absolute error (MAE) of 9.8%, while the national model achieved an R² of 0.49 and an MAE of 13.7%. A key element of this work was the use of XAI techniques, particularly SHapley Additive exPlanations (SHAP), to interpret model outputs and uncover the physical processes driving embeddedness. Bankfull shear velocity consistently emerged as the most important predictor, with soil depth, basin relief, and land cover also contributing meaningfully.
These findings demonstrate that scalable, interpretable models can predict streambed embeddedness with moderate accuracy. Overall, the ability to quickly predict embeddedness in VA and the U.S. is a critical step toward protecting aquatic habitats and improving future stream restoration efforts.
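For readers less familiar with the two accuracy metrics reported in the abstract, the following is a minimal sketch of how R² and MAE are conventionally computed from paired observed and predicted embeddedness values. It is not the study's code. The function names and the five sample values are hypothetical and chosen only for illustration, and the R² shown is the standard coefficient of determination, which is assumed (not confirmed by the abstract) to be the definition the authors used.

```python
from math import fsum


def mean_absolute_error(observed, predicted):
    """Average absolute difference, in the units of the inputs (here, percent embeddedness)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return fsum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = fsum(observed) / len(observed)
    ss_res = fsum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = fsum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1 - ss_res / ss_tot


# Hypothetical embeddedness values (%), for illustration only; not data from the study.
observed = [20.0, 35.0, 50.0, 65.0, 80.0]
predicted = [25.0, 30.0, 55.0, 60.0, 85.0]

print(f"MAE = {mean_absolute_error(observed, predicted):.1f}%")
print(f"R^2 = {r_squared(observed, predicted):.3f}")
```

Read this way, an MAE of 9.8% means the VA model's site-level predictions differ from the measured embeddedness by 9.8 percentage points on average, and an R² of 0.69 means the model accounts for roughly 69% of the variance in the measurements it was evaluated against.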