Prediction of Gravel Streambed Embeddedness Using Explainable AI and Machine Learning Techniques
dc.contributor.author | Oare, Emma Reilly | en |
dc.contributor.committeechair | Czuba, Jonathan A. | en |
dc.contributor.committeemember | Angermeier, Paul L. | en |
dc.contributor.committeemember | Batarseh, Feras A. | en |
dc.contributor.department | Biological Systems Engineering | en |
dc.date.accessioned | 2025-05-30T08:02:53Z | en |
dc.date.available | 2025-05-30T08:02:53Z | en |
dc.date.issued | 2025-05-29 | en |
dc.description.abstract | Excess fine sediments (<2 mm) in gravel streambeds can degrade river habitats by reducing porosity and interparticle flow. Human activities have increased fine sediment inputs into river systems, leading to higher embeddedness in gravel streambeds. Embeddedness, also called colmation or clogging, quantifies the fine sediment filling interstitial pore spaces. However, measuring embeddedness in the field is time-consuming. A recent model predicted reach-scale embeddedness using bankfull shear velocity derived from remotely sensed data. This study extends that work in two ways. First, it develops two boosting machine learning models from remotely sensed data, one for Virginia (VA) and one for the United States (U.S.). Second, it applies explainable artificial intelligence (XAI) to identify the important variables and the physical processes affecting embeddedness. The VA model used 1,125 embeddedness measurements from 906 sites provided by the VA Department of Environmental Quality. The U.S. Geological Survey StreamStats tool supplied 91 watershed features for each station, covering soil, land cover, and flow characteristics. The national model was built using 1,210 data points from the U.S. Environmental Protection Agency with corresponding watershed characteristics. Both models employed gradient boosting regression (GBR) and were optimized using Bayesian hyperparameter tuning via Optuna. For site-level embeddedness predictions, the VA GBR model achieved an R² of 0.69 and a mean absolute error (MAE) of 9.8%. The national model achieved an R² of 0.49 and an MAE of 13.7%. A key element of this work was the use of XAI techniques, particularly SHapley Additive exPlanations (SHAP), to interpret model outputs and uncover the physical processes driving embeddedness. Bankfull shear velocity consistently emerged as the most important predictor, with soil depth, basin relief, and land cover also contributing meaningfully. 
These findings demonstrate that scalable, interpretable models can predict streambed embeddedness with moderate accuracy. Overall, the ability to quickly predict embeddedness in VA and the U.S. is a critical step toward protecting aquatic habitats and improving future stream restoration efforts. | en |
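The quantity that SHAP reports can be illustrated with a minimal sketch. It assumes a simple baseline-replacement value function and a hypothetical three-feature toy model, which is not the thesis's fitted GBR. The exact Shapley value of each feature is computed by enumerating every coalition of the other features. The values sum to the prediction minus the baseline prediction, and that is the "additive" property the name refers to. The model, feature names, and baseline are all illustrative assumptions.

```python
"""Exact Shapley values for a small model by enumerating feature coalitions.

Illustrative sketch only: the toy model, feature names, and baseline are
hypothetical and are NOT the thesis's gradient boosting model or data.
"""
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Return the exact Shapley value of each feature for the prediction model(x).

    Features outside a coalition are set to their baseline value (an assumed,
    simple value function). Cost grows as 2**n model calls, so keep n small.
    """
    n = len(x)
    phi = [0.0] * n

    def value(coalition: frozenset) -> float:
        # Use x for coalition members and the baseline for everything else.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_embeddedness(z: Sequence[float]) -> float:
    """Hypothetical embeddedness model (%) with a shear-relief interaction."""
    shear, soil_depth, relief = z
    return 60.0 - 20.0 * shear + 2.0 * soil_depth + 0.5 * shear * relief


if __name__ == "__main__":
    x = [1.0, 3.0, 4.0]         # hypothetical site: shear, soil depth, relief
    baseline = [0.5, 2.0, 2.0]  # hypothetical reference site
    phi = shapley_values(toy_embeddedness, x, baseline)
    print([round(v, 6) for v in phi])  # expected: [-9.25, 2.0, 0.75]
```

The interaction term's contribution is split evenly between shear and relief, and soil depth receives only its own linear effect. Enumeration is exponential in the number of features, so this approach is impractical for 91 watershed features. Tools such as the SHAP library's tree explainer instead use algorithms specialized for tree ensembles, and they may use a different value function than the baseline replacement assumed here.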
dc.description.abstractgeneral | Healthy river habitats depend on clean streambeds that provide good places for aquatic life to live and grow. When too much fine sediment (like silt and clay) accumulates between gravel and cobbles, it can clog these spaces and harm aquatic life. This condition, known as embeddedness, reduces water flow and oxygen in the streambed, making it harder for fish and insects to survive and reproduce. Unfortunately, human activities like construction, farming, and deforestation have increased sediment runoff into rivers, worsening this problem. Traditionally, measuring embeddedness requires field crews to visit streams and make detailed observations, a slow and labor-intensive process. To address this challenge, this study developed two computer models that use satellite and landscape data to predict embeddedness across large areas. One model focused on Virginia, using 1,125 observations from local water monitoring stations. The second model covered the entire United States, using national data from the United States Environmental Protection Agency. Both models relied on machine learning to find patterns in the data, using a technique called "gradient boosting" to make predictions. To better understand how the models made their predictions, explainability tools were used to highlight which factors mattered most. The most important predictor across both models was a variable called "bankfull shear velocity," which relates to the ability of a stream to lift particles off the bed and into the flow during small floods. Other important factors included soil depth, land cover (like forests or farmland), and the steepness of the surrounding terrain. By predicting where high embeddedness is likely to be a concern, these models can help scientists, policymakers, and environmental managers target restoration efforts more efficiently. 
This research represents a step forward in protecting river ecosystems using modern data tools and offers a scalable way to monitor stream health across wide regions. | en |
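Shear velocity can be illustrated with its standard hydraulics definition. For steady, uniform flow, bed shear stress is approximately tau_b = rho * g * R * S, and shear velocity is u* = sqrt(tau_b / rho) = sqrt(g * R * S). Here R is the hydraulic radius and S is the channel slope. This record does not say how the thesis derives bankfull values from remotely sensed data. The minimal sketch below therefore assumes a rectangular channel, and its function names and example geometry are hypothetical.

```python
"""Bankfull shear velocity from the depth-slope product (illustrative sketch).

Assumes steady, uniform flow in a rectangular channel. The function names and
example inputs are hypothetical, not code or values from the thesis.
"""
import math

G = 9.81  # gravitational acceleration, m/s^2


def hydraulic_radius_rectangular(width_m: float, depth_m: float) -> float:
    """Cross-sectional flow area divided by wetted perimeter (rectangular channel)."""
    return (width_m * depth_m) / (width_m + 2.0 * depth_m)


def shear_velocity(hydraulic_radius_m: float, slope: float, g: float = G) -> float:
    """u* = sqrt(g * R * S), i.e. sqrt(tau_b / rho) with tau_b = rho * g * R * S."""
    if hydraulic_radius_m < 0 or slope < 0:
        raise ValueError("hydraulic radius and slope must be non-negative")
    return math.sqrt(g * hydraulic_radius_m * slope)


if __name__ == "__main__":
    # Hypothetical bankfull geometry: 12 m wide, 1.5 m deep, slope 0.003 m/m.
    r = hydraulic_radius_rectangular(12.0, 1.5)
    u_star = shear_velocity(r, 0.003)
    print(f"R = {r:.3f} m, u* = {u_star:.3f} m/s")  # R = 1.200 m, u* = 0.188 m/s
```

Using hydraulic radius rather than depth keeps the sketch valid for narrow channels. For wide channels, R is close to the flow depth, and the two give nearly the same u*.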
dc.description.degree | Master of Science | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:43424 | en |
dc.identifier.uri | https://hdl.handle.net/10919/134298 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Streambed embeddedness | en |
dc.subject | Bankfull shear velocity | en |
dc.subject | Machine learning | en |
dc.subject | Explainable artificial intelligence | en |
dc.subject | Gradient boosting regression | en |
dc.subject | Physics-informed variable | en |
dc.title | Prediction of Gravel Streambed Embeddedness Using Explainable AI and Machine Learning Techniques | en |
dc.type | Thesis | en |
thesis.degree.discipline | Biological Systems Engineering | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | Master of Science | en |