Prediction of Gravel Streambed Embeddedness Using Explainable AI and Machine Learning Techniques

dc.contributor.author: Oare, Emma Reilly [en]
dc.contributor.committeechair: Czuba, Jonathan A. [en]
dc.contributor.committeemember: Angermeier, Paul L. [en]
dc.contributor.committeemember: Batarseh, Feras A. [en]
dc.contributor.department: Biological Systems Engineering [en]
dc.date.accessioned: 2025-05-30T08:02:53Z [en]
dc.date.available: 2025-05-30T08:02:53Z [en]
dc.date.issued: 2025-05-29 [en]
dc.description.abstract: Excess fine sediments (<2 mm) in gravel streambeds can degrade river habitats by reducing porosity and interparticle flow. Human activities have increased fine sediment inputs to river systems, leading to higher embeddedness in gravel streambeds. Embeddedness, also called colmation or clogging, describes the degree to which fine sediment fills the interstitial pore spaces between coarser particles. Measuring embeddedness in the field, however, is time-consuming. A recent model predicted reach-scale embeddedness from bankfull shear velocity derived from remotely sensed data. This study extends that work by developing two gradient boosting machine learning models from remotely sensed data and applying explainable artificial intelligence (XAI) to identify the variables, and the physical processes behind them, that affect embeddedness in Virginia (VA) and the United States (U.S.). The VA model used 1,125 embeddedness measurements from 906 sites provided by the VA Department of Environmental Quality, paired with 91 watershed features per station (covering soil, land cover, and flow characteristics) from the U.S. Geological Survey StreamStats tool. The national model used 1,210 data points from the U.S. Environmental Protection Agency with corresponding watershed characteristics. Both models used gradient boosting regression (GBR) and were optimized with Bayesian hyperparameter tuning via Optuna. For site-level embeddedness predictions, the VA GBR model achieved an R² of 0.69 and a mean absolute error (MAE) of 9.8%, while the national model achieved an R² of 0.49 and an MAE of 13.7%. A key element of this work was the use of XAI techniques, particularly SHapley Additive exPlanations (SHAP), to interpret model outputs and uncover the physical processes driving embeddedness. Bankfull shear velocity consistently emerged as the most important predictor, with soil depth, basin relief, and land cover also contributing meaningfully.
These findings demonstrate that scalable, interpretable models can predict streambed embeddedness with moderate accuracy. Overall, the ability to quickly predict embeddedness in VA and the U.S. is a critical step toward protecting aquatic habitats and improving future stream restoration efforts. [en]
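The abstract reports that both models use gradient boosting regression and are scored with R² and MAE. The following is a minimal, standard-library-only sketch of that technique, assuming squared-error loss and depth-1 trees (stumps). It is not the thesis's code: the function names (`fit_stump`, `fit_gbr`, `mae`, `r2`) are hypothetical, and the two features and all data values are synthetic placeholders.

```python
"""Minimal gradient boosting regression sketch (squared-error loss, depth-1 trees).

Hypothetical illustration only: names and data are made up, not from the thesis.
"""
from dataclasses import dataclass, field


@dataclass
class Stump:
    """A depth-1 regression tree: a single split on a single feature."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: list[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def fit_stump(X: list[list[float]], r: list[float]) -> Stump:
    """Choose the split that minimises squared error on the residuals r."""
    best_sse, best = float("inf"), None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest,
        # so both sides of the split are non-empty.
        for t in sorted({x[j] for x in X})[:-1]:
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, Stump(j, t, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual
        m = sum(r) / len(r)
        best = Stump(0, float("inf"), m, m)
    return best


@dataclass
class GBRModel:
    base: float  # initial prediction: mean of the training targets
    learning_rate: float  # shrinkage applied to every stump
    stumps: list[Stump] = field(default_factory=list)

    def predict(self, x: list[float]) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)


def fit_gbr(X: list[list[float]], y: list[float],
            n_estimators: int = 200, learning_rate: float = 0.1) -> GBRModel:
    """Each round fits a stump to the residuals (negative gradient of squared error)."""
    model = GBRModel(base=sum(y) / len(y), learning_rate=learning_rate)
    pred = [model.base] * len(y)
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        model.stumps.append(stump)
        pred = [pi + learning_rate * stump.predict(x) for pi, x in zip(pred, X)]
    return model


def mae(y: list[float], y_hat: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r2(y: list[float], y_hat: list[float]) -> float:
    """Coefficient of determination (undefined when all y are equal)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic sites: [hypothetical shear-velocity proxy, hypothetical soil-depth proxy]
    X = [[i / 20, (i * 7 % 10) / 10] for i in range(20)]
    y = [60 - 40 * a + 5 * b for a, b in X]  # synthetic "embeddedness %"
    model = fit_gbr(X, y)
    y_hat = [model.predict(x) for x in X]
    print(f"training R^2 = {r2(y, y_hat):.3f}, MAE = {mae(y, y_hat):.2f}")
```

In practice a library implementation (for example scikit-learn's `GradientBoostingRegressor`) would normally be used, with settings such as `n_estimators`, `learning_rate`, and `max_depth` searched by a tuner such as Optuna. This sketch scores the training data only, which overstates accuracy compared with evaluation on held-out sites.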
dc.description.abstractgeneral: Healthy river habitats depend on clean streambeds that give aquatic life places to live and grow. When too much fine sediment (such as silt and clay) accumulates between gravel and cobbles, it clogs these spaces and harms aquatic life. This condition, known as embeddedness, reduces water flow and oxygen in the streambed, making it harder for fish and insects to survive and reproduce. Human activities such as construction, farming, and deforestation have increased sediment runoff into rivers, worsening the problem. Traditionally, measuring embeddedness requires field crews to visit streams and make detailed observations, a slow and labor-intensive process. To address this challenge, this study developed two computer models that use satellite and landscape data to predict embeddedness across large areas. One model focused on Virginia, using 1,125 observations from local water monitoring stations. The second covered the entire United States, using national data from the U.S. Environmental Protection Agency. Both models relied on machine learning, specifically a technique called "gradient boosting," to find patterns in the data and make predictions. To understand how the models reached their predictions, explainability tools were used to highlight which factors mattered most. The most important predictor in both models was "bankfull shear velocity," which relates to a stream's ability to move and lift bed particles when the channel is flowing full, as during small floods. Other important factors included soil depth, land cover (such as forests or farmland), and the steepness of the surrounding terrain. By predicting where high embeddedness is likely, these models can help scientists, policymakers, and environmental managers target restoration efforts more efficiently.
This research is a step forward in protecting river ecosystems with modern data tools and offers a scalable way to monitor stream health across wide regions. [en]
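Both abstracts describe using explainability tools (SHAP) to rank which factors mattered most. The sketch below shows the quantity SHAP estimates: exact Shapley values for one prediction, computed by enumerating feature coalitions and setting features outside a coalition to a single baseline input. This is a simplifying assumption; the SHAP library averages over background data and uses model-specific algorithms (for example Tree SHAP for tree ensembles). The function `shapley_values`, the toy model, its coefficients, and the feature names are hypothetical, not taken from the thesis.

```python
"""Exact Shapley values for one prediction, using a single baseline input.

Hypothetical sketch: SHAP-style tools estimate these quantities more efficiently;
the toy model and feature names below are illustrative only.
"""
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """phi_i = sum over subsets S of the other features of
    |S|! (n - |S| - 1)! / n! * [v(S + {i}) - v(S)],
    where v(S) evaluates f with features outside S set to the baseline."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


if __name__ == "__main__":
    # Hypothetical toy model: "embeddedness %" from three made-up inputs.
    names = ["shear_velocity", "soil_depth", "forest_fraction"]

    def toy_model(z: Sequence[float]) -> float:
        u, d, forest = z
        return 70 - 120 * u + 4 * d - 10 * forest + 20 * u * forest

    x = [0.25, 1.5, 0.6]
    base = [0.1, 1.0, 0.4]
    phi = shapley_values(toy_model, x, base)
    for name, p in zip(names, phi):
        print(f"{name:>16}: {p:+.2f}")
    # Efficiency property: attributions sum to f(x) - f(baseline).
    print(f"sum = {sum(phi):+.2f}, f(x) - f(base) = {toy_model(x) - toy_model(base):+.2f}")
```

Exact enumeration needs a number of model evaluations that grows exponentially with the number of features, so it only suits a handful of inputs; with 91 watershed features, approximate or model-specific estimators are required.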
dc.description.degree: Master of Science [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:43424 [en]
dc.identifier.uri: https://hdl.handle.net/10919/134298 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Streambed embeddedness [en]
dc.subject: Bankfull shear velocity [en]
dc.subject: Machine learning [en]
dc.subject: Explainable artificial intelligence [en]
dc.subject: Gradient boosting regression [en]
dc.subject: Physics-informed variable [en]
dc.title: Prediction of Gravel Streambed Embeddedness Using Explainable AI and Machine Learning Techniques [en]
dc.type: Thesis [en]
thesis.degree.discipline: Biological Systems Engineering [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: masters [en]
thesis.degree.name: Master of Science [en]

Files

Original bundle
Name: Oare_ER_T_2025.pdf
Size: 42.25 MB
Format: Adobe Portable Document Format