Evaluating Sources of Arsenic in Groundwater in Virginia using a Logistic Regression Model
For this study, I constructed a logistic regression model, using existing datasets of environmental parameters, to predict the probability of arsenic (As) concentrations above 5 parts per billion (ppb) in Virginia groundwater and to evaluate whether geologic or other characteristics are linked to elevated As concentrations. Measured As concentrations in groundwater from the Virginia Tech Biological Systems Engineering (BSE) Household Water Quality dataset were used as the dependent variable to train (calibrate) the model. Geologic units, lithology, soil series and texture, land use, and physiographic province were used as regressors. Initial models included all regressors, but model refinement focused solely on geologic units. Two geologic units, Triassic-aged sedimentary rocks and Devonian-aged shales/sandstones, were identified as significant in the model; the presence of either unit at a location is associated with a higher probability of As concentrations above 5 ppb in groundwater. Measured As concentrations in groundwater from an independent dataset collected by the Virginia Department of Health were used to test (validate) the model. Because of the structure of the As datasets, in which most concentrations (95-99%) were ≤ 5 ppb and few (1-5%) were > 5 ppb, the regression model cannot be used reliably to predict As concentrations in other parts of the state. However, the results are useful for identifying areas of Virginia, defined by underlying geology, that are more likely to have elevated As concentrations in groundwater. The results suggest that homeowners with wells installed in these geologic units should have their wells tested for As, and that regulators should closely monitor public supply wells in these areas for As.
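To illustrate the kind of model described above, the following is a minimal sketch, not the model used in the study. It assumes only two binary regressors (presence of each geologic unit), fits the coefficients by simple gradient ascent, and uses synthetic well counts chosen purely for illustration. All names (TRAIN, fit_logistic, predict_proba) are hypothetical.

```python
import math

# Synthetic, illustrative counts only (NOT data from the study):
# (in_triassic, in_devonian, wells sampled, wells with As > 5 ppb)
TRAIN = [
    (0, 0, 1000, 10),  # wells in neither unit
    (1, 0, 100, 10),   # wells in Triassic sedimentary rocks
    (0, 1, 100, 8),    # wells in Devonian shales/sandstones
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """P(As > 5 ppb) for indicator vector x; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(groups, lr=1.0, n_iter=20000):
    """Maximize the binomial log-likelihood by batch gradient ascent."""
    total = sum(n for _, _, n, _ in groups)
    w = [0.0, 0.0, 0.0]
    for _ in range(n_iter):
        grad = [0.0, 0.0, 0.0]
        for tri, dev, n, k in groups:
            # Observed minus expected exceedances for this group.
            resid = k - n * predict_proba(w, (tri, dev))
            grad[0] += resid
            grad[1] += resid * tri
            grad[2] += resid * dev
        w = [wj + lr * gj / total for wj, gj in zip(w, grad)]
    return w


w = fit_logistic(TRAIN)

for label, x in [("neither unit", (0, 0)), ("Triassic", (1, 0)), ("Devonian", (0, 1))]:
    print(f"{label:>12}: P(As > 5 ppb) = {predict_proba(w, x):.3f}")

# Odds ratios relative to wells in neither unit.
print("odds ratios:", [round(math.exp(c), 1) for c in w[1:]])
```

Because the three synthetic groups are mutually exclusive and each has its own coefficient, the fitted probabilities should approach the exceedance fractions in the synthetic table. A positive coefficient for a unit corresponds to the higher exceedance probability described for the Triassic and Devonian units. An independent dataset could be scored the same way with predict_proba and compared against its observed exceedances.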