Data Science for Multisectoral Water Use Management and STEM Education: A Synergistic Approach

Date

2025-10-23

Publisher

Virginia Tech

Abstract

The unprecedented data deluge across disciplines offers transformative potential for addressing 21st-century challenges, including managing water resources sustainably amid climate change and population growth and transforming higher education. Data science provides a comprehensive framework for extracting actionable insights from this wealth of data, enabling evidence-based decision-making across complex systems at scale. However, the application of data science to sustainable water resources management across economic sectors in the United States (US) has been constrained by the lack of comprehensive data at granular spatiotemporal scales. Furthermore, data science applications for analyzing residential water use dynamics using large-scale, fine-resolution smart water meter data remain largely unexplored. Yet a more fundamental barrier to widespread data science adoption across disciplines is the insufficient integration of authentic data and data-driven modeling experiences for real-world problem-solving within undergraduate STEM curricula, which limits students' preparation for an increasingly data-driven workforce. This dissertation addresses these interconnected gaps by conducting synergistic data-driven and mixed-methods research at the intersection of civil engineering and engineering education. Using an inductive approach, the research first demonstrates specific cases of data science applications for water resources management. It then synthesizes lessons learned from these disciplinary cases, along with instructor and student data from a multi-university, multi-disciplinary data science integration experiment, to derive generalizable principles and assessment frameworks for integrating data science across undergraduate STEM curricula, advancing both water resources management and data science education in STEM disciplines.
To address the fundamental data limitations constraining the application of data science methods to water resources management, this dissertation first developed the United States Water Withdrawals Database (USWWD), a comprehensive, standardized compilation of user-level water withdrawal data across 42 US states. Through systematic collection and integration of heterogeneous state-level data sources, USWWD provides water withdrawal time series at unprecedented spatial and temporal resolutions, encompassing 188,597 unique water users, 353,082 points of diversion and use, and 57,559,412 withdrawal volumes across multiple economic sectors. The database standardizes diverse information on water users, withdrawal locations, volumes, source types, and primary water use categories, combining direct measurements with various estimation techniques to reflect the different reporting methods used by state agencies. By providing the most detailed national water use data to date at disaggregated spatiotemporal scales, USWWD enables comprehensive data science applications for understanding multisectoral water withdrawal patterns, trends, and drivers, directly supporting evidence-based water resource management, planning, and policy development across the US.
The dissertation next focused on the residential water use sector, applying data science methods to analyze residential water consumption patterns at both city and household scales using high-resolution smart water meter data. It used an unprecedented dataset of residential water use from 33,435 single-family households across 39 US cities over two winter months. At the city level, functional data analysis and a mixed-effects random forest revealed distinct consumption clusters, with 13 high and 6 low water-using cities (concentrated in coastal California) differing significantly from 20 medium water-using cities, and with shower and toilet end uses emerging as the primary drivers of water use. Extending this analysis to the household scale to assess the relative effects of behavioral versus fixture efficiency factors on total daily water use, the dissertation revealed that while behavioral factors explain most of the variation in total per capita indoor water use, fixture efficiency factors better differentiate between high and low water-using households, particularly for shower, toilet, and clothes washer end uses, with significant economies of scale observed as household size increases. These multi-scale findings provide critical insights for targeted urban water management strategies that combine fixture efficiency improvements with behavioral interventions, emphasizing the importance of scale-appropriate conservation approaches for different household categories and geographic contexts.
Following the demonstrated applications in water resources management, this dissertation derives principles for integrating data science into established undergraduate curricula through a multi-university research-practice partnership. Working with instructors across six courses at three universities, and with input from industry partners, the research documented how educators can effectively integrate discipline-specific data science modules into existing science and engineering courses, with instructors selecting discipline-agnostic topics such as data visualization and statistical analysis while adapting integration approaches to meet specific course needs, academic levels, and pedagogical requirements. Assessment of this integration approach through mixed-methods analysis of 877 student records across diverse demographics, academic levels, and disciplines revealed significant increases in students' self-reported motivation, skills, interest, and confidence in data science, with strong alignment between student self-assessments and instructor evaluations indicating effectiveness from both perspectives. The development of 12 publicly accessible data science modules across six science and engineering disciplines, combined with empirical evidence of their educational impact, provides a transferable framework for preparing STEM graduates with the data science competencies needed for an increasingly data-driven workforce, thereby addressing the fundamental educational barrier to widespread data science adoption across disciplines.

Keywords

Data science, water withdrawal, fine spatial and temporal resolution, residential water use, smart water meters, end-use disaggregation, data science integration, STEM undergraduate education
