Data Science for Multisectoral Water Use Management and STEM Education: A Synergistic Approach
| dc.contributor.author | Naseri, Mohammad Yunus | en |
| dc.contributor.committeechair | Marston, Landon Todd | en |
| dc.contributor.committeechair | Lohani, Vinod K. | en |
| dc.contributor.committeemember | Karpatne, Anuj | en |
| dc.contributor.committeemember | Sinha, Sunil Kumar | en |
| dc.contributor.department | Civil and Environmental Engineering | en |
| dc.date.accessioned | 2025-10-24T08:00:29Z | en |
| dc.date.available | 2025-10-24T08:00:29Z | en |
| dc.date.issued | 2025-10-23 | en |
| dc.description.abstract | The unprecedented data deluge across disciplines offers transformative potential for addressing 21st-century challenges, including managing water resources sustainably amid climate change and population growth and transforming higher education. Data science provides a comprehensive framework for extracting actionable insights from this wealth of data, enabling evidence-based decision-making across complex systems at scale. However, the application of data science for sustainable water resources management across economic sectors in the United States (US) has been constrained by the lack of comprehensive data at granular spatiotemporal scales. Furthermore, data science applications for analyzing residential water use dynamics using large-scale, fine-resolution smart water meter data remain largely unexplored. Yet a more fundamental barrier to widespread data science adoption across disciplines is the insufficient integration of authentic data and data-driven modeling experiences for real-world problem-solving within undergraduate STEM curricula, limiting students' preparation for an increasingly data-driven workforce. This dissertation addresses these interconnected gaps by conducting synergistic data-driven and mixed-methods research at the intersection of civil engineering and engineering education. Using an inductive approach, the research first demonstrates specific cases of data science applications for water resources management. Then, it synthesizes lessons learned from these disciplinary cases along with instructor and student data from a multi-university and multi-disciplinary data science integration experiment to derive generalizable principles and assessment frameworks for integrating data science across undergraduate STEM curricula, advancing both water resources management and data science education in STEM disciplines. 
To address the fundamental data limitations constraining the application of data science methods to water resources management, this dissertation first developed the United States Water Withdrawals Database (USWWD), a comprehensive standardized compilation of user-level water withdrawal data across 42 states in the US. Through systematic collection and integration of heterogeneous state-level data sources, USWWD provides water withdrawal time series at unprecedented spatial and temporal resolutions, encompassing 188,597 unique water users, 353,082 points of diversion and use, and 57,559,412 withdrawal volumes across multiple economic sectors. The database standardizes diverse information on water users, withdrawal locations, volumes, source types, and primary water use categories, combining both direct measurements and various estimation techniques to reflect the diverse reporting methods utilized by different state agencies. By providing the most detailed national water use data to date at disaggregated spatiotemporal scales, USWWD enables comprehensive data science applications for understanding multisectoral water withdrawal patterns, trends, and drivers, directly supporting evidence-based water resources management, planning, and policy development across the US. The dissertation next focused on the residential water use sector, applying data science methods to analyze residential water consumption patterns at both city and household scales using high-resolution smart water meter data. It used an unprecedented dataset of residential water use from 33,435 single-family households across 39 US cities over two winter months. At the city level, functional data analysis and a mixed-effects random forest revealed distinct consumption clusters, with 13 high and 6 low water-using cities (the latter concentrated in coastal California) differing significantly from 20 medium water-using cities, where shower and toilet end uses emerged as primary drivers of water use. 
Extending this analysis to the household scale to assess the relative effects of behavioral versus fixture efficiency factors on total daily water use, the dissertation revealed that while behavioral factors explain most variation in total per capita indoor water use, fixture efficiency factors better differentiate between high and low water-using households, particularly around shower, toilet, and clothes washer end uses, with significant economies of scale observed as household size increases. These multi-scale findings provide critical insights for targeted urban water management strategies that combine fixture efficiency improvements with behavioral interventions, emphasizing the importance of scale-appropriate conservation approaches for different household categories and geographic contexts. Following the demonstrated applications in water resources management, this dissertation derives principles for integrating data science into established undergraduate curricula through a multi-university research-practice partnership. Working with instructors across six courses at three universities and input from industry partners, the research documented how educators can effectively integrate discipline-specific data science modules into existing science and engineering courses, with instructors selecting discipline-agnostic topics such as data visualization and statistical analysis while adapting integration approaches to meet specific course needs, academic levels, and pedagogical requirements. Assessment of this integration approach through mixed-methods analysis of data from 877 students across diverse demographics, academic levels, and disciplines revealed significant increases in students' self-reported motivation, skills, interest, and confidence in data science, with strong alignment between student self-assessments and instructor evaluations indicating effectiveness from both perspectives. 
The development of 12 publicly accessible data science modules across six disciplinary science and engineering fields, combined with empirical evidence of their educational impact, provides a transferable framework for preparing STEM graduates with essential data science competencies needed for an increasingly data-driven workforce, thereby addressing the fundamental educational barrier to widespread data science adoption across disciplines. | en |
| dc.description.abstractgeneral | We live in an age of unprecedented information generation that offers transformative potential for solving major 21st-century challenges, such as managing water sustainably amid climate change and population growth and transforming undergraduate education. Data science provides powerful tools for turning this wealth of information into actionable insights that can guide smart decision-making. However, applying these tools to water management has been severely limited by the lack of comprehensive, detailed data across different economic sectors in the United States (US). Additionally, while smart water meters now collect detailed information about household water use across the country, this data remains largely untapped for understanding how and why families use water differently. Perhaps most fundamentally, college students in science and engineering are not learning to work with real-world data and analytical tools during their education, leaving them unprepared for an increasingly data-driven workforce. This research addressed these interconnected challenges using an inductive approach that first demonstrates specific applications of data science in water management, then uses these examples to develop broader principles for teaching data science across science and engineering disciplines. To address the critical data gaps limiting application of data science for water management, this dissertation created the United States Water Withdrawals Database, which compiled water use records from 42 states into one comprehensive database. This database contains information from nearly 189,000 water users – from large power plants to individual farms – representing over 57 million water withdrawal values, creating the first detailed, publicly available picture of where water goes after water users take it from rivers, lakes, and underground sources. 
The dissertation then applied data science methods to analyze household water consumption using smart water meters in over 33,000 homes across 39 major cities. The findings revealed distinct patterns: cities in coastal California consistently used the least water indoors, while cities in other regions varied dramatically. Crucially, the research discovered that while people's daily water-using habits explain most differences in water use, the efficiency of household fixtures is what really separates high water-using homes from low water-using ones, providing critical insights for targeted conservation strategies in cities. Building on these demonstrated applications of data science in water management, this dissertation then derived principles for integrating data science into undergraduate science and engineering curricula. Working with professors at three universities across six different courses and input from industry partners, the study created 12 learning modules that allow instructors to teach their students to analyze real data within the context of their disciplines – such as using actual flood data in engineering classes or real pollution measurements in environmental science courses. Assessment with data from 877 students from diverse backgrounds revealed significant increases in students' confidence and skills in working with data, with strong alignment between student self-assessments and instructor evaluations. This approach provides a transferable framework for educators nationwide to prepare graduates with essential data science competencies, ultimately addressing the fundamental educational barrier to widespread data science adoption and enabling more effective solutions to complex challenges like sustainable water management. | en |
| dc.description.degree | Doctor of Philosophy | en |
| dc.format.medium | ETD | en |
| dc.identifier.other | vt_gsexam:44676 | en |
| dc.identifier.uri | https://hdl.handle.net/10919/138654 | en |
| dc.language.iso | en | en |
| dc.publisher | Virginia Tech | en |
| dc.rights | Creative Commons Attribution 4.0 International | en |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en |
| dc.subject | Data science | en |
| dc.subject | water withdrawal | en |
| dc.subject | fine spatial and temporal resolution | en |
| dc.subject | residential water use | en |
| dc.subject | smart water meters | en |
| dc.subject | end-use disaggregation | en |
| dc.subject | data science integration | en |
| dc.subject | STEM undergraduate education | en |
| dc.title | Data Science for Multisectoral Water Use Management and STEM Education: A Synergistic Approach | en |
| dc.type | Dissertation | en |
| thesis.degree.discipline | Civil Engineering | en |
| thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
| thesis.degree.level | doctoral | en |
| thesis.degree.name | Doctor of Philosophy | en |