A Machine-Learning Based Approach to Predicting Waterborne Disease Outbreaks Caused by Hurricanes

dc.contributor.author: Mansky, Christopher Immanuel (en)
dc.contributor.committeechair: Munoz Pauta, David Fernando (en)
dc.contributor.committeemember: Young, Kevin David (en)
dc.contributor.committeemember: Shealy, Earl Wade (en)
dc.contributor.department: Civil and Environmental Engineering (en)
dc.date.accessioned: 2024-06-28T08:01:26Z (en)
dc.date.available: 2024-06-28T08:01:26Z (en)
dc.date.issued: 2024-06-27 (en)
dc.description.abstract: Climate change is increasing the frequency and intensity of (extra-)tropical cyclones, including hurricanes and winter storms, worldwide. Waterborne diseases resulting from flood-related impacts affect public health and are of major concern for society. Previous studies have highlighted a statistically significant linear correlation between waterborne diseases and climate variables, especially precipitation and temperature. However, to the best of our knowledge, no studies have explored nonlinear models (e.g., machine learning) to predict waterborne disease outbreaks in the aftermath of hurricanes and winter storms. Here, we aim to predict waterborne disease counts as well as disease outbreaks using historic climate, demographic, and public health data for Florida, U.S., dating back to 1992. To this end, we first predicted disease counts in aggregated coastal counties using multiple linear regression (MLR) and random forest regression (RFR) models. Then, we developed a binary random forest classifier (RFC) model to predict waterborne disease outbreaks (i.e., 0: no outbreak, 1: outbreak). The highest coefficient of determination (R²) for the MLR model was 0.65, obtained for two aggregated county groups, namely St. Johns-Duval-Nassau and Sarasota-Charlotte-Lee. The RFR model showed its highest R² of 0.69 for the county group Sarasota-Charlotte-Lee. The highest Root Mean Square Error (RMSE) was found for the county group Miami-Dade-Broward-Palm Beach, with values of 15 and 16 people for the MLR and RFR models, respectively. The St. Johns-Duval-Nassau and Sarasota-Charlotte-Lee groups achieved the highest Kling-Gupta Efficiency (KGE) of 0.76 for the MLR model. Sarasota-Charlotte-Lee also performed best in terms of KGE for the RFR model, with a score of 0.69. The binary RFC model for Pinellas-Hillsborough-Manatee achieved an accuracy of 0.93 and an F1-score of 0.48.
We anticipate that the models' performance can be substantially improved with access to higher spatial resolution climate data as well as longer demographic and public health records. Nevertheless, we provide a solid methodology that can inform local authorities about imminent public health impacts and help mitigate negative effects on society, the economy, and the environment. (en)
dc.description.abstractgeneral: Climate change is increasing the frequency and intensity of tropical storms, which include hurricanes and winter storms, worldwide. Extreme weather events have been shown to increase the risk of waterborne disease outbreaks (i.e., outbreaks of diseases that are transmitted by water), especially due to increased flooding. Previous studies showed a correlation between climate factors, such as precipitation and temperature, and waterborne diseases, but no concrete models have been developed to predict these outbreaks. Advanced prediction models can help identify where disease outbreaks are most likely to occur and support preparing for and mitigating these outbreaks to save lives, protect the environment, and reduce damage to infrastructure. Our research focused on developing a model framework that uses climate and demographic data from coastal Florida counties dating back to 1992 to predict Salmonellosis, a common waterborne bacterial infection, after a hurricane event. We created two regression models, a multiple linear regression (MLR) and a random forest regression (RFR), to predict the number of Salmonellosis cases. Additionally, we created a random forest classifier (RFC) model to predict whether an outbreak would occur. After running analyses for these three models on groups of three counties, we found that the MLR and RFR showed similar accuracy in predicting cases, with the MLR performing slightly better for most county groups. For the Sarasota-Charlotte-Lee county group, the RFR performed best. The RFC model reached its highest accuracy, 93%, for the Pinellas-Hillsborough-Manatee county group. Future improvements, such as using more and higher-quality data and adding more variables, can help make these models more reliable. (en)
dc.description.degree: Master of Science (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:40860 (en)
dc.identifier.uri: https://hdl.handle.net/10919/119552 (en)
dc.language.iso: en (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Machine Learning (en)
dc.subject: Hurricanes (en)
dc.subject: Prediction (en)
dc.subject: Waterborne Diseases (en)
dc.subject: Civil Engineering (en)
dc.title: A Machine-Learning Based Approach to Predicting Waterborne Disease Outbreaks Caused by Hurricanes (en)
dc.type: Thesis (en)
thesis.degree.discipline: Civil Engineering (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: Master of Science (en)
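
The abstract reports R², RMSE, and KGE for the regression models and accuracy and F1-score for the classifier. The following is a minimal sketch of how such scores are commonly computed, not the thesis's own code. It assumes R² is defined as 1 - SS_res/SS_tot and that KGE follows the widely used three-component form (correlation, variability ratio, bias ratio); the record does not state which definitions the thesis uses. All function names and the toy labels are hypothetical.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def rmse(obs, sim):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(_mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def r_squared(obs, sim):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    m = _mean(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def kge(obs, sim):
    """Kling-Gupta Efficiency (assumed three-component form).

    Requires non-constant series and a non-zero observed mean.
    """
    mo, ms = _mean(obs), _mean(sim)
    so = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    ss = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    r = _mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)]) / (so * ss)
    alpha = ss / so  # variability ratio
    beta = ms / mo   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels (1: outbreak, 0: no outbreak)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical, illustrative labels (not thesis data): 5 outbreaks in 100 periods.
# The classifier catches 2 outbreaks, misses 3, and raises 1 false alarm.
y_true = [1] * 5 + [0] * 95
y_pred = [1, 1, 0, 0, 0] + [1] + [0] * 94
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

With rare outbreaks, most periods are correctly labeled "no outbreak", so accuracy stays high even when several outbreaks are missed. That is one plausible reading of how an accuracy of 0.93 can coexist with an F1-score of 0.48, although the record itself does not give the class balance.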

Files

Original bundle
Name: Mansky_CI_T_2024.pdf
Size: 4.49 MB
Format: Adobe Portable Document Format
