Lyme Disease in the United States
Abstract
The main goal of this project is to assist our client, Dr. Luis Escobar, in identifying variables that may be contributing to the increasing number of Lyme disease cases in the United States. Our client provided data on the number of Lyme disease cases by county and by state over several years, and we were tasked with finding variables that could contribute to the rise in cases.
We were asked to scrape data from various online sources on factors that could influence the rise in Lyme disease cases, clean and merge that data with the Lyme disease data, generate regression plots to check for correlations, identify the best predictor variables among those we collected, and generate choropleth maps using the best predictor variables. A final report covering the process, the data, the plots, and an explanation of our findings was also requested.
Using the programming language R and related web scraping tools, our team scraped data on human population density by county, human population count by county, per capita income by county, human development index (HDI) by county, research and development spending by state, and temperature and precipitation by county. This data was parsed to match the structure of our Lyme disease data, and multiple regression plots were created with each collected variable on the X-axis and the number of Lyme disease cases on the Y-axis. Once all of the plots were completed, the variables whose plots showed the highest correlation were selected for choropleth maps.
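The selection step described above can be illustrated with a minimal sketch. The project itself was written in R; the example below uses Python with only the standard library, purely for illustration. The function names, the dictionary layout keyed by county identifier, and the toy numbers are all assumptions made for this sketch and are not taken from the project's code or data. It also assumes that "highest correlation" means the largest absolute Pearson correlation, which the abstract does not specify.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def rank_predictors(cases_by_county, predictors):
    """Rank predictors by |r| against case counts.

    cases_by_county: {county_id: case_count}
    predictors: {predictor_name: {county_id: value}}
    Only counties present in both tables are compared (an inner join).
    """
    ranked = []
    for name, values in predictors.items():
        shared = sorted(set(cases_by_county) & set(values))
        xs = [values[county] for county in shared]
        ys = [cases_by_county[county] for county in shared]
        ranked.append((name, pearson_r(xs, ys)))
    # Strongest relationship first, regardless of sign.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Toy, made-up numbers for illustration only (hypothetical county ids A-E).
toy_cases = {"A": 10, "B": 20, "C": 30, "D": 40}
toy_predictors = {
    "pop_density": {"A": 1, "B": 2, "C": 3, "D": 4},
    "income": {"A": 5, "B": 3, "C": 6, "D": 4, "E": 9},
}
toy_ranking = rank_predictors(toy_cases, toy_predictors)
```

A ranking like `toy_ranking` could then be used to decide which variables to map. Note that a high correlation on its own does not establish that a variable causes more cases.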
Finding datasets that matched the structure of our Lyme disease data was challenging, so we adapted some of the scraped data to better fit the original dataset we were given. Little county-level data was available online, so in some cases the Lyme disease data had to be aggregated to the state level rather than kept at the county level.
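Rolling county-level counts up to the state level, as described above, might look like the following sketch. It is written in Python for illustration, not in the R the project used. It assumes counties are keyed by five-digit FIPS codes stored as strings, whose first two digits identify the state; the abstract does not say how counties were identified in the actual data. The function name and the sample counts are hypothetical.

```python
from collections import defaultdict


def aggregate_to_state(county_cases):
    """Sum county case counts into state totals.

    county_cases: {five_digit_county_fips: case_count}, with FIPS codes
    stored as strings so that leading zeros are preserved.
    Returns {two_digit_state_fips: total_case_count}.
    """
    totals = defaultdict(int)
    for fips, cases in county_cases.items():
        if len(fips) != 5 or not fips.isdigit():
            raise ValueError(f"unexpected county FIPS code: {fips!r}")
        totals[fips[:2]] += cases  # first two digits identify the state
    return dict(totals)


# Made-up counts for illustration only.
sample_state_totals = aggregate_to_state({"51121": 4, "51161": 6, "36001": 10})
```

Keeping the codes as strings matters because states such as Alabama ("01") would lose their leading zero if the codes were parsed as integers.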
The work completed by our team can support further research on Lyme disease, with the aim of reducing the number of cases in the future. By identifying predictor variables that correlate strongly with the increasing number of Lyme disease cases, we give researchers a starting point for focusing their attention and finding the most efficient methods of reducing the number of cases. We expect that future Virginia Tech students, Lyme disease researchers, and our client will use and improve upon our code to continue the fight against Lyme disease.