This project analyzes and visualizes the metadata of the state tourism websites of Virginia, Colorado, and California from 1998 to 2018.
Each state in the United States maintains an official tourism website to attract visitors. These sites typically feature the state's major attractions, travel tips and facts, blog posts, and reviews from people who have been there. Examining past tourism websites for patterns reveals what worked and what did not, and those findings can inform better decisions about the future of state tourism. We use historical analysis of past government tourism websites to support research on content and traffic trends.

The successive iterations of each state's tourism website are preserved as snapshots in the Internet Archive. Our team was given Parquet files containing snapshots of the tourism websites of California, Colorado, and Virginia dating back to 1998. We used a combination of Python's Pandas library and Beautiful Soup to examine the Parquet files and extract the meta tags each site used as of a given date.

With this data, we plotted every variation of each state's tourism website in chronological order. This allowed us to track the addition and removal of keywords and to observe other changes, such as the use of phrases, changes in capitalization, keywords in languages other than English, and updates to keywords driven by internet trends. We conclude that meta tags play an important role in a website's search engine ranking, and that keyword analysis should be done with the website's primary user base in mind.
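The extraction step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `extract_meta_keywords` and the sample snapshot HTML are hypothetical, and a real run would first load the snapshot HTML from the Parquet files (e.g. with `pandas.read_parquet`) rather than from an inline string.

```python
from bs4 import BeautifulSoup  # Beautiful Soup, as used in the project

def extract_meta_keywords(html):
    """Return the comma-separated keywords from a page's <meta name="keywords"> tag."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"name": "keywords"})
    if tag is None or not tag.has_attr("content"):
        return []  # snapshot has no keywords meta tag
    return [kw.strip() for kw in tag["content"].split(",") if kw.strip()]

# Hypothetical snapshot of a state tourism page, standing in for one row
# of the Parquet data (in practice: pandas.read_parquet("snapshots.parquet")).
snapshot_html = """<html><head>
<meta name="keywords" content="Virginia, travel, beaches, history">
<meta name="description" content="Official Virginia tourism site">
</head><body></body></html>"""

keywords = extract_meta_keywords(snapshot_html)
print(keywords)  # ['Virginia', 'travel', 'beaches', 'history']
```

Applying this per snapshot and per date yields the chronological keyword lists that the plots compare.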