Browsing by Author "Rizzo, Robert"
- Big Data Text Summarization - Hurricane Harvey
  Geissinger, Jack; Long, Theo; Jung, James; Parent, Jordan; Rizzo, Robert (Virginia Tech, 2018-12-12)

  Natural language processing (NLP) has advanced in recent years. Accordingly, we present progressively more complex generated text summaries on the topic of Hurricane Harvey. We first utilized TextRank, an unsupervised extractive summarization algorithm. TextRank is computationally expensive, and the sentences it generates are not always directly related or essential to the topic at hand. When evaluating TextRank, we found that a single interjected sentence ruined the flow of the summary. We also found that the ROUGE scores for our TextRank summary were quite low compared with a gold standard that was prepared for us. However, the TextRank summary scored well under ROUGE evaluation when compared with the Wikipedia article lead for Hurricane Harvey.

  To improve upon the TextRank algorithm, we utilized template summarization with named entities. Template summarization takes less time to run than TextRank but is supervised: the author of the template and script must choose valuable named entities. It is thus highly dependent on human intervention to produce reasonable and readable summaries that are not error-prone. As expected, the template summary evaluated well compared with the gold standard and the Wikipedia article lead, mainly because we could include the named entities we thought were pertinent to the summary.

  Beyond extractive methods like TextRank and template summarization, we pursued abstractive summarization using pointer-generator networks, and multi-document summarization with pointer-generator networks and maximal marginal relevance (MMR). The benefit of abstractive summarization is that it is more in line with how humans summarize documents. Pointer-generator networks, however, require GPUs and a large amount of training data to run properly. Luckily, we were able to use a pre-trained network to generate summaries. The pointer-generator network is the centerpiece of our abstractive methods and is what allowed us to create these summaries in the first place.

  NLP is at an inflection point due to deep learning, and our summaries generated with a state-of-the-art pointer-generator neural network are filled with details about Hurricane Harvey, including the damage incurred, the average amount of rainfall, and the locations it affected the most. The summary is also free of grammatical errors. We also used a novel Python library, written by Logan Lebanoff at the University of Central Florida, for multi-document summarization with deep learning, applying it to our Hurricane Harvey dataset of 500 articles and to the Wikipedia article for Hurricane Harvey. The summary of the Wikipedia article is our final summary and has the highest ROUGE scores we could attain. (Illustrative sketches of these techniques follow this entry.)
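  A minimal sketch of extractive TextRank, assuming a TF-IDF sentence-similarity graph scored with PageRank; the report does not name the implementation the team used, so the library choices (nltk, scikit-learn, networkx) are assumptions.

  ```python
  # Hedged TextRank sketch: rank sentences by PageRank centrality in a
  # similarity graph, then keep the top few in document order.
  import networkx as nx
  import nltk
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.metrics.pairwise import cosine_similarity

  nltk.download("punkt", quiet=True)

  def textrank_summary(text, n_sentences=3):
      sentences = nltk.sent_tokenize(text)
      if len(sentences) <= n_sentences:
          return " ".join(sentences)
      # Edge weights are TF-IDF cosine similarities between sentence pairs.
      tfidf = TfidfVectorizer().fit_transform(sentences)
      graph = nx.from_numpy_array(cosine_similarity(tfidf))
      # PageRank scores each sentence by its centrality in the graph.
      scores = nx.pagerank(graph)
      ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
      # Restore document order so the extract reads more smoothly.
      return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))
  ```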
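  ROUGE comparison against a reference summary can be done with the rouge-score package, as sketched below; the exact evaluation tooling the team used is not named in the report, and the two summary strings here are placeholders.

  ```python
  # Hedged ROUGE sketch: compare a system summary against a gold standard.
  from rouge_score import rouge_scorer

  scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                    use_stemmer=True)

  gold = "Hurricane Harvey made landfall on the Texas coast in August 2017."
  system = "Harvey struck the Texas coast in late August 2017."

  scores = scorer.score(gold, system)  # score(target, prediction)
  for name, s in scores.items():
      print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F1={s.fmeasure:.3f}")
  ```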
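  Template summarization with named entities could look like the sketch below, assuming spaCy for entity extraction; the template wording and slot choices are hypothetical, not the team's actual script.

  ```python
  # Hedged template-summarization sketch: extract entities with spaCy NER
  # and slot them into a hand-written sentence template.
  import spacy

  nlp = spacy.load("en_core_web_sm")

  def template_summary(text):
      doc = nlp(text)
      # Take the first entity of each type; a real script would rank
      # candidates (e.g., by frequency) rather than take the first match.
      slots = {}
      for ent in doc.ents:
          slots.setdefault(ent.label_, ent.text)
      return (f"{slots.get('EVENT', 'The storm')} struck "
              f"{slots.get('GPE', 'the region')} around "
              f"{slots.get('DATE', 'an unknown date')}, causing an estimated "
              f"{slots.get('MONEY', 'unreported')} in damage.")
  ```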
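  The core idea of a pointer-generator network (See et al., 2017) is a learned mixture of generating from a fixed vocabulary and copying source words via attention. The toy numpy sketch below illustrates only that final mixing step; all numbers are invented for illustration, since the real distributions are produced by a trained network.

  ```python
  # Toy pointer-generator mixture: final_dist = p_gen * vocab_dist
  #                                           + (1 - p_gen) * copy_dist
  import numpy as np

  vocab = ["the", "storm", "hit", "[UNK]"]
  source_tokens = ["Harvey", "hit", "Texas"]      # source words, some OOV
  extended_vocab = vocab + ["Harvey", "Texas"]    # vocab extended with OOVs

  p_gen = 0.4                                     # generation probability
  vocab_dist = np.array([0.5, 0.2, 0.25, 0.05])   # softmax over vocab
  attention = np.array([0.6, 0.1, 0.3])           # attention over source

  # Scatter attention mass onto the extended vocabulary (copy distribution).
  copy_dist = np.zeros(len(extended_vocab))
  for tok, a in zip(source_tokens, attention):
      copy_dist[extended_vocab.index(tok)] += a

  final_dist = p_gen * np.pad(vocab_dist, (0, 2)) + (1 - p_gen) * copy_dist
  print(dict(zip(extended_vocab, final_dist.round(3))))
  ```

  Copying lets the model emit out-of-vocabulary words like "Harvey" that a pure vocabulary softmax could never produce.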
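  Maximal marginal relevance selects content that is relevant to a query yet dissimilar to what is already selected. A minimal sketch follows, assuming TF-IDF vectors and a lambda of 0.7; the vectorization and parameter value are illustrative assumptions, not the team's settings.

  ```python
  # Hedged MMR sketch: greedy selection balancing relevance and redundancy.
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.metrics.pairwise import cosine_similarity

  def mmr_select(query, sentences, k=3, lam=0.7):
      vecs = TfidfVectorizer().fit_transform([query] + sentences)
      rel = cosine_similarity(vecs[0], vecs[1:]).ravel()  # relevance to query
      sim = cosine_similarity(vecs[1:])                   # sentence-sentence
      selected, candidates = [], list(range(len(sentences)))
      while candidates and len(selected) < k:
          def mmr_score(i):
              # Penalize similarity to anything already selected.
              redundancy = max((sim[i][j] for j in selected), default=0.0)
              return lam * rel[i] - (1 - lam) * redundancy
          best = max(candidates, key=mmr_score)
          selected.append(best)
          candidates.remove(best)
      return [sentences[i] for i in selected]
  ```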
- Tourism Destination Websites
  Doan, Viet; Crawford, Matt; Nicholakos, Aki; Rizzo, Robert; Salopek, Jackson (Virginia Tech, 2019-05-08)

  This submission resulted from a semester-long team project focused on obtaining the data of 50 DMO (destination marketing organization) websites, parsing the data, storing it in a database, and then visualizing it on a website. We worked on this project for our client, Dr. Florian Zach, as part of the Multimedia / Hypertext / Information Access course taught by Dr. Edward A. Fox.

  We created a rudimentary website with much of the infrastructure necessary to visualize the data once it was entered into the database. We experimented extensively with web scraping technology such as Heritrix 3 and Scrapy, but then learned that members of the Internet Archive could give us the data we wanted. We initially tabled our work on web scraping and focused instead on the website and visualizations. We constructed an API in GraphQL to query the database and relay the fetched data to the front-end visualizations. The website with the visualizations was hosted on Microsoft Azure using a serverless model. The website has a homepage, a page for visualizations, and a page with information about the project, along with a functional navigation bar to switch between the three pages. The homepage currently shows a basic map of the USA with the ability to change a state's color on mouse hover.

  After complications with funding, and after learning that the Internet Archive would not be able to give us the data in time to complete the project, we pivoted away from the website and visualizations and focused back on data collection and parsing. Using Scrapy, we gathered the homepages of 98 tourism destination websites for each month they were available, from April 2019 back to January 1996. We then used a series of Python scripts to parse this data into a dictionary of general information about the scraped sites, as well as a set of CSV files recording the external links of the websites in the given months. (Illustrative sketches of the crawl, a GraphQL query, and the link parsing follow this entry.)
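  A minimal Scrapy spider in the spirit of the team's crawl is sketched below: it fetches each site's homepage and saves the raw HTML for later parsing. The start URLs and output paths are placeholders, and the sketch does not reproduce the monthly-snapshot aspect of the real crawl of 98 DMO homepages.

  ```python
  # Hedged Scrapy sketch: fetch homepages and save the HTML per domain.
  import scrapy

  class DmoHomepageSpider(scrapy.Spider):
      name = "dmo_homepages"
      # Hypothetical examples; the real list had 98 destination sites.
      start_urls = [
          "https://www.visitphilly.com/",
          "https://www.nycgo.com/",
      ]

      def parse(self, response):
          # Save the raw homepage HTML keyed by domain for later parsing.
          domain = response.url.split("/")[2]
          with open(f"{domain}.html", "wb") as f:
              f.write(response.body)
          self.log(f"Saved homepage for {domain}")
  ```

  Run with `scrapy runspider spider.py`; a full project would add politeness settings (download delays, robots.txt compliance) in Scrapy's settings.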
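  Querying a GraphQL API from the front end amounts to POSTing a query document to a single endpoint. The Python sketch below shows the shape of such a request; the endpoint URL, query, and field names are hypothetical, as the report does not document the team's actual schema.

  ```python
  # Hedged GraphQL sketch: POST a query and variables to a single endpoint.
  import requests

  ENDPOINT = "https://example.azurewebsites.net/api/graphql"  # placeholder

  query = """
  query StateSnapshots($state: String!) {
    snapshots(state: $state) {
      month
      externalLinkCount
    }
  }
  """

  resp = requests.post(ENDPOINT, json={"query": query,
                                       "variables": {"state": "VA"}})
  resp.raise_for_status()
  print(resp.json()["data"]["snapshots"])
  ```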
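  The parsing step, recording each site's external links in a CSV file, might look like the sketch below; BeautifulSoup is an assumption, since the report only says "a series of Python scripts".

  ```python
  # Hedged link-parsing sketch: collect off-site links from a saved homepage
  # and write them to a CSV file.
  import csv
  from urllib.parse import urlparse
  from bs4 import BeautifulSoup

  def external_links_to_csv(html_path, site_domain, out_csv):
      with open(html_path, encoding="utf-8", errors="ignore") as f:
          soup = BeautifulSoup(f, "html.parser")
      rows = []
      for a in soup.find_all("a", href=True):
          host = urlparse(a["href"]).netloc
          if host and site_domain not in host:  # link points off-site
              rows.append([site_domain, a["href"]])
      with open(out_csv, "w", newline="", encoding="utf-8") as f:
          writer = csv.writer(f)
          writer.writerow(["site", "external_link"])
          writer.writerows(rows)
  ```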