CS4984: Special Topics
The title of the CS4984 Special Topics class changes from year to year; examples include Computational Linguistics (2014) and Big Data Text Summarization (2018). The class includes a graduate section, CS5984.
Browsing CS4984: Special Topics by Subject "Big Data"
- Hurricane Matthew Summarization
  Goldsworthy, Michael; Tran, Thoang; Asif, Areeb; Gregos, Brendan (Virginia Tech, 2018-12-14)
  The report, presentation, and code for our project for the course CS 4984/5984: Big Data Text Summarization are included in this submission. Our team had to explore methods of text summarization for two datasets, and report on our findings. The report covers our methods. The report starts with information on cleaning the data and filtering unnecessary documents. It then describes simple tasks such as counting the most common and important words and counting words by their part of speech. Following this, the report focuses on intermediate tasks such as clustering and finding LDA topics. Finally it presents our best methods for summarization, i.e., template and extractive summarization. We describe the algorithms, motivations, and conclusions we drew from each of our attempts. The report also contains a user and developer guide for using and maintaining our code, as well as a description of the tools and libraries we used. At the end there is also the Gold Standard Summary that we manually generated for another team in the course, to be used as a comparison for their automatically generated summary. We evaluated our automatically generated summary against a gold standard prepared by team 2, and found that our extractive summary performed the best based on its ROUGE scores. The source code zip file contains the code used for the tasks described in the report. The code was written in Python, and can be run only after installing the dependencies listed in the User Manual section of the report. The presentation file has the slides from the final presentation, containing much of the information in the report in a greatly simplified form. An editable version of the LaTeX document used to create our final report, and the editable PPTX file from our final presentation, are also included.
- Natural Language Processing: Generating a Summary of Flood Disasters
  Acanfora, Joseph; Evangelista, Marc; Keimig, David; Su, Myron (2014-12)
  In the event of a natural disaster like a flood, news outlets are in a rush to produce coverage for the general public. People may want a clear, concise summary of the event without having to read through hundreds of documents describing the event in different ways. The report of our work describes how to use computational techniques in Natural Language Processing (NLP) to automatically generate a summary of an instance of a flood event given a collection of diverse text documents. The body of this document covers NLP topics and techniques utilizing the NLTK Python library and Apache Hadoop to analyze and summarize a corpus. While this document describes the usage of such tools, it does not give an in-depth explanation of how these tools work, but rather focuses on their application to generating a summary of a flood event.