dc.contributor.author: Wanye, Frank
dc.contributor.author: Ganguli, Samit
dc.contributor.author: Tuckman, Matt
dc.contributor.author: Zhang, Joy
dc.contributor.author: Zhang, Fangzheng
dc.date.accessioned: 2018-12-14T15:51:04Z
dc.date.available: 2018-12-14T15:51:04Z
dc.date.issued: 2018-12-07
dc.identifier.uri: http://hdl.handle.net/10919/86399
dc.description.abstract: We present our approach for generating automatic summaries from a collection of news articles acquired from the World Wide Web relating to Hurricane Florence. Our approach consists of 10 distinct steps, at the end of which we produce three separate summaries using three distinct methods: 1. A template summary, in which we extract information from the web page collection to fill in blanks in a template. 2. An extractive summary, in which we extract the most important sentences from the web pages in the collection. 3. An abstractive summary, in which we use deep learning techniques to rephrase the contents of the web pages in the collection. The first six steps of our approach involve extracting important words, synsets, words constrained by part of speech, a set of discriminating features, important named entities, and important topics from the collection. This information is then used by the algorithms that generate the automatic summaries. To produce the template summary, we employed a modified version of the hurricane summary template provided to us by the instructor. For each blank space in the modified template, we used regular expression matching with selected keywords to select relevant sentences from the collection, and then a combination of regex matching and entity tagging to pick out the information needed to fill in the blank. Most values also required unit conversion, so that we could capture every value reported in the articles rather than only those given in one particular unit. We then computed either the mode or the mean of each set of values and, for some quantities such as rainfall, used the standard deviation to estimate the maximum. To produce the extractive summary, we employed existing extractive summarization libraries. To synthesize information from multiple articles, we used an iterative approach: concatenating the generated summaries and then summarizing the concatenated summaries.
To produce the abstractive summary, we employed existing deep learning summarization techniques. In particular, we used a pre-trained Pointer-Generator neural network model. As with the extractive summary, we clustered the web pages in the collection by topic before running them through the neural network model, to reduce the amount of repeated information produced. Of the three summaries that we generated, the template summary is the best overall due to its coherence. The abstractive and extractive summaries both provide a fair amount of information, but are severely lacking in organization and readability. Additionally, they include specific details that are irrelevant to the hurricane. All three summaries could be improved with further data cleaning, and the template summary could easily be extended to include more information about the event so that it would be more complete. [en_US]
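The template-filling step described in the abstract (keyword and regex filtering, unit conversion, then mean, mode, and a standard-deviation-based maximum) can be sketched roughly as follows. This is a minimal illustration, not the project's code: the pattern RAIN_RE, the function names, the "rain" keyword, and the choice of "mean plus two standard deviations" as the maximum estimate are all assumptions made for the example. The abstract does not say how the standard deviation was actually applied.

```python
import re
import statistics

# Hypothetical pattern: a number followed by a rainfall unit.
RAIN_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(inches|inch|centimeters|centimeter|cm)\b",
    re.IGNORECASE,
)

# Conversion factors to inches, so values in either unit are comparable.
TO_INCHES = {
    "inches": 1.0,
    "inch": 1.0,
    "centimeters": 1 / 2.54,
    "centimeter": 1 / 2.54,
    "cm": 1 / 2.54,
}


def extract_rainfall_inches(sentences, keyword="rain"):
    """Keep sentences mentioning the keyword and return their values in inches."""
    values = []
    for sentence in sentences:
        if keyword not in sentence.lower():
            continue  # keyword filter: skip sentences unrelated to rainfall
        for number, unit in RAIN_RE.findall(sentence):
            inches = float(number) * TO_INCHES[unit.lower()]
            values.append(round(inches, 2))  # rounding keeps the mode meaningful
    return values


def summarize_values(values, k=2.0):
    """Return mean, mode, and an assumed maximum estimate of mean + k * std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return {
        "mean": mean,
        "mode": statistics.mode(values),
        "estimated_max": mean + k * std,  # assumption, not the authors' formula
    }
```

For example, calling extract_rainfall_inches on a handful of sentences and passing the result to summarize_values yields one candidate value per statistic for a template blank such as total rainfall.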
dc.description.sponsorship: NSF: IIS-1619028 [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: Virginia Tech [en_US]
dc.rights: Attribution 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/ [*]
dc.subject: natural language processing [en_US]
dc.subject: big data [en_US]
dc.subject: text summarization [en_US]
dc.subject: automatic summarization [en_US]
dc.subject: Hurricane Florence [en_US]
dc.subject: multi-document summarization [en_US]
dc.subject: NLP [en_US]
dc.subject: abstractive summarization [en_US]
dc.subject: extractive summarization [en_US]
dc.subject: template summarization [en_US]
dc.subject: artificial intelligence [en_US]
dc.title: Automatic Summarization of News Articles about Hurricane Florence [en_US]
dc.type: Presentation [en_US]
dc.type: Report [en_US]
dc.type: Software [en_US]
dc.description.notes: [en_US]
HurricaneFlorenceCodebase.zip: the zipped code developed for the project, also available at https://github.com/ffrankies/BigDataTextSummarization
HurricaneFlorenceOverleafFiles.zip: the zipped Overleaf project files used to generate HurricaneFlorenceProjectReport.pdf
HurricaneFlorenceFinalPresentation.pdf: the final progress report presentation given by our team.
HurricaneFlorenceFinalPresentation.pptx: the final progress report presentation, in PowerPoint format.
HurricaneFlorenceProjectReport.pdf: the project report giving details about the project.
HurricaneFlorenceGoldenStandardSummary.txt: the Golden Standard Summary prepared by our team for Team 14, on the subject of the Facebook data breach.



License: Attribution 3.0 United States