dc.contributor.author: Kingery, Ryan
dc.contributor.author: Yellapantula, Sudha Ravali
dc.contributor.author: Xu, Chao
dc.contributor.author: Huang, Li Jun
dc.contributor.author: Ye, Jiacheng
dc.description.abstract: We analyze various ways to perform abstractive text summarization on an entire collection of news articles. Specifically, we seek to summarize the collection of web-archived news articles relating to the 2018 shooting at Marjory Stoneman Douglas High School in Parkland, Florida. The original collection contains about 10,100 archived web pages, most of which relate to the shooting; after pre-processing, it reduces to about 3,900 articles that directly relate to the shooting. We then explore several ways to generate abstractive summaries for the collection using deep learning methods. Since current deep learning methods for abstractive summarization are only capable of summarizing text at the single-article level or below, we summarize our collection by identifying a set of representative articles, summarizing each of those articles using our deep learning models, and then concatenating those summaries to produce a summary for the entire collection. To identify the representative articles, we investigate various unsupervised methods to partition the space of articles into meaningful groups. We try choosing these articles by random sampling from the collection, by using topic modeling, and by sampling from clusters obtained by clustering Doc2Vec embeddings. To summarize each individual article, we explore various state-of-the-art deep learning methods for abstractive summarization: a sequence-to-sequence model, a pointer-generator network, and a reinforced extractor-abstractor network. To evaluate the quality of our summaries, we employ two methods. The first is a subjective method, in which each person ranked the quality of each summary. The second is an objective method, which uses various ROUGE metrics to compare each summary to an independently generated gold-standard summary.
We found that most ROUGE scores were low overall, with only the pointer-generator network on random articles achieving a ROUGE score above 0.15. This suggests that such deep learning techniques still have considerable room for improvement if they are to be viable for collection summarization. [en_US]
dc.description.sponsorship: NSF: IIS-1619028 [en_US]
dc.publisher: Virginia Tech [en_US]
dc.rights: Attribution-NonCommercial 3.0 United States [*]
dc.subject: machine learning [en_US]
dc.subject: deep learning [en_US]
dc.subject: natural language processing [en_US]
dc.title: Abstractive Text Summarization of the Parkland Shooting Collection [en_US]
dc.description.notes: The files in this item all relate to the abstractive summarization of the Parkland shooting collection: parkland-shooting-report.pdf: PDF version of the report; files needed to compile the LaTeX report into the above PDF; parkland-shooting-presentation.pdf: PDF version of the project presentation; parkland-shooting-presentation.pptx: PowerPoint version of the project presentation; code and data needed to reproduce the results. [en_US]
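
The abstract describes scoring each generated summary against a gold-standard summary with ROUGE metrics. As a rough illustration of what such a score measures, the following is a minimal sketch of ROUGE-1 (unigram overlap) in Python. It is not the authors' evaluation code: the function name `rouge_1`, the lowercased whitespace tokenization, and the example strings are assumptions made for illustration, and standard ROUGE toolkits typically add stemming and further variants such as ROUGE-2 and ROUGE-L.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Return ROUGE-1 precision, recall, and F1 for two texts.

    Hypothetical helper: tokens are lowercased, whitespace-separated words.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    # Clipped overlap: a word counts at most as often as it appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up strings, not taken from the collection or the report.
    scores = rouge_1("the cat sat", "the cat sat on the mat")
    print(scores)
```

Under this definition, a score above 0.15 means that only a modest fraction of the words in the generated and gold-standard summaries overlap, which is consistent with the abstract's reading that the scores were low.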

License: Attribution-NonCommercial 3.0 United States