Abstractive Text Summarization of the Parkland Shooting Collection

dc.contributor.author | Kingery, Ryan | en
dc.contributor.author | Yellapantula, Sudha Ravali | en
dc.contributor.author | Xu, Chao | en
dc.contributor.author | Huang, Li Jun | en
dc.contributor.author | Ye, Jiacheng | en
dc.date.accessioned | 2018-12-13T15:17:08Z | en
dc.date.available | 2018-12-13T15:17:08Z | en
dc.date.issued | 2018-12-12 | en
dc.description.abstract | We analyze various ways to perform abstractive text summarization on an entire collection of news articles. Specifically, we seek to summarize the collection of web-archived news articles about the 2018 shooting at Marjory Stoneman Douglas High School in Parkland, Florida. The original collection contains about 10,100 archived web pages, most of which relate to the shooting; pre-processing reduces it to about 3,900 articles that relate directly to the shooting. We then explore several ways to generate abstractive summaries of the collection using deep learning methods. Because current deep learning methods for abstractive summarization can only summarize text at the level of a single article or below, we summarize the collection by identifying a set of representative articles, summarizing each of those articles with our deep learning models, and concatenating the resulting summaries into a summary of the entire collection. To identify the representative articles, we investigate unsupervised methods for partitioning the space of articles into meaningful groups, and we compare three ways of choosing articles: random sampling from the collection, topic modeling, and sampling from clusters obtained by clustering Doc2Vec embeddings. To summarize each individual article, we explore three state-of-the-art deep learning methods for abstractive summarization: a sequence-to-sequence model, a pointer-generator network, and a reinforced extractor-abstractor network. We evaluate the quality of our summaries in two ways. The first is subjective: each evaluator ranked the quality of each summary. The second is objective: we used several ROUGE metrics to compare each summary to an independently generated gold-standard summary.
We found that ROUGE scores were low overall; only the pointer-generator network applied to randomly sampled articles achieved a ROUGE score above 0.15. This suggests that such deep learning techniques still have substantial room for improvement before they are viable for collection summarization. | en
dc.description.notes | The files below all relate to the abstractive summarization of the Parkland shooting collection: parkland-shooting-report.pdf, the PDF version of the report; parkland-shooting-report.zip, the files needed to compile the LaTeX report into that PDF; parkland-shooting-presentation.pdf, the PDF version of the project presentation; parkland-shooting-presentation.pptx, the PowerPoint version of the project presentation; parkland-shooting-code.zip, the code and data needed to reproduce the results. | en
dc.description.sponsorship | NSF: IIS-1619028 | en
dc.identifier.uri | http://hdl.handle.net/10919/86370 | en
dc.language.iso | en_US | en
dc.publisher | Virginia Tech | en
dc.rights | Creative Commons Attribution-NonCommercial 3.0 United States | en
dc.rights.uri | http://creativecommons.org/licenses/by-nc/3.0/us/ | en
dc.subject | nlp | en
dc.subject | shooting | en
dc.subject | summarization | en
dc.subject | Machine learning | en
dc.subject | deep learning | en
dc.subject | natural language processing | en
dc.title | Abstractive Text Summarization of the Parkland Shooting Collection | en
dc.type | Presentation | en
dc.type | Report | en
dc.type | Software | en
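The abstract describes a collection-level pipeline: choose representative articles, summarize each one with a single-article model, concatenate the summaries, and score the result against a gold standard with ROUGE. The Python sketch below illustrates that flow under stated assumptions; it is not the authors' code. All names (`summarize_collection`, `lead_sentence`, `rouge_n`, `tokenize`) are hypothetical. Grouping by topic model or Doc2Vec clustering is assumed to happen elsewhere and arrive as a dict of article lists. `lead_sentence` is a trivial placeholder, not one of the deep learning models named above. `rouge_n` is a simplified reimplementation of ROUGE-N (regex tokenization, no stemming or stopword handling), not the ROUGE toolkit or package used in the report.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F1) for ROUGE-N with clipped n-gram overlap."""
    cand = ngram_counts(tokenize(candidate), n)
    ref = ngram_counts(tokenize(reference), n)
    overlap = sum((cand & ref).values())  # per-n-gram minimum of the two counts
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


def lead_sentence(article):
    """Hypothetical placeholder summarizer: return the first sentence.

    Stands in for a seq2seq, pointer-generator, or extractor-abstractor model.
    """
    return article.split(". ")[0].rstrip(".") + "."


def summarize_collection(groups, summarize, per_group=1, seed=0):
    """Sample representative articles from each group, summarize each one,
    and concatenate the per-article summaries into one collection summary.

    groups: dict mapping a group id (e.g. a topic or a Doc2Vec cluster,
    computed elsewhere) to a list of article texts.
    summarize: any callable that maps one article to its summary.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    parts = []
    for group_id in sorted(groups):
        articles = groups[group_id]
        for article in rng.sample(articles, min(per_group, len(articles))):
            parts.append(summarize(article))
    return " ".join(parts)
```

The random-sampling baseline corresponds to putting every article in a single group and raising `per_group`; the topic-model and cluster variants differ only in how `groups` is built. Swapping `lead_sentence` for a trained model leaves the rest of the pipeline unchanged, which is the point of treating collection summarization as "select, summarize, concatenate".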

Files

Original bundle
Name:
parkland-shooting-code.zip
Size:
1.17 MB
Name:
parkland-shooting-report.pdf
Size:
780.66 KB
Format:
Adobe Portable Document Format
Name:
parkland-shooting-report.zip
Size:
643.73 KB
Name:
parkland-shooting-presentation.pdf
Size:
506.64 KB
Format:
Adobe Portable Document Format
Name:
parkland-shooting-presentation.pptx
Size:
870.06 KB
Format:
Microsoft Powerpoint XML
License bundle
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission