Hybrid Summarization of Dakota Access Pipeline Protests (NoDAPL)

dc.contributor.author: Chen, Xiaoyu [en]
dc.contributor.author: Wang, Haitao [en]
dc.contributor.author: Mehrotra, Maanav [en]
dc.contributor.author: Chhikara, Naman [en]
dc.contributor.author: Sun, Di [en]
dc.date.accessioned: 2018-12-14T16:28:48Z [en]
dc.date.available: 2018-12-14T16:28:48Z [en]
dc.date.issued: 2018-12-14 [en]
dc.description.abstract: The Dakota Access Pipeline protests (known by the hashtag #NoDAPL) are grassroots movements that began in April 2016 in reaction to the approved construction of Energy Transfer Partners’ Dakota Access Pipeline in the northern United States. The NoDAPL movement has produced many Facebook messages, tweets, blog posts, and news articles, which reflect different aspects of the NoDAPL events. The volume of related information keeps growing rapidly, making it difficult to understand the events efficiently. It is therefore valuable to generate short summaries automatically, or at least semi-automatically, from the large body of data available online. Motivated by this need, the objective of this project is to propose a novel automatic summarization approach that efficiently and effectively summarizes the topics hidden in large online text collections. Although automatic summarization has been investigated for roughly 60 years, since the publication of Luhn’s seminal 1958 paper, several challenges remain in summarizing large online text sets, such as a large proportion of noisy texts, highly redundant information, and multiple latent topics. We therefore propose an automatic framework that requires minimal human effort to summarize large online text sets (~11,000 documents on NoDAPL) by latent topic, with irrelevant information removed. The framework uses a hybrid model that combines the advantages of latent Dirichlet allocation (LDA) based extractive methods and deep-learning-based abstractive methods. Unlike semi-automatic summarization approaches such as template-based summarization, the proposed method does not require practitioners to understand the events deeply enough to create a template, nor to fill in the template using regular expressions. The only human effort needed during the procedure is to manually label a few (say, 100) documents as relevant or irrelevant.
We evaluate the quality of the automatically generated summary with both extrinsic and intrinsic measurements. In the extrinsic subjective evaluation, we design a set of guideline questions and conduct a task-based measurement. Results show that 91.3% of sentences are within the scope of the guideline, and 69.6% of the outlined questions can be answered by reading the generated summary. The intrinsic ROUGE measurements show a total entity coverage of 2.6% and ROUGE-L and ROUGE-SU4 scores of 0.148 and 0.065, respectively. Overall, the proposed hybrid model achieves reasonable performance in summarizing the NoDAPL events. Future work includes testing the approach on more text datasets covering other topics of interest, and investigating a topic-modeling-supervised classification approach to minimize human effort in automatic summarization. We would also like to investigate a deep-learning-based recommender system for better sentence re-ranking. [en]
dc.description.notes: This submission includes: 1) Final_Report_NoDAPL_Team8_Submitted.docx: final report in .docx format; 2) Final_Report_NoDAPL_Team8_Submitted.pdf: final report in .pdf format; 3) Final_Presentation_NoDAPL_Team8_Submitted.pptx: final presentation in .pptx format; 4) Final_Presentation_NoDAPL_Team8_Submitted.pdf: final presentation in .pdf format; 5) Source_Code_Hybrid_Summarization_of_NoDAPL_CS4984CS5984_2018.zip: final collection of source code in .zip format with README.md [en]
dc.description.sponsorship: National Science Foundation [en]
dc.description.sponsorship: NSF: IIS-1619028 [en]
dc.identifier.uri: http://hdl.handle.net/10919/86401 [en]
dc.language.iso: en_US [en]
dc.publisher: Virginia Tech [en]
dc.rights: Creative Commons Attribution 3.0 United States [en]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/ [en]
dc.subject: Text Summarization [en]
dc.subject: Hybrid Model [en]
dc.subject: NoDAPL [en]
dc.subject: Deep learning (Machine learning) [en]
dc.subject: Natural Language Processing [en]
dc.title: Hybrid Summarization of Dakota Access Pipeline Protests (NoDAPL) [en]
dc.type: Presentation [en]
dc.type: Report [en]
dc.type: Software [en]
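
The abstract in this record reports a ROUGE-L score for the generated summary. For readers unfamiliar with the metric, the following is a minimal sketch of a sentence-level ROUGE-L F-measure based on the longest common subsequence (LCS) of tokens. It is an illustration under stated assumptions, not the evaluation code from the submitted source bundle: the function names `lcs_length` and `rouge_l`, the whitespace tokenization, and the `beta` weighting default are all choices made here, and the example strings are toy inputs rather than project data.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure (assumed variant, see lead-in).

    Tokenization is plain lowercased whitespace splitting, which is an
    assumption of this sketch; real evaluations often apply stemming
    or other normalization.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    # Toy strings for illustration only; not taken from the NoDAPL corpus.
    generated = "protesters gathered near the pipeline construction site"
    gold = "protesters gathered at the construction site of the pipeline"
    print(f"ROUGE-L F1: {rouge_l(generated, gold):.3f}")
```

Because the score depends on tokenization, reference summaries, and whether it is computed per sentence or over the whole summary, values from a sketch like this are not directly comparable to the figures reported in the abstract.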

Files

Original bundle
Name:
Final_Report_NoDAPL_Team8_Submitted.pdf
Size:
2.89 MB
Format:
Adobe Portable Document Format
Name:
Final_Report_NoDAPL_Team8_Submitted.docx
Size:
3.34 MB
Format:
Microsoft Word XML
Name:
Final_Presentation_NoDAPL_Team8_Submitted.pdf
Size:
2.95 MB
Format:
Adobe Portable Document Format
Name:
Final_Presentation_NoDAPL_Team8_Submitted.pptx
Size:
5.88 MB
Format:
Microsoft Powerpoint XML
Name:
Source_Code_Hybrid_Summarization_of_NoDAPL_CS4984CS5984_2018.zip
Size:
88.29 MB
Format:
License bundle
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission
Description: