Natural Language Processing: Generating a Summary of Flood Disasters


In the event of a natural disaster such as a flood, news outlets rush to produce coverage for the general public. Readers may want a clear, concise summary of the event without having to read through hundreds of documents that describe it in different ways. This report describes how to use computational techniques from Natural Language Processing (NLP) to automatically generate a summary of a flood event from a collection of diverse text documents. The body of the report covers NLP topics and techniques that use the NLTK Python library and Apache Hadoop to analyze and summarize a corpus. It describes how these tools are applied but does not explain in depth how they work. Instead, it focuses on using them to summarize a flood event.
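To make the idea of corpus summarization concrete, here is a minimal sketch of frequency-based extractive summarization. It is not the pipeline described in the report. It pools sentences from several documents, counts content words across the whole corpus, and keeps the sentences whose words are most frequent on average. Several parts are assumptions. The regex tokenizer stands in for NLTK's tokenizers so the sketch runs without NLTK data downloads. The small `STOPWORDS` set is a hypothetical stand-in for NLTK's stopword corpus. The names `split_sentences`, `tokenize`, and `summarize` are illustrative only.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; NLTK provides a fuller one (nltk.corpus.stopwords).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "on", "is", "was",
             "were", "by", "for", "at", "as", "with", "from", "that", "this", "it"}

def split_sentences(text):
    # Naive splitter (assumption): break after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Lowercased alphabetic tokens only; a stand-in for a real word tokenizer.
    return re.findall(r"[a-z]+", sentence.lower())

def summarize(documents, num_sentences=2):
    """Return the num_sentences highest-scoring sentences, in corpus order."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    # Drop verbatim duplicate sentences, which are common across news articles.
    unique = list(dict.fromkeys(sentences))
    # Corpus-wide frequencies of non-stopword tokens.
    freq = Counter(w for s in unique for w in tokenize(s) if w not in STOPWORDS)

    def score(sentence):
        # Average corpus frequency of the sentence's content words.
        words = [w for w in tokenize(sentence) if w not in STOPWORDS]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(unique)), key=lambda i: score(unique[i]), reverse=True)
    # Restore original order so the summary reads naturally.
    return [unique[i] for i in sorted(ranked[:num_sentences])]

if __name__ == "__main__":
    docs = [
        "Heavy rain caused the river to flood. Thousands of residents were evacuated.",
        "The flood damaged homes along the river. Officials said the flood was the worst in decades.",
        "A local bakery reopened on Tuesday.",
    ]
    for line in summarize(docs):
        print(line)
```

On this toy input, the two sentences mentioning both "flood" and "river" score highest, and the off-topic bakery sentence is never selected. For a large corpus, the word-counting step is the part that would naturally be distributed (for example, as a Hadoop MapReduce word count). The scoring and selection step is cheap by comparison.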


"Flood Presentation" in both PowerPoint and PDF formats is from the final in-class presentation. Floods_Group_H.pdf is the PDF version of the final report document. has the original version of that document.


Natural Language Processing, Flooding, Machine Learning, Named Entity Recognition, NER, Hadoop, Mahout, Big Data, NLTK


NSF DUE-1141209 and IIS-1319578