Big Data: New Zealand Earthquakes Summary

dc.contributor.author: Bochel, Alexander [en]
dc.contributor.author: Edmisten, William [en]
dc.contributor.author: Lee, Jun [en]
dc.contributor.author: Chandalura, Rohit [en]
dc.date.accessioned: 2018-12-14T19:29:35Z [en]
dc.date.available: 2018-12-14T19:29:35Z [en]
dc.date.issued: 2018-12-14 [en]
dc.description.abstract: The purpose of this Big Data project was to create a computer-generated text summary of a major earthquake event in New Zealand. The summary was to be created from a large webpage dataset supplied to our team, containing 280MB of data. Our team used basic and advanced machine learning techniques to create the summary. Research into optimal ways of creating such summaries is important because it allows us to analyze large sets of textual information and identify the most important parts. Writing an accurate summary takes a human a long time, and may even be impossible given the number of documents in our dataset. Using computers to do this automatically drastically increases the rate at which important information can be extracted from a set of data. The process our team followed was as follows. First, we extracted the most frequently appearing words in our dataset. Second, we examined these words and tagged them with their parts of speech. Next, we found and examined the most frequent named entities. We then improved our set of important words through TF-IDF vectorization and repeated the prior steps with the improved set. Next, we focused on creating an extractive summary. Once we completed this step, we used templating to create our final summary. Our team had many interesting findings throughout this process. We learned how to use Zeppelin notebooks effectively as a tool for prototyping code. We discovered an efficient way to process our large datasets using the Hadoop cluster along with PySpark. We discovered how to clean our dataset effectively before running our programs on it. We also discovered how to create the abstractive summary using a template along with our important named entities.
Our final result was achieved using the templating method together with abstractive summarization. It included the successful generation of an abstractive summary using the templating system. This result was readable and accurate according to the dataset that we were given. We also achieved decent results from the extractive summary technique, which provided mostly readable summaries but still included some noise. Since our templated summary was very specific, it is the most coherent and contains only relevant information. [en]
dc.description.notes: BigDataTeam5FinalReport.pdf is the final deliverable for the report about Team 5's summarization of the New Zealand earthquakes. This report highlights every detail about the process of creating the summaries including manuals for how to use the code and how to pick up where our team left off. It is not editable. BigDataTeam5FinalReport.docx is the final deliverable for the report about Team 5's summarization of the New Zealand earthquakes. This version of the report is editable. BigDataTeam5FinalPresentation.pdf is the final presentation given in class about Team 5's summarization of the New Zealand Earthquakes. It highlights the deliverables for each task including the final summaries. This version of the presentation is not editable. BigDataTeam5FinalPresentation.pptx is the final presentation given in class about Team 5's summarization of the New Zealand Earthquakes. It highlights the deliverables for each task including the final summaries. This is the editable version of the presentation. [en]
dc.description.sponsorship: NSF: IIS-1619028 [en]
dc.identifier.uri: http://hdl.handle.net/10919/86404 [en]
dc.language.iso: en_US [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: earthquakes [en]
dc.subject: machine learning [en]
dc.subject: summarization [en]
dc.subject: big data [en]
dc.subject: New Zealand [en]
dc.title: Big Data: New Zealand Earthquakes Summary [en]
dc.type: Presentation [en]
dc.type: Report [en]
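
The abstract above describes a pipeline that moves from frequent words to an extractive summary and then to a templated summary. As a rough illustration only, the sketch below shows one simple way those last steps could look in plain Python. It is not the team's code (the abstract says they worked with PySpark on a Hadoop cluster), and it swaps in basic word-frequency sentence scoring rather than the TF-IDF vectorization they mention. The sentences, stopword list, template, and all function names are hypothetical and made up for this example.

```python
import re
from collections import Counter

# Hypothetical toy sentences, invented for illustration; they are not taken
# from the team's 280MB webpage dataset.
SENTENCES = [
    "A magnitude 7.8 earthquake struck near Kaikoura in New Zealand.",
    "The earthquake damaged roads and buildings in Kaikoura.",
    "Officials said the weather in Wellington was mild.",
]

# Tiny hand-picked stopword list (an assumption; real pipelines use larger ones).
STOPWORDS = {"a", "the", "in", "and", "near", "was", "said"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def word_frequencies(sentences):
    """Count how often each non-stopword token appears across all sentences."""
    return Counter(w for s in sentences for w in tokenize(s))


def extractive_summary(sentences, k=1):
    """Return the k highest-scoring sentences, kept in their original order.

    A sentence's score is the average corpus frequency of its tokens.
    """
    freqs = word_frequencies(sentences)

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freqs[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = set(sorted(sentences, key=score, reverse=True)[:k])
    return [s for s in sentences if s in top]


def fill_template(template, slots):
    """Fill named slots in a summary template with extracted values."""
    return template.format(**slots)


if __name__ == "__main__":
    print(extractive_summary(SENTENCES, k=1))
    print(fill_template(
        "A magnitude {magnitude} earthquake struck {location}.",
        {"magnitude": "7.8", "location": "Kaikoura"},
    ))
```

In this sketch the slot values are supplied by hand; in a pipeline like the one the abstract describes, they would presumably come from the most frequent named entities.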

Files

Original bundle
BigDataTeam5FinalReport.pdf (716.68 KB, Adobe Portable Document Format)
BigDataTeam5FinalReport.docx (781.07 KB, Microsoft Word XML)
BigDataTeam5FinalPresentation.pptx (883.44 KB, Microsoft Powerpoint XML)
BigDataTeam5FinalPresentation.pdf (156.32 KB, Adobe Portable Document Format)

License bundle
license.txt (1.5 KB, Item-specific license agreed upon to submission)