Computational Linguistic Analysis of Earthquake Collections

dc.contributor.author: Bialousz, Kenneth [en]
dc.contributor.author: Kokal, Kevin [en]
dc.contributor.author: Orleans-Pobee, Kwamina [en]
dc.contributor.author: Wakeley, Christopher [en]
dc.date.accessioned: 2014-12-13T19:40:45Z [en]
dc.date.available: 2014-12-13T19:40:45Z [en]
dc.date.issued: 2014-12 [en]
dc.description: Both PDF and Word versions for the final report, a ZIP file of source code, and a PDF and PowerPoint of the final presentation. [en]
dc.description.abstract: CS4984 is a newly offered class at Virginia Tech with a unit-based, project/problem-based learning curriculum. This class style is based on NSF-funded work on curriculum for the field of digital libraries and related topics, and in this class it is used to guide a student-based investigation of computational linguistics. The specific problem this report addresses is how to automatically generate a short summary of a corpus of articles about earthquakes. Such a summary should be representative of the texts and include all relevant information about the earthquakes. For our analysis, we operated on two corpora: one about a 5.8 magnitude earthquake in Virginia in August 2011, and another about a 6.6 magnitude earthquake in April 2013 in Lushan, China. Techniques used to analyze the articles include clustering, lemmatization, frequency analysis of n-grams, and regular expression searches. [en]
dc.description.sponsorship: NSF DUE-1141209 and IIS-1319578 [en]
dc.identifier.uri: http://hdl.handle.net/10919/51132 [en]
dc.language.iso: en [en]
dc.rights: Creative Commons CC0 1.0 Universal Public Domain Dedication [en]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [en]
dc.subject: natural language processing [en]
dc.subject: Hadoop [en]
dc.subject: Mahout [en]
dc.subject: LDA [en]
dc.subject: K-means clustering [en]
dc.subject: NLTK [en]
dc.subject: Python [en]
dc.subject: natural language generation [en]
dc.subject: Solr [en]
dc.subject: Stanford NER [en]
dc.subject: part-of-speech tagging [en]
dc.title: Computational Linguistic Analysis of Earthquake Collections [en]
dc.type: Dataset [en]
dc.type: Presentation [en]
dc.type: Software [en]
dc.type: Technical report [en]
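
Two of the techniques named in the abstract, frequency analysis of n-grams and regular expression searches, can be illustrated with a minimal sketch. This is not the project's code (that is in Source_Code.zip): the sample sentences, the function names, and the magnitude pattern are all assumptions made for illustration, and only the Python standard library is used rather than NLTK.

```python
import re
from collections import Counter

# Hypothetical sample sentences written for this sketch; they are not
# drawn from the Virginia or Lushan corpora described in the abstract.
SAMPLE_DOCS = [
    "A 5.8 magnitude earthquake struck Virginia in August 2011.",
    "The magnitude 6.6 earthquake hit Lushan, China in April 2013.",
]


def tokenize(text):
    """Lowercase the text and split it into word and number tokens."""
    return re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower())


def ngram_counts(docs, n=2):
    """Count every n-gram (as a tuple of tokens) across all documents."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        # zip over n shifted copies of the token list yields the n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Assumed pattern: a number directly before or after the word "magnitude".
MAGNITUDE_RE = re.compile(
    r"(\d+(?:\.\d+)?)[\s-]*magnitude|magnitude[\s-]*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def find_magnitudes(text):
    """Return every magnitude value mentioned in the text as a float."""
    return [float(before or after) for before, after in MAGNITUDE_RE.findall(text)]


if __name__ == "__main__":
    print(ngram_counts(SAMPLE_DOCS, n=2).most_common(3))
    for doc in SAMPLE_DOCS:
        print(find_magnitudes(doc))
```

In a summarization setting, the most frequent n-grams hint at the vocabulary a summary should cover, while the pattern matches supply specific values such as the magnitude.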

Files

Original bundle
Name: Computation Linguistics Final Presentation.pdf
Size: 227.5 KB
Format: Adobe Portable Document Format
Description: Final Presentation (PDF)

Name: Computation Linguistics Final Presentation.pptx
Size: 141.2 KB
Format: Microsoft Powerpoint XML
Description: Final Presentation (PowerPoint)

Name: FinalReport.docx
Size: 44.14 KB
Format: Microsoft Word XML
Description: Final Report (Word)

Name: FinalReport.pdf
Size: 382.15 KB
Format: Adobe Portable Document Format
Description: Final Report (PDF)

Name: Source_Code.zip
Size: 76.03 MB
Format: Unknown data format
Description: Source Code

License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission