
Computational Linguistic Analysis of Earthquake Collections

dc.contributor.author: Bialousz, Kenneth [en]
dc.contributor.author: Kokal, Kevin [en]
dc.contributor.author: Orleans-Pobee, Kwamina [en]
dc.contributor.author: Wakeley, Christopher [en]
dc.date.accessioned: 2014-12-13T19:40:45Z [en]
dc.date.available: 2014-12-13T19:40:45Z [en]
dc.date.issued: 2014-12 [en]
dc.description: Both PDF and Word versions of the final report, a ZIP file of source code, and a PDF and PowerPoint of the final presentation. [en]
dc.description.abstract: CS4984 is a newly offered class at Virginia Tech with a unit-based, project/problem-based learning curriculum. This class style draws on NSF-funded curriculum work for the field of digital libraries and related topics and, in this class, guides a student-based investigation of computational linguistics. The specific problem this report addresses is the creation of a means to automatically generate a short summary of a corpus of articles about earthquakes. Such a summary should best represent the texts and include all relevant information about earthquakes. For our analysis, we operated on two corpora: one about a 5.8 magnitude earthquake in Virginia in August 2011, and another about a 6.6 magnitude earthquake in April 2013 in Lushan, China. Techniques used to analyze the articles include clustering, lemmatization, frequency analysis of n-grams, and regular expression searches. [en]
dc.description.sponsorship: NSF DUE-1141209 and IIS-1319578 [en]
dc.identifier.uri: http://hdl.handle.net/10919/51132 [en]
dc.language.iso: en [en]
dc.rights: Creative Commons CC0 1.0 Universal Public Domain Dedication [en]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [en]
dc.subject: natural language processing [en]
dc.subject: Hadoop [en]
dc.subject: Mahout [en]
dc.subject: LDA [en]
dc.subject: K-means clustering [en]
dc.subject: NLTK [en]
dc.subject: Python [en]
dc.subject: natural language generation [en]
dc.subject: Solr [en]
dc.subject: Stanford NER [en]
dc.subject: part-of-speech tagging [en]
dc.title: Computational Linguistic Analysis of Earthquake Collections [en]
dc.type: Dataset [en]
dc.type: Presentation [en]
dc.type: Software [en]
dc.type: Technical report [en]
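
Two of the techniques named in the abstract, frequency analysis of n-grams and regular expression searches, can be illustrated with a minimal sketch. This is not the authors' code (that is in Source_Code.zip, and the subject terms suggest it relied on NLTK and other tools); it uses only the Python standard library, and the sample sentences, function names, and the magnitude pattern are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample sentences for illustration; not taken from the report's corpora.
SAMPLE_DOCS = [
    "A magnitude 5.8 earthquake struck Virginia in August 2011.",
    "The magnitude 5.8 earthquake was felt along the East Coast.",
    "Officials said the 5.8 magnitude quake caused minor damage.",
]


def tokenize(text):
    """Lowercase the text and return word tokens, keeping decimals such as 5.8 whole."""
    return re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower())


def ngram_counts(docs, n=2):
    """Count every n-gram (as a tuple of tokens) across all documents."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        # zip over n shifted copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Assumed pattern: a number directly before or after the word "magnitude".
MAGNITUDE_RE = re.compile(
    r"magnitude\s+(\d+(?:\.\d+)?)|(\d+(?:\.\d+)?)\s+magnitude",
    re.IGNORECASE,
)


def find_magnitudes(text):
    """Return every magnitude value mentioned in the text as a float."""
    return [float(before or after) for before, after in MAGNITUDE_RE.findall(text)]


if __name__ == "__main__":
    for ngram, count in ngram_counts(SAMPLE_DOCS, n=2).most_common(3):
        print(" ".join(ngram), count)
    for doc in SAMPLE_DOCS:
        print(find_magnitudes(doc))
```

Counting n-grams surfaces phrases that recur across articles, while a targeted pattern pulls out a specific fact such as the reported magnitude. A real pipeline would likely lemmatize tokens and filter stop words before counting, which this sketch omits.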

Files

Original bundle
Name:
Computation Linguistics Final Presentation.pdf
Size:
227.5 KB
Format:
Adobe Portable Document Format
Description:
Final Presentation (PDF)
Name:
Computation Linguistics Final Presentation.pptx
Size:
141.2 KB
Format:
Microsoft Powerpoint XML
Description:
Final Presentation (PowerPoint)
Name:
FinalReport.docx
Size:
44.14 KB
Format:
Microsoft Word XML
Description:
Final Report (Word)
Name:
FinalReport.pdf
Size:
382.15 KB
Format:
Adobe Portable Document Format
Description:
Final Report (PDF)
Name:
Source_Code.zip
Size:
76.03 MB
Format:
Unknown data format
Description:
Source Code
License bundle
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission