Computational Linguistic Analysis of Earthquake Collections
dc.contributor.author | Bialousz, Kenneth | en |
dc.contributor.author | Kokal, Kevin | en |
dc.contributor.author | Orleans-Pobee, Kwamina | en |
dc.contributor.author | Wakeley, Christopher | en |
dc.date.accessioned | 2014-12-13T19:40:45Z | en |
dc.date.available | 2014-12-13T19:40:45Z | en |
dc.date.issued | 2014-12 | en |
dc.description | Both PDF and Word versions of the final report, a ZIP file of source code, and a PDF and PowerPoint of the final presentation. | en |
dc.description.abstract | CS4984 is a newly offered class at Virginia Tech with a unit-based, project- and problem-based learning curriculum. This class style is based on NSF-funded work on curriculum for the field of digital libraries and related topics; in this class, it is used to guide a student-driven investigation of computational linguistics. The specific problem this report addresses is how to automatically generate a short summary of a corpus of articles about earthquakes. Such a summary should be as representative of the texts as possible and include all relevant information about the earthquake. For our analysis, we operated on two corpora: one about a 5.8 magnitude earthquake in Virginia in August 2011, and another about a 6.6 magnitude earthquake in Lushan, China, in April 2013. Techniques used to analyze the articles include clustering, lemmatization, frequency analysis of n-grams, and regular expression searches. | en |
dc.description.sponsorship | NSF DUE-1141209 and IIS-1319578 | en |
dc.identifier.uri | http://hdl.handle.net/10919/51132 | en |
dc.language.iso | en | en |
dc.rights | Creative Commons CC0 1.0 Universal Public Domain Dedication | en |
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | en |
dc.subject | natural language processing | en |
dc.subject | Hadoop | en |
dc.subject | Mahout | en |
dc.subject | LDA | en |
dc.subject | K-means clustering | en |
dc.subject | NLTK | en |
dc.subject | Python | en |
dc.subject | natural language generation | en |
dc.subject | Solr | en |
dc.subject | Stanford NER | en |
dc.subject | part-of-speech tagging | en |
dc.title | Computational Linguistic Analysis of Earthquake Collections | en |
dc.type | Dataset | en |
dc.type | Presentation | en |
dc.type | Software | en |
dc.type | Technical report | en |
Files
Original bundle
1 - 5 of 5
- Name:
- Computation Linguistics Final Presentation.pdf
- Size:
- 227.5 KB
- Format:
- Adobe Portable Document Format
- Description:
- Final Presentation (PDF)
- Name:
- Computation Linguistics Final Presentation.pptx
- Size:
- 141.2 KB
- Format:
- Microsoft PowerPoint XML
- Description:
- Final Presentation (PowerPoint)
- Name:
- FinalReport.pdf
- Size:
- 382.15 KB
- Format:
- Adobe Portable Document Format
- Description:
- Final Report (PDF)
License bundle
1 - 1 of 1
- Name:
- license.txt
- Size:
- 1.5 KB
- Format:
- Item-specific license agreed to upon submission
- Description: