Computational Linguistics Hurricane Group

dc.contributor.author | Crowder, Nicholas | en
dc.contributor.author | Nguyen, David | en
dc.contributor.author | Hsu, Andy | en
dc.contributor.author | Mecklenburg, Will | en
dc.contributor.author | Morris, Jeff | en
dc.date.accessioned | 2014-12-14T01:00:10Z | en
dc.date.available | 2014-12-14T01:00:10Z | en
dc.date.issued | 2014-12 | en
dc.description | The report appears in Word and PDF formats as "FinalPaper". The FinalPresentation as PDF resulted from the use of Prezi; see FinalPresentationRaw.zip. The source code developed is in the "main" ZIP archive. | en
dc.description.abstract | The problem-project based learning described in our presentation and report addresses automatic summarization of web content using natural language processing. Initially, we used simple techniques such as word frequencies and WordNet along with n-grams to create summaries. Further approaches became more complex due to the introduction of tools such as Mahout and k-means for topics and clustering. This finally culminated in the use of custom templates and a grammar to generate English sentences to accurately summarize a corpus. Our English summary was created using a grammar alongside regular expressions to extract information. The previous units all built up to the construction of quality regular expressions, in addition to a clean dataset, and some extra tools, such as a classifier trained on our data, as well as a part-of-speech tagger. | en
dc.description.sponsorship | NSF DUE-1141209 and IIS-1319578 | en
dc.identifier.uri | http://hdl.handle.net/10919/51136 | en
dc.language.iso | en_US | en
dc.rights | Creative Commons CC0 1.0 Universal Public Domain Dedication | en
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | en
dc.subject | hurricanes | en
dc.subject | nlp | en
dc.subject | linguistics | en
dc.subject | summary | en
dc.subject | Typhoon Haiyan | en
dc.subject | natural language processing | en
dc.subject | automatic generation | en
dc.subject | automatic summarization | en
dc.subject | Hurricane Sandy | en
dc.title | Computational Linguistics Hurricane Group | en
dc.type | Presentation | en
dc.type | Software | en
dc.type | Technical report | en

Files

Original bundle
Name: FinalPaper.pdf
Size: 428.53 KB
Format: Adobe Portable Document Format
Description: Final Report (PDF)

Name: FinalPresentation.pdf
Size: 3.83 MB
Format: Adobe Portable Document Format
Description: Project Presentation (Prezi presentation in PDF format)

Name: FinalPaper.docx
Size: 55.86 KB
Format: Microsoft Word XML
Description: Final Report Raw

Name: main.zip
Size: 28.57 KB
Format: Unknown data format
Description: Source Code (in ZIP format)

Name: FinalPresenationRaw.zip
Size: 54.13 MB
Format: Unknown data format
Description: Project Presentation Raw (Prezi presentation in ZIP format)

License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission