Computational Linguistics Hurricane Group

Date
2014-12
Author
Crowder, Nicholas
Nguyen, David
Hsu, Andy
Mecklenburg, Will
Morris, Jeff
Abstract
The problem/project-based learning described in our presentation and report addresses the automatic summarization of web content using natural language processing. We began with simple techniques, using word frequencies, WordNet, and n-grams to create summaries. Later approaches were more complex, introducing tools such as Mahout and k-means clustering to identify topics. This work culminated in custom templates and a grammar that generate English sentences summarizing a corpus, with regular expressions used to extract the information that fills them. The earlier units all built toward writing high-quality regular expressions, supported by a clean dataset and additional tools such as a classifier trained on our data and a part-of-speech tagger.
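As a rough illustration of two of the steps named in the abstract, the sketch below counts word frequencies and then fills a sentence template with values pulled out by regular expressions. It is not the group's code: the two corpus sentences are invented, and the names `top_words`, `summarize`, `NAME_RE`, `WIND_RE`, and `TEMPLATE` are all hypothetical. It uses only the Python standard library and assumes nothing about the report's actual patterns or grammar.

```python
import re
from collections import Counter

# Invented example sentences; the project's real corpus is not reproduced here.
CORPUS = [
    "Hurricane Alpha made landfall on the coast.",
    "Alpha brought winds of 80 mph to the coast.",
]

# A tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "of", "in", "on", "to", "a", "and"}

# Hypothetical extraction patterns and sentence template.
NAME_RE = re.compile(r"Hurricane\s+([A-Z][a-z]+)")
WIND_RE = re.compile(r"(\d+)\s*mph")
TEMPLATE = "Hurricane {name} produced winds of up to {wind} mph."


def top_words(docs, n=3):
    """Return the n most frequent non-stopword tokens with their counts."""
    words = []
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        words.extend(t for t in tokens if t not in STOPWORDS)
    return Counter(words).most_common(n)


def first_match(pattern, docs):
    """Return the first captured group found in any document, or None."""
    for doc in docs:
        match = pattern.search(doc)
        if match:
            return match.group(1)
    return None


def summarize(docs):
    """Fill the template with extracted values; None if a slot is missing."""
    name = first_match(NAME_RE, docs)
    wind = first_match(WIND_RE, docs)
    if name is None or wind is None:
        return None
    return TEMPLATE.format(name=name, wind=wind)


if __name__ == "__main__":
    print(top_words(CORPUS, 2))
    print(summarize(CORPUS))
```

Returning `None` when a slot cannot be filled keeps the sketch from emitting a half-completed sentence, which is one reason a clean dataset and well-tested patterns matter for template-based summaries.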