CS4984: Special Topics
The title of the CS4984 Special Topics class can change from year to year; examples include Computational Linguistics (2014) and Big Data Text Summarization (2018). The class includes a graduate section, CS5984.
Browsing CS4984: Special Topics by Subject "clustering"
- Exploring the Blacksburg Community Events Collection
Antol, Stanislaw; Ayoub, Souleiman; Folgar, Carlos; Smith, Steve (2014-12)
With the advent of new technology, especially the combination of smart phones and widespread Internet access, people are increasingly becoming absorbed in digital worlds – worlds that are not bounded by geography. As such, some people worry about what this means for local communities. The Virtual Town Square project is an effort to harness people's use of these kinds of social networks, but with a focus on local communities. As part of the Fall 2014 CS4984 Computational Linguistics course, we explored a collection of documents, the Blacksburg Events Collection, that were mined from the Virtual Town Square for the town of Blacksburg, Virginia. We describe our activities to summarize this collection to inform newcomers about the local community. We begin by describing the approach that we took, which consisted of first cleaning our dataset and then applying the idea of Hierarchical Clustering to our collection. The core idea is to cluster the documents of our collection into sub-clusters, then cluster those sub-clusters, and then finally do sub-clustering on the sentences of the final sub-clusters. We then choose the sentences closest to the final sentence sub-cluster centroids as our summaries. Some of the summary sentences capture very relevant information about specific events in the community, but our final results still have a fair bit of noise and are not very concise. We then discuss some of the lessons that we learned throughout the course of the project, such as the importance of good project planning and quickly iterating on actual solutions instead of just discussing the multitude of approaches that can be taken. We then provide suggestions to improve upon our approach, especially ways to clean up the final sentence summaries.
The appendix also contains a Developer’s Manual that describes the included files and the final code in detail.
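
The clustering-then-centroid idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say which features or clustering algorithm were used, so the bag-of-words vectors, the plain k-means routine, the two-level (documents, then sentences) structure, and all function names below are assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter


def vectorize(texts):
    """Turn each text into a bag-of-words count vector over a shared vocabulary."""
    tokens = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    vocab = sorted({w for toks in tokens for w in toks})
    vectors = []
    for toks in tokens:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point seeding (an assumption).

    Returns (labels, centroids).
    """
    if not vectors:
        return [], []
    k = min(k, len(vectors))
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing seed.
        farthest = max(vectors, key=lambda v: min(distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: distance(v, centroids[j])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def summarize(documents, doc_clusters=2, sentence_clusters=1):
    """Cluster documents, then cluster the sentences inside each document cluster,
    and keep the sentence closest to each sentence-cluster centroid."""
    doc_labels, _ = kmeans(vectorize(documents), doc_clusters)
    summary = []
    for c in sorted(set(doc_labels)):
        sentences = [
            s.strip()
            for d, lab in zip(documents, doc_labels) if lab == c
            for s in re.split(r"(?<=[.!?])\s+", d) if s.strip()
        ]
        if not sentences:
            continue
        vectors = vectorize(sentences)
        labels, centroids = kmeans(vectors, sentence_clusters)
        for j in sorted(set(labels)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            best = min(members, key=lambda i: distance(vectors[i], centroids[j]))
            summary.append(sentences[best])
    return summary
```

Calling `summarize(list_of_event_texts, doc_clusters=5, sentence_clusters=3)` would return at most fifteen sentences, one per sentence cluster. The abstract describes one more level of document sub-clustering than this sketch performs; adding it would mean calling `kmeans` again on each document cluster before moving to sentences.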