Exploring the Blacksburg Community Events Collection
The Python and Bash code that we used (along with the commands in the report) for the final approach. (11.26Kb)
The collection files that were generated at various stages (e.g., removing stop words, clustering). (222.5Mb)
With the advent of new technology, especially the combination of smartphones and widespread Internet access, people are increasingly absorbed in digital worlds that are not bounded by geography. As a result, some people worry about what this means for local communities. The Virtual Town Square project is an effort to harness people's use of these kinds of social networks, but with a focus on local communities. As part of the Fall 2014 CS4984 Computational Linguistics course, we explored the Blacksburg Events Collection, a collection of documents mined from the Virtual Town Square for the town of Blacksburg, Virginia. We describe our efforts to summarize this collection in order to inform newcomers about the local community. We begin with our approach, which consisted of first cleaning the dataset and then applying hierarchical clustering to the collection. The core idea is to cluster the documents of the collection into sub-clusters, cluster each of those sub-clusters again, and finally cluster the sentences within the resulting sub-clusters. We then choose the sentences closest to the centroids of the final sentence clusters as our summaries. Some of the summary sentences capture highly relevant information about specific events in the community, but the final results still contain a fair amount of noise and are not very concise. Next, we discuss lessons learned over the course of the project, such as the importance of good project planning and of iterating quickly on actual solutions rather than only discussing the many approaches that could be taken. Finally, we suggest improvements to our approach, especially ways to clean up the final summary sentences. The appendix contains a Developer’s Manual that describes the included files and the final code in detail.
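The clustering pipeline described above can be illustrated with a minimal sketch. This is not the project's code: it assumes bag-of-words term counts, cosine similarity, and a simple k-means seeded with the first k items, none of which are specified in the abstract. All function names and parameters (`vectorize`, `kmeans`, `cluster_texts`, `summarize`, `k_docs`, `k_sub`, `k_sent`) are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (assumption: no stop-word removal or stemming)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})

def kmeans(vectors, k, iterations=10):
    """Simple k-means; seeds are the first k vectors so results are deterministic."""
    centroids = [vectors[i] for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids

def cluster_texts(texts, k):
    """Cluster texts and return (members, centroid) pairs for non-empty clusters."""
    k = min(k, len(texts))
    vectors = [vectorize(t) for t in texts]
    labels, centroids = kmeans(vectors, k)
    groups = []
    for c in range(k):
        members = [t for t, l in zip(texts, labels) if l == c]
        if members:
            groups.append((members, centroids[c]))
    return groups

def summarize(documents, k_docs=2, k_sub=2, k_sent=1):
    """Documents -> sub-clusters -> sub-sub-clusters -> sentence clusters."""
    summary = []
    for cluster, _ in cluster_texts(documents, k_docs):        # level 1
        for sub_cluster, _ in cluster_texts(cluster, k_sub):   # level 2
            sentences = [s.strip() for d in sub_cluster
                         for s in re.split(r"(?<=[.!?])\s+", d) if s.strip()]
            for group, center in cluster_texts(sentences, k_sent):  # level 3
                # Keep the sentence closest to the sentence-cluster centroid.
                summary.append(max(group,
                                   key=lambda s: cosine(vectorize(s), center)))
    return summary
```

Calling `summarize(docs)` on a list of document strings returns at most `k_docs * k_sub * k_sent` sentences, each taken verbatim from the input. A real run would first apply the cleaning step mentioned above (for example, stop-word removal), which this sketch omits.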