Browsing by Author "Andrews, Nicholas O."
- Clustering for Data Reduction: A Divide and Conquer Approach
  Andrews, Nicholas O.; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2007-10-01)
  We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where we perform an order of magnitude faster than simply looking for prototypes on the entire dataset. Furthermore, our "divide and conquer" approach actually performs more accurately on datasets which are well bisected, as the greedy decisions of affinity propagation are confined to classes of already similar items.
- Recent Developments in Document Clustering
  Andrews, Nicholas O.; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2007-10-01)
  This report aims to give a brief overview of the current state of document clustering research and present recent developments in a well-organized manner. Clustering algorithms are considered with two hypothetical scenarios in mind: online query clustering with tight efficiency constraints, and offline clustering with an emphasis on accuracy. A comparative analysis of the algorithms is performed along with a table summarizing important properties, and open problems as well as directions for future research are discussed.