Clustering for Data Reduction: A Divide and Conquer Approach
dc.contributor.author | Andrews, Nicholas O. | en |
dc.contributor.author | Fox, Edward A. | en |
dc.contributor.department | Computer Science | en |
dc.date.accessioned | 2013-06-19T14:36:26Z | en |
dc.date.available | 2013-06-19T14:36:26Z | en |
dc.date.issued | 2007-10-01 | en |
dc.description.abstract | We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where it runs an order of magnitude faster than simply searching for prototypes over the entire dataset. Furthermore, our "divide and conquer" approach is actually more accurate on datasets that are well bisected, because the greedy decisions of affinity propagation are confined to classes of already similar items. | en |
dc.format.mimetype | application/pdf | en |
dc.identifier | http://eprints.cs.vt.edu/archive/00000999/ | en |
dc.identifier.sourceurl | http://eprints.cs.vt.edu/archive/00000999/01/redux.pdf | en |
dc.identifier.trnumber | TR-07-36 | en |
dc.identifier.uri | http://hdl.handle.net/10919/19848 | en |
dc.language.iso | en | en |
dc.publisher | Department of Computer Science, Virginia Polytechnic Institute & State University | en |
dc.relation.ispartof | Computer Science Technical Reports | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Algorithms | en |
dc.subject | Data structures | en |
dc.title | Clustering for Data Reduction: A Divide and Conquer Approach | en |
dc.type | Technical report | en |
dc.type.dcmitype | Text | en |
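The two-stage pipeline described in the abstract (divide the data into clusters, then find prototypes inside each cluster with affinity propagation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it works on small numeric vectors rather than text, performs the division with bisecting 2-means only (the spectral cuts are omitted), and uses the standard affinity propagation updates with negative squared Euclidean distance as similarity, the (upper) median similarity as the preference, and damping 0.5. The function names `two_means`, `bisecting_kmeans`, `affinity_propagation`, `reduce_dataset` and the `max_size` cap are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def two_means(points, iters=20):
    """One bisection step: Lloyd's 2-means with farthest-point initialization (assumed)."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))  # farthest point from c0
    for _ in range(iters):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:  # degenerate case (e.g. duplicates): split in half
            half = len(points) // 2
            return points[:half], points[half:]
        c0, c1 = mean(left), mean(right)
    return left, right


def bisecting_kmeans(points, max_size):
    """Keep bisecting any cluster larger than max_size (hypothetical cap, >= 1)."""
    pending, done = [list(points)], []
    while pending:
        cluster = pending.pop()
        if len(cluster) <= max_size:
            done.append(cluster)
        else:
            pending.extend(two_means(cluster))
    return done


def affinity_propagation(points, iters=200, damping=0.5):
    """Return indices of exemplars chosen by standard affinity propagation."""
    n = len(points)
    if n == 1:
        return [0]
    s = [[-sq_dist(p, q) for q in points] for p in points]
    off = sorted(s[i][k] for i in range(n) for k in range(n) if i != k)
    pref = off[len(off) // 2]  # assumed preference: (upper) median similarity
    for k in range(n):
        s[k][k] = pref
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = s[i][k] - (second if k == best else vals[best])
                r[i][k] = damping * r[i][k] + (1 - damping) * new
        # Availability: sums of positive responsibilities sent to candidate k
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum over i' != k
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    # Fallback: if nothing qualifies, keep the single strongest candidate.
    return exemplars or [max(range(n), key=lambda k: r[k][k] + a[k][k])]


def reduce_dataset(points, max_size=50):
    """Divide and conquer: cluster first, then pick prototypes inside each cluster."""
    prototypes = []
    for cluster in bisecting_kmeans(points, max_size):
        prototypes.extend(cluster[k] for k in affinity_propagation(cluster))
    return prototypes
```

In this sketch each affinity propagation call builds a dense similarity matrix, so its cost grows quadratically with cluster size; capping clusters at `max_size` = m keeps the per-iteration work near n·m rather than n², which is one plausible reading of where a divide-and-conquer speed-up comes from.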