Clustering for Data Reduction: A Divide and Conquer Approach

dc.contributor.author: Andrews, Nicholas O. (en)
dc.contributor.author: Fox, Edward A. (en)
dc.contributor.department: Computer Science (en)
dc.date.accessioned: 2013-06-19T14:36:26Z (en)
dc.date.available: 2013-06-19T14:36:26Z (en)
dc.date.issued: 2007-10-01 (en)
dc.description.abstract: We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where we perform an order of magnitude faster than simply looking for prototypes on the entire dataset. Furthermore, our "divide and conquer" approach actually performs more accurately on datasets which are well bisected, as the greedy decisions of affinity propagation are confined to classes of already similar items. (en)
dc.format.mimetype: application/pdf (en)
dc.identifier: http://eprints.cs.vt.edu/archive/00000999/ (en)
dc.identifier.sourceurl: http://eprints.cs.vt.edu/archive/00000999/01/redux.pdf (en)
dc.identifier.trnumber: TR-07-36 (en)
dc.identifier.uri: http://hdl.handle.net/10919/19848 (en)
dc.language.iso: en (en)
dc.publisher: Department of Computer Science, Virginia Polytechnic Institute & State University (en)
dc.relation.ispartof: Computer Science Technical Reports (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Algorithms (en)
dc.subject: Data structures (en)
dc.title: Clustering for Data Reduction: A Divide and Conquer Approach (en)
dc.type: Technical report (en)
dc.type.dcmitype: Text (en)
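
The two-stage idea summarized in the abstract (first divide the data into small clusters, then pick prototypes inside each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes dense numeric points and negative squared Euclidean distance as the similarity, whereas the report works with text data; it uses only bisecting 2-means for the divide step and leaves out the spectral cuts; and the function names, `max_size`, `damping`, and the median-similarity preference are all illustrative choices. The affinity propagation loop follows the standard responsibility/availability updates and falls back to a plain medoid if no exemplar emerges.

```python
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return tuple(sum(col) / len(points) for col in zip(*points))

def two_means(points, rng, iters=20):
    """Split points into two groups with plain 2-means (Lloyd iterations)."""
    c0, c1 = rng.sample(points, 2)
    for _ in range(iters):
        left = [p for p in points if sqdist(p, c0) <= sqdist(p, c1)]
        right = [p for p in points if sqdist(p, c0) > sqdist(p, c1)]
        if not left or not right:
            break
        c0, c1 = mean(left), mean(right)
    return left, right

def bisect(points, max_size, rng):
    """Divide step: keep bisecting until every cluster has <= max_size points."""
    done, todo = [], [list(points)]
    while todo:
        cluster = todo.pop()
        if len(cluster) <= max_size:
            done.append(cluster)
            continue
        left, right = two_means(cluster, rng)
        if not left or not right:      # unsplittable, e.g. all duplicates
            done.append(cluster)
        else:
            todo += [left, right]
    return done

def affinity_propagation(points, damping=0.5, iters=100):
    """Conquer step: return the exemplars (prototypes) of one small cluster."""
    n = len(points)
    if n == 1:
        return [points[0]]
    S = [[-sqdist(p, q) for q in points] for p in points]
    off = sorted(S[i][k] for i in range(n) for k in range(n) if i != k)
    pref = off[len(off) // 2]          # assumed preference: median similarity
    for k in range(n):
        S[k][k] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        for k in range(n):
            total = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [points[k] for k in range(n) if R[k][k] + A[k][k] > 0]
    if exemplars:
        return exemplars
    # Fallback (not part of affinity propagation): use the cluster medoid.
    return [min(points, key=lambda p: sum(sqdist(p, q) for q in points))]

def reduce_to_prototypes(points, max_size=20, seed=0):
    """Divide with bisecting 2-means, then find prototypes per cluster."""
    rng = random.Random(seed)
    prototypes = []
    for cluster in bisect(points, max_size, rng):
        prototypes += affinity_propagation(cluster)
    return prototypes
```

Because affinity propagation only ever sees one small cluster at a time, its cost depends on `max_size` rather than on the size of the whole dataset, which is the source of the speedup the abstract reports. Calling `reduce_to_prototypes(data, max_size=20)` on a list of equal-length numeric tuples returns a list of prototypes drawn from `data`.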

Files

Original bundle
Name: redux.pdf
Size: 228.65 KB
Format: Adobe Portable Document Format