Clustering for Data Reduction: A Divide and Conquer Approach

Andrews, Nicholas O.; Fox, Edward A.

dc.contributor.author	Andrews, Nicholas O.	en
dc.contributor.author	Fox, Edward A.	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2013-06-19T14:36:26Z	en
dc.date.available	2013-06-19T14:36:26Z	en
dc.date.issued	2007-10-01	en
dc.description.abstract	We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where it runs an order of magnitude faster than simply looking for prototypes on the entire dataset. Furthermore, our "divide and conquer" approach is actually more accurate on datasets that are well bisected, as the greedy decisions of affinity propagation are confined to classes of already similar items.	en
dc.format.mimetype	application/pdf	en
dc.identifier	http://eprints.cs.vt.edu/archive/00000999/	en
dc.identifier.sourceurl	http://eprints.cs.vt.edu/archive/00000999/01/redux.pdf	en
dc.identifier.trnumber	TR-07-36	en
dc.identifier.uri	http://hdl.handle.net/10919/19848	en
dc.language.iso	en	en
dc.publisher	Department of Computer Science, Virginia Polytechnic Institute & State University	en
dc.relation.ispartof	Computer Science Technical Reports	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Algorithms	en
dc.subject	Data structures	en
dc.title	Clustering for Data Reduction: A Divide and Conquer Approach	en
dc.type	Technical report	en
dc.type.dcmitype	Text	en

Files

Name: redux.pdf
Size: 228.65 KB
Format: Adobe Portable Document Format

Collections

Computer Science Technical Reports