A Genetic Algorithm Approach to Cluster Analysis

TR Number: TR-98-16
Date: 1998-08-01
Publisher: Department of Computer Science, Virginia Polytechnic Institute & State University
Abstract

A common problem in the social and agricultural sciences is to find clusters in experimental data; the standard approach is a deterministic search that terminates in a locally optimal clustering. We propose here a genetic algorithm (GA) for performing cluster analysis. GAs have been used profitably in a variety of contexts in which it is impractical or impossible to solve directly for a globally optimal solution to a complex numerical problem. In the present case, our GA clustering technique attempts to maximize a variance-ratio (VR) goodness-of-fit criterion defined in terms of external cluster isolation and internal cluster homogeneity. Although our GA-based clustering algorithm cannot guarantee recovery of the cluster solution that attains the global maximum of this fitness function, it does explicitly work toward this goal, in marked contrast to existing clustering algorithms, especially hierarchical agglomerative ones such as Ward’s method. Using both constrained and unconstrained simulated datasets, Monte Carlo results showed that under some conditions the genetic clustering algorithm did indeed surpass the performance of conventional clustering techniques (Ward’s method and K-means) in terms of the internal (VR) criterion. Suggestions for future refinement and study are offered.
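The abstract does not spell out the exact VR formula or the GA operators, so the following is only a minimal sketch under stated assumptions, not the report's implementation. It assumes the Calinski–Harabasz form of the variance ratio (between-cluster dispersion over k − 1 degrees of freedom, divided by within-cluster dispersion over n − k), and a simple GA in which each individual is a vector of cluster labels, evolved with tournament selection, uniform crossover, per-gene mutation, and elitism. The function names `variance_ratio` and `ga_cluster` and all parameter defaults are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def variance_ratio(points: Sequence[Point], labels: Sequence[int], k: int) -> float:
    """Assumed Calinski-Harabasz-style VR: larger values indicate clusters that
    are better isolated from each other (between) and more homogeneous (within)."""
    n, d = len(points), len(points[0])
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    # Degenerate partitions (fewer than 2 clusters, empty clusters, n <= k) score 0.
    if k < 2 or n <= k or min(counts) == 0:
        return 0.0
    grand = [sum(p[j] for p in points) / n for j in range(d)]
    sums = [[0.0] * d for _ in range(k)]
    for p, lab in zip(points, labels):
        for j in range(d):
            sums[lab][j] += p[j]
    cents = [[s / counts[c] for s in sums[c]] for c in range(k)]
    # Between-cluster sum of squares: size-weighted distances of centroids to grand mean.
    between = sum(counts[c] * sum((cents[c][j] - grand[j]) ** 2 for j in range(d))
                  for c in range(k))
    # Within-cluster sum of squares: distances of points to their own centroid.
    within = sum(sum((p[j] - cents[lab][j]) ** 2 for j in range(d))
                 for p, lab in zip(points, labels))
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def ga_cluster(points: Sequence[Point], k: int, pop_size: int = 40,
               generations: int = 100, p_mut: float = 0.02,
               seed: int = 0) -> Tuple[List[int], float]:
    """Hypothetical GA that searches label vectors to maximize variance_ratio."""
    rng = random.Random(seed)
    n = len(points)

    def fit(ind: List[int]) -> float:
        return variance_ratio(points, ind, k)

    # Initial population: random label assignments.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fit(ind), ind) for ind in pop]

        def tournament() -> List[int]:
            # Binary tournament: the fitter of two random individuals is a parent.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        # Elitism: copy the current best unchanged, so best fitness never decreases.
        new_pop = [max(scored, key=lambda s: s[0])[1][:]]
        while len(new_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover: each gene comes from either parent with prob. 0.5.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Mutation: reassign each gene to a random cluster with prob. p_mut.
            child = [rng.randrange(k) if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
    best = max(pop, key=fit)
    return best, fit(best)


if __name__ == "__main__":
    data = [[0.0], [2.0], [10.0], [12.0]]
    labels, score = ga_cluster(data, k=2)
    print(labels, score)
```

For the four one-dimensional points above, the partition {0, 2} versus {10, 12} gives a between-cluster sum of squares of 100 and a within-cluster sum of squares of 4, so VR = (100 / 1) / (4 / 2) = 50, the best achievable for this toy input. One design caveat: plain label encoding is redundant (relabeling the clusters yields the same partition with a different chromosome), which can make crossover disruptive; this sketch ignores that issue for brevity.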
