Monte Carlo validation of two genetic clustering algorithms

Date

1993

Publisher

Virginia Polytechnic Institute and State University

Abstract

Cluster analysis refers to a class of statistical methods designed to identify homogeneous groups within complex, multivariate data sets. In this study, two newly developed genetic cluster analysis algorithms, GENCLUS and GENCLUS+, were validated by comparing their performance against that of three popular clustering techniques (Ward's method, K-means with random seeds, and K-means with Ward's centroids) in an elaborate Monte Carlo study. Additionally, the ability of GENCLUS+ to determine the correct number of clusters was compared against that of three conventional procedures (Calinski and Harabasz, C-index, and trace W). GENCLUS and GENCLUS+ achieved Rand recovery values slightly inferior to those of the conventional methods. However, GENCLUS+ appeared to perform better than the conventional methods in an empirical analysis, and the genetic method solutions appeared to possess high internal cohesion and external isolation. The mixed results are interpreted as an indication of a discrepancy between cluster theory and conventional data generation techniques.
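
For readers unfamiliar with the recovery measure named in the abstract, the following is a minimal sketch of the plain Rand index, which scores agreement between a known partition and a recovered one as the fraction of object pairs that both partitions treat the same way. The function name `rand_index` and the toy label lists are illustrative assumptions, not taken from the study, and the abstract does not say whether the study used this plain form or a chance-corrected variant.

```python
from itertools import combinations


def rand_index(labels_true, labels_found):
    """Plain Rand index between two partitions of the same objects.

    Hypothetical helper for illustration only; not the study's code.
    """
    if len(labels_true) != len(labels_found):
        raise ValueError("both labelings must cover the same objects")
    if len(labels_true) < 2:
        raise ValueError("at least two objects are needed to form a pair")

    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        together_true = labels_true[i] == labels_true[j]
        together_found = labels_found[i] == labels_found[j]
        # A pair agrees when both partitions join it or both split it.
        if together_true == together_found:
            agreements += 1
        pairs += 1
    return agreements / pairs


# Toy data (assumed): four objects in two true clusters.
true_labels = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # same grouping, different label names
crossed = [0, 1, 0, 1]      # grouping that cuts across the true clusters

print(rand_index(true_labels, relabelled))
print(rand_index(true_labels, crossed))
```

Because the index only compares pairs, it is unaffected by how the clusters happen to be numbered, which is why it suits Monte Carlo studies in which the generating partition is known.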
