Approaches to the Label-Switching Problem of Classification, Based on Partition-Space Relabeling and Label-Invariant Visualization

Date
2006-07-15
Publisher
Virginia Tech
Abstract

In the context of interest, a method of cluster analysis is used to classify a set of units into a fixed number of classes. Simulation procedures with various conceptual foundations may be used to evaluate the uncertainty, stability, or sampling error of such a classification. However, simulation approaches may be subject to a label-switching problem when a likelihood function, posterior density, or other objective function is invariant under permutation of class labels. We suggest a relabeling algorithm that maximizes a simple measure of agreement among classifications. It is also known that effective summaries and visualization tools can be based on sample concurrence fractions, which we define as the fractions of sampled classifications in which a given pair of units falls in the same cluster, and which are invariant under permutation of class labels. We expand the study of concurrence fractions by presenting a matrix theory, which is employed in relabeling as well as in the elaboration of visualization tools. We explore an ordination approach that treats concurrence fractions as similarities between pairs of units. A matrix result supports straightforward application of the method of principal coordinates, leading to ordination plots in which Euclidean distances between pairs of units have a simple relationship to concurrence fractions. The use of concurrence fractions complements relabeling by providing an efficient initial labeling.
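
As a rough illustration of the concurrence fractions defined in the abstract, the following Python sketch computes them from a small set of simulated classifications, checks that they do not change when class labels are permuted, and applies a textbook principal coordinates step. The function names, the toy data, and in particular the squared dissimilarity 2(1 - c) are assumptions made for this example; the abstract does not state which relationship between distance and concurrence the report actually uses.

```python
import numpy as np

def concurrence_matrix(labels):
    """labels: (S, n) integer array; row s holds the class labels of n units in draw s."""
    labels = np.asarray(labels)
    # same[s, i, j] is True when units i and j share a cluster in draw s
    same = labels[:, :, None] == labels[:, None, :]
    # Average over draws: fraction of draws in which each pair co-occurs
    return same.mean(axis=0)

def principal_coordinates(C, k=2):
    """Classical principal coordinates on an ASSUMED squared dissimilarity 2 * (1 - C)."""
    n = C.shape[0]
    D2 = 2.0 * (1.0 - C)                      # assumption, not taken from the report
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ D2 @ J                     # doubly centered inner-product matrix
    w, V = np.linalg.eigh(B)                  # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]           # keep the k largest
    w_top = np.clip(w[order], 0.0, None)      # guard against tiny negative round-off
    return V[:, order] * np.sqrt(w_top)

# Hypothetical toy data: 3 draws, 4 units, 2 classes.
# Draws 1 and 2 are the same partition with the two labels swapped.
labels = np.array([[0, 0, 1, 1],
                   [1, 1, 0, 0],
                   [0, 1, 1, 1]])

C = concurrence_matrix(labels)
C_swapped = concurrence_matrix(1 - labels)    # relabel every draw: 0 <-> 1
coords = principal_coordinates(C, k=2)        # 2-D ordination coordinates
```

In this toy case, units 1 and 2 share a cluster in two of the three draws, so their concurrence fraction is 2/3, and `C_swapped` equals `C` because co-membership does not depend on which label a cluster carries. Under the assumed dissimilarity, keeping all coordinates (k equal to the number of units) reproduces the squared distances 2(1 - c) up to round-off, since the concurrence matrix is an average of positive semidefinite co-membership matrices.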

Keywords
Consensus matrix, label-switching, model-based clustering, Monte Carlo simulation, principal coordinates analysis, similarity and dissimilarity