Finding Succinct Representations For Clusters

TR Number
Journal Title
Journal ISSN
Volume Title
Virginia Tech

Improving the explainability of results from machine learning methods has become an important research goal. In this thesis, we have studied the problem of making clusters more interpretable using a recent approach by Davidson et al., and Sambaturu et al., based on succinct representations of clusters. Given a set of objects S, a partition of S (into clusters), and a universe T of descriptors such that each element in S is associated with a subset of descriptors, the goal is to find a representative set of descriptors for each cluster such that those sets are pairwise-disjoint and the total size of all the representatives is at most a given budget. Since this problem is NP-hard in general, Sambaturu et al. have developed a suite of approximation algorithms for the problem. We also show applications to explain clusters of genomic sequences that represent different threat levels

clustering, integer programming