Finding Succinct Representations For Clusters

Date
2019-07-09
Publisher
Virginia Tech
Abstract

Improving the explainability of results from machine learning methods has become an important research goal. In this thesis, we study the problem of making clusters more interpretable using a recent approach by Davidson et al. and Sambaturu et al. based on succinct representations of clusters. Given a set of objects S, a partition of S into clusters, and a universe T of descriptors such that each element of S is associated with a subset of descriptors, the goal is to find a representative set of descriptors for each cluster such that these sets are pairwise disjoint and the total size of all the representatives is at most a given budget. Since this problem is NP-hard in general, Sambaturu et al. have developed a suite of approximation algorithms for it. We also show applications to explaining clusters of genomic sequences that represent different threat levels.
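
To make the problem statement concrete, the following minimal Python sketch checks the two constraints named in the abstract (pairwise-disjoint representative sets and a total-size budget) on a toy instance. The function names, the toy data, and the coverage measure are illustrative assumptions and are not taken from the thesis; in particular, the abstract does not say how well a representative set must cover its cluster, so coverage is only reported here, not enforced.

```python
from itertools import combinations


def is_feasible(reps, budget):
    """Check the two constraints stated in the abstract.

    reps maps each cluster name to its chosen set of descriptors.
    """
    # Representative sets must be pairwise disjoint.
    for a, b in combinations(reps, 2):
        if reps[a] & reps[b]:
            return False
    # The total size of all representatives must not exceed the budget.
    return sum(len(r) for r in reps.values()) <= budget


def coverage(clusters, descriptors, reps):
    """Fraction of each cluster's objects that carry at least one chosen descriptor.

    ASSUMPTION: this is one plausible reading of "representative";
    clusters are assumed to be non-empty.
    """
    return {
        c: sum(1 for obj in members if descriptors[obj] & reps[c]) / len(members)
        for c, members in clusters.items()
    }


# Hypothetical toy instance: objects s1..s4, descriptors t1..t4.
clusters = {"A": ["s1", "s2"], "B": ["s3", "s4"]}
descriptors = {
    "s1": {"t1", "t2"},
    "s2": {"t2", "t3"},
    "s3": {"t3", "t4"},
    "s4": {"t4"},
}
reps = {"A": {"t2"}, "B": {"t4"}}

feasible = is_feasible(reps, budget=2)
cov = coverage(clusters, descriptors, reps)
```

On this toy instance the choice `reps` uses one descriptor per cluster, so it satisfies a budget of 2 and every object carries a descriptor from its own cluster's set. Choosing `t3` for both clusters would violate disjointness.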

Keywords
clustering, integer programming