Finding Succinct Representations For Clusters

Date

2019-07-09

Publisher

Virginia Tech

Abstract

Improving the explainability of results from machine learning methods has become an important research goal. In this thesis, we study the problem of making clusters more interpretable using a recent approach by Davidson et al. and Sambaturu et al. based on succinct representations of clusters. Given a set of objects S, a partition of S into clusters, and a universe T of descriptors such that each element of S is associated with a subset of T, the goal is to find a representative set of descriptors for each cluster such that these sets are pairwise disjoint and their total size is at most a given budget. Since this problem is NP-hard in general, Sambaturu et al. developed a suite of approximation algorithms for it. We also show applications to explaining clusters of genomic sequences that represent different threat levels.
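
To make the problem statement concrete, the following is a minimal Python sketch of one simple heuristic, not the approximation algorithms of Sambaturu et al. It assumes that a descriptor "covers" an object when the object is tagged with it, and that a good representative set covers as many objects of its cluster as possible; the abstract states only the disjointness and budget constraints, so the coverage objective is an assumption. The function name `greedy_descriptors` and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_descriptors(
    clusters: List[Set[str]],
    tags: Dict[str, Set[str]],
    budget: int,
) -> Tuple[List[Set[str]], List[Set[str]]]:
    """Greedy heuristic (illustrative only, not the thesis's method).

    clusters: the partition of S, one set of objects per cluster.
    tags:     maps each object to its subset of descriptors from T.
    budget:   maximum total number of descriptors chosen overall.

    Returns (chosen, covered): per-cluster descriptor sets that are
    pairwise disjoint with total size <= budget, and the objects each
    cluster's chosen descriptors cover (assumed notion of coverage).
    """
    chosen: List[Set[str]] = [set() for _ in clusters]
    covered: List[Set[str]] = [set() for _ in clusters]
    used: Set[str] = set()  # descriptors already assigned to some cluster

    for _ in range(budget):
        best = None  # (gain, cluster index, descriptor)
        for i, cluster in enumerate(clusters):
            uncovered = cluster - covered[i]
            # Unused descriptors that tag at least one uncovered object.
            candidates = set().union(*(tags[o] for o in uncovered)) - used
            for d in sorted(candidates):
                gain = sum(1 for o in uncovered if d in tags[o])
                if best is None or gain > best[0]:
                    best = (gain, i, d)
        if best is None:  # nothing left to cover, or no descriptor available
            break
        _, i, d = best
        chosen[i].add(d)
        used.add(d)
        covered[i] |= {o for o in clusters[i] if d in tags[o]}
    return chosen, covered


# Toy instance: two clusters, three descriptors, budget of two.
clusters = [{"a", "b"}, {"c", "d"}]
tags = {"a": {"x", "y"}, "b": {"x"}, "c": {"y", "z"}, "d": {"z"}}
chosen, covered = greedy_descriptors(clusters, tags, budget=2)
print(chosen)
```

On this toy instance the heuristic assigns descriptor x to the first cluster and z to the second, so the two representative sets are disjoint and together use the whole budget. A greedy choice like this carries no guarantee in general, which is one reason the NP-hardness noted above motivates algorithms with provable approximation bounds.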

Keywords

clustering, integer programming
