On the Landscape of Graph Clustering at Scale
Files
TR Number
Date
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Graph clustering, also known as community detection, is used to partition and analyze data across a gamut of disciplines, leading to new insights in fields like bioinformatics, networking, and cybersecurity. To keep pace with the exponential growth in collected data, much of the graph clustering research has increasingly pivoted towards developing parallel and distributed clustering algorithms. However, little work has been done to rigorously characterize such algorithms with respect to each other when using the same software stack, hardware stack, and graph dataset inputs. In this manuscript, we identify three open-source, state-of-the-art graph clustering algorithms and characterize the trade-offs between their accuracy and performance on real-world graphs. We show that the ideal choice of graph clustering algorithm depends on the (1) use case, (2) runtime requirements, and (3) accuracy requirements of the user. We provide guidelines for selecting the appropriate state-of-the-practice graph clustering algorithm and conduct a performance characterization of these algorithms through which we identify opportunities for future research in scalable and accurate graph clustering algorithms.