On the Landscape of Graph Clustering at Scale

Loading...
Thumbnail Image

TR Number

Date

2025-06

Journal Title

Journal ISSN

Volume Title

Publisher

IEEE

Abstract

Graph clustering, also known as community detection, is used to partition and analyze data across a gamut of disciplines, leading to new insights in fields like bioinformatics, networking, and cybersecurity. To keep pace with the exponential growth in collected data, much of the graph clustering research has increasingly pivoted towards developing parallel and distributed clustering algorithms. However, little work has been done to rigorously characterize such algorithms with respect to each other when using the same software stack, hardware stack, and graph dataset inputs. In this manuscript, we identify three open-source, state-of-the-art graph clustering algorithms and characterize the trade-offs between their accuracy and performance on real-world graphs. We show that the ideal choice of graph clustering algorithm depends on the (1) use case, (2) runtime requirements, and (3) accuracy requirements of the user. We provide guidelines for selecting the appropriate state-of-the-practice graph clustering algorithm and conduct a performance characterization of these algorithms through which we identify opportunities for future research in scalable and accurate graph clustering algorithms.

Description

Keywords

community detection, graph clustering, hierarchical clustering, stochastic blockmodels, Bayesian inference, asynchronous Gibbs, MPI, OpenMP

Citation