Fast and Accurate Graph Clustering

TR Number

Date

2025-10-28

Journal Title

Journal ISSN

Volume Title

Publisher

Virginia Tech

Abstract

Graph clustering, also known as community detection, is a fundamental problem in graph analytics with applications across a wide variety of domains including bioinformatics, social media analysis, and anomaly detection. Graph clustering algorithms can be grouped into two categories: inferential and descriptive. While inferential algorithms deliver high accuracy, they do so with poor performance and poor scalability, particularly on large graphs. Conversely, descriptive algorithms are fast and scalable but significantly less accurate. In this manuscript, we focus on accelerating the performance and improving the scalability of stochastic block partitioning (SBP), a bottom-up (agglomerative) inferential graph-clustering algorithm based on sequential Bayesian inference. Like other inferential algorithms, SBP is highly accurate even on graphs with a complex community structure but does not scale well to large real-world graphs that can contain upwards of a million vertices. Our first set of contributions centers around performing data reduction via sampling as well as shared- and distributed-memory parallelization for accelerating the bottom-up formulation of SBP. We integrate these approaches into a unified, accelerated bottom-up SBP implementation, which exhibits up to 322X speedup over sequential SBP on a 1,000,000-vertex graph processed on 64 compute nodes with 256GB of memory. Next, we fundamentally re-factor SBP's bottomup approach into a top-down one and modify the previous data reduction and parallelization methodology to realize an integrated and accelerated top-down SBP algorithm. This new algorithm demonstrates up to 403X speedup over sequential SBP on real-world graphs when run on 64 compute nodes with 48 cores and 32GB of memory. Furthermore, the top-down computational approach enables SBP to process up to 4.4X larger graphs — from 862,664 vertices to 3,774,768 vertices — on the same 32GB of memory. Importantly, our approach accelerates SBP without sacrificing the algorithm's accuracy on both real-world and synthetic graph datasets.

Description

Keywords

graph clustering, community detection, stochastic block partitioning, graph analysis

Citation