Unstable Communities in Network Ensembles
Ensembles of graphs arise naturally in many applications, for example, the temporal evolution of social contacts or computer communications, tissue-specific protein interaction networks, annual citation or co-authorship networks in a field, or a family of high-likelihood Bayesian networks inferred from systems biology data. Several techniques have been developed to analyze such ensembles. A canonical problem is that of computing communities that are persistent across the ensemble. This problem is usually formulated as one of computing dense subgraphs (communities) that are frequent, i.e., appear in many graphs in the ensemble.
In this thesis, we seek to find "unstable communities" which are the antithesis of frequent, dense subgraphs. Informally, an unstable community is a set of nodes that induces highly-varying subgraphs in the ensemble. In other words, the graphs in the ensemble disagree about the precise pairwise connections among these nodes. The primary contribution of this dissertation is to introduce the concept of unstable communities as a novel problem in the field of graph mining. Specifically, it presents three approaches to mathematically formulate the concept of unstable communities, devises algorithms for computing such communities in a given ensemble of networks, and shows the usefulness of this concept in a variety of settings.
Our first definition of unstable community relies on two parameters: the first ensures that a node set induces several different subgraphs in the ensemble and the second guarantees that each of these subgraphs occurs in a large number of graphs in the ensemble. We present two algorithms to enumerate unstable communities that match this definition. The first approach, ClustMiner, is a heuristic that transforms the problem into one of computing dense subgraphs in a single graph that summarizes the ensemble. The second approach, UCMiner, is guaranteed to enumerate all maximal unstable communities correctly. We apply both approaches to systems biology datasets to demonstrate that UCMiner is superior to ClustMiner in the sense that ClustMiner's output contains node sets that are not unstable while also missing several communities computed by UCMiner. We find several node sets that capture the uncertain connectivity of genes in relevant protein complexes, suggesting that further experiments may be required to precisely discern their interaction patterns.
Our second and third definitions of unstable community rely on a novel concept of (scaled) subgraph divergence, a formulation that uses the concept of relative entropy to measure the instability of a community. We propose another algorithm, SDMiner, that can exactly enumerate all maximal unstable communities with small (scaled) subgraph divergence. We perform extensive experiments on social network datasets to show that we can discover UCs that capture the main structural variations of the given set of networks and also provide us with interesting and relevant insights about these datasets.