|dc.description.abstract||Given a population contact network and electronic medical records of patients, how to distribute vaccines to individuals to effectively control a flu epidemic? Similarly, given the Twitter following network and tweets, how to choose the best communities/groups to stop rumors from spreading? How to find the best accounts that bridge celebrities and ordinary users? These questions are related to diffusion (aka propagation) phenomena. Diffusion can be treated as a behavior of spreading contagions (like viruses, ideas, memes, etc.) on some underlying network. It is omnipresent in areas such as social media, public health, and cyber security. Examples include diseases like flu spreading on person-to-person contact networks, memes disseminating by online adoption over online friendship networks, and malware propagating among computer networks. When a contagion spreads, network structure (like nodes/edges/groups, etc.) plays a major role in determining the outcome. For instance, a rumor, if propagated by celebrities, can go viral. Similarly, an epidemic can die out quickly, if vulnerable demographic groups are successfully targeted for vaccination.
Hence in this thesis, we aim to optimize and understand network structure better in light of diffusion. We optimize graph topologies by removing nodes/edges for controlling rumors/viruses from spreading, and gain a deeper understanding of a network in terms of diffusion by exploring how nodes group together for similar roles of dissemination. We develop several novel graph mining algorithms, with different levels of granularity (node/edge level to group/community level), from model-driven and data-driven perspectives, focusing on topics like immunization on networks, graph summarization, and community detection. In contrast to previous work, we are the first to systematically develops more realistic, implementable and data-based graph algorithms to control contagions. In addition, our thesis is also the first work to use diffusion to effectively summarize graphs and understand communities/groups of networks in a general way.
1. Model-driven. Diffusion processes are usually described using mathematical models, e.g., the Independent Cascade (IC) model in social media, and the Susceptible-Infectious-Recovered (SIR) model in epidemiology. Given such models, we propose to optimize network structure for controlling propagation (the immunization problem) in several practical and implementable settings, taking into account the presence of infections, the uncertain nature of the data and group structure of the population. We develop efficient algorithms for different interventions, such as vaccination (node removal) and quarantining (edge removal). In addition, we study the graph coarsening problem for both static and temporal networks to obtain a better understanding of relations among nodes when a contagion is propagating. We seek to get a much smaller representation of a large network, while preserving its diffusive properties.
2. Data-driven. Model-driven approaches can provide ideal results if underlying diffusion models are given. However, in many situations, diffusion processes are very complicated, and it is challenging or even impossible to pick the most suited model to describe them. In addition, rapid technological development has provided an abundance of data such as tweets and electronic medical records. Hence, in the second part of the thesis, we explore data-driven approaches for diffusion in networks, which can directly work on propagation data by relaxing modeling assumptions of diffusion. To be specific, we first develop data-driven immunization strategies to stop rumors or allocate vaccines by optimizing network topologies, using large-scale national-level diagnostic patient data with billions of flu records. Second, we propose a novel community detection problem to discover "bridge" and "celebrity" communities from social media data, and design case studies to understand roles of nodes/communities using diffusion.
Our work has many applications in multiple areas such as epidemiology, sociology and computer science. For example, our work on efficient immunization algorithms, such as data-driven immunization, can help CDC better allocate vaccines to control flu epidemics in major cities. Similarly, in social media, our work on understanding network structure using diffusion can lead to better community discovery, such as finding media accounts that can boost tweet promotions in Twitter.||en_US