Browsing by Author "Chen, Liangzhe"
- Analyzing and Visualizing Disaster Phases from Social Media Streams. Lin, Xiao; Chen, Liangzhe; Wood, Andrew (2012-12-11). Working under the direction of CTRNet, we developed a procedure for classifying Twitter data related to natural and man-made disasters into one of the Four Phases of Emergency Management (response, recovery, mitigation, and preparedness), as well as a multi-view system for visualizing the resulting data.
- Modeling Influence using Weak Supervision: A joint Link and Post-level Analysis. Chen, Liangzhe; Prakash, B. Aditya (Department of Computer Science, Virginia Polytechnic Institute & State University, 2018-04-09). Microblogging websites, like Twitter and Weibo, are used by billions of people to create and spread information. This activity depends on various factors, such as the friendship links between users, their topic interests, and the social influence between them. Making sense of these behaviors is very important for fully understanding and utilizing these platforms. Most prior work on modeling social media either ignores the effect of social influence or considers its effect only on link formation or only on post generation. In contrast, in this paper we propose POLIM, which jointly models the effect of influence on both link and post generation, leveraging weak supervision. We also give POLIM-FIT, an efficient parallel inference algorithm for POLIM that scales to large datasets. In our experiments on a large tweet corpus, we detect meaningful topical communities and celebrities, as well as the patterns of influence strength among them. Further, we find that significant portions of posts and links are caused by influence, and that this portion increases when the data focuses on a specific event. We also show that differentiating and identifying this influenced content benefits other quantitative downstream tasks as well, like predicting future tweets and link formation.
- Segmentations with Explanations for Outage Analysis. Chen, Liangzhe; Muralidhar, Nikhil; Chinthavali, Supriya; Ramakrishnan, Naren; Prakash, B. Aditya (Department of Computer Science, Virginia Polytechnic Institute & State University, 2018-04-09). Recent hurricane events have caused unprecedented amounts of damage and severely threatened our public safety and economy. The most observable (and severe) impact of these hurricanes is the loss of electric power in many regions, which causes the breakdown of many public services. Understanding the power outages and how they evolve during a hurricane provides insights on how to reduce outages in the future and how to improve the robustness of the underlying critical infrastructure systems. In this paper, we propose a novel segmentation-with-explanations framework to help experts understand such datasets. Our method, CUT-n-REVEAL, first finds a segmentation of the outage sequences to capture pattern changes in the sequences. We then propose a novel explanation optimization problem to find an intuitive explanation of the segmentation that highlights the culprit of the change. Via extensive experiments, we show that our method performs consistently in multiple datasets with ground truth. We further study real county-level power outage data from several recent hurricanes (Matthew, Harvey, Irma) and show that CUT-n-REVEAL recovers important, nontrivial and actionable patterns for domain experts.
- Segmenting, Summarizing and Predicting Data Sequences. Chen, Liangzhe (Virginia Tech, 2018-06-19). Temporal data is ubiquitous nowadays and can be easily found in many applications. Consider the extensively studied social media website Twitter. All the information can be associated with time stamps, and thus form different types of data sequences: a sequence of feature values of users who retweet a message, a sequence of tweets from a certain user, or a sequence of the evolving friendship networks. Mining these data sequences is an important task, which reveals patterns in the sequences, and it is a very challenging task as it usually requires different techniques for different sequences. The problem becomes even more complicated when the sequences are correlated. In this dissertation, we study the following two types of data sequences, and we show how to carefully exploit within-sequence and across-sequence correlations to develop more effective and scalable algorithms. 1. Multi-dimensional value sequences: We study sequences of multi-dimensional values, where each value is associated with a time stamp. Such value sequences arise in many domains such as epidemiology (medical records), social media (keyword trends), etc. Our goals are: for individual sequences, to find a segmentation of the sequence to capture where the pattern changes; for multiple correlated sequences, to use the correlations between sequences to further improve our segmentation; and to automatically find explanations of the segmentation results. 2. Social media post sequences: Driven by applications from popular social media websites such as Twitter and Weibo, we study the modeling of social media post sequences. Our goal is to understand how the posts (like tweets) are generated and how we can gain understanding of the users behind these posts. For individual social media post sequences, we study a prediction problem to find the users' latent state changes over the sequence. For dependent post sequences, we analyze the social influence among users, and how it affects users in generating posts and links. Our models and algorithms lead to useful discoveries, and they solve real problems in Epidemiology, Social Media and Critical Infrastructure Systems. Further, most of the algorithms and frameworks we propose can be extended to solve sequence mining problems in other domains as well.
- Text Clustering Using LucidWorks and Apache Mahout. Chen, Liangzhe; Lin, Xiao; Wood, Andrew (2012-11-17). This module introduces algorithms and evaluation metrics for flat clustering. We focus on using the LucidWorks big data analysis software and Apache Mahout, an open-source machine learning library, to cluster document collections with the k-means algorithm.