Browsing by Author "Zhao, Liang"
- ‘Beating the news’ with EMBERS: Forecasting Civil Unrest using Open Source Indicators
Ramakrishnan, Naren; Butler, Patrick; Self, Nathan; Khandpur, Rupinder P.; Saraf, Parang; Wang, Wei; Cadena, Jose; Vullikanti, Anil Kumar S.; Korkmaz, Gizem; Kuhlman, Christopher J.; Marathe, Achla; Zhao, Liang; Ting, Hua; Huang, Bert; Srinivasan, Aravind; Trinh, Khoa; Getoor, Lise; Katz, Graham; Doyle, Andy; Ackermann, Chris; Zavorin, Ilya; Ford, Jim; Summers, Kristen; Fayed, Youssef; Arredondo, Jaime; Gupta, Dipak; Mares, David; Muthia, Sathappan; Chen, Feng; Lu, Chang-Tien (2014)
We describe the design, implementation, and evaluation of EMBERS, an automated, 24x7 continuous system for forecasting civil unrest across 10 countries of Latin America using open source indicators such as tweets, news sources, blogs, economic indicators, and other data sources. Unlike retrospective studies, EMBERS has been making forecasts into the future since Nov 2012 which have been (and continue to be) evaluated by an independent T&E team (MITRE). Of note, EMBERS has successfully forecast the uptick and downtick of incidents during the June 2013 protests in Brazil. We outline the system architecture of EMBERS, individual models that leverage specific data sources, and a fusion and suppression engine that supports trading off specific evaluation criteria. EMBERS also provides an audit trail interface that enables the investigation of why specific predictions were made along with the data utilized for forecasting. Through numerous evaluations, we demonstrate the superiority of EMBERS over baserate methods and its capability to forecast significant societal happenings.
- Bridging the Gap between Spatial and Spectral Domains: A Unified Framework for Graph Neural Networks
Chen, Zhiqian; Chen, Fanglan; Zhang, Lei; Ji, Taoran; Fu, Kaiqun; Zhao, Liang; Chen, Feng; Wu, Lingfei; Aggarwal, Charu; Lu, Chang-Tien (ACM, 2023-10)
Deep learning's performance has been extensively recognized recently. Graph neural networks (GNNs) are designed to deal with graph-structural data that classical deep learning does not easily manage. Since most GNNs were created using distinct theories, direct comparisons are impossible. Prior research has primarily concentrated on categorizing existing models, with little attention paid to their intrinsic connections. The purpose of this study is to establish a unified framework that integrates GNNs based on spectral graph and approximation theory. The framework incorporates a strong integration between spatial- and spectral-based GNNs while tightly associating approaches that exist within each respective domain.
- Deep Graph Learning for Circuit Deobfuscation
Chen, Zhiqian; Zhang, Lei; Kolhe, Gaurav; Kamali, Hadi Mardani; Rafatirad, Setareh; Pudukotai Dinakarrao, Sai Manoj; Homayoun, Houman; Lu, Chang-Tien; Zhao, Liang (2021-05-24)
Circuit obfuscation is a recently proposed defense mechanism to protect the intellectual property (IP) of digital integrated circuits (ICs) from reverse engineering. There have been effective schemes, such as satisfiability (SAT)-checking based attacks that can potentially decrypt obfuscated circuits, which is called deobfuscation. Deobfuscation runtime could be days or years, depending on the layouts of the obfuscated ICs. Hence, accurately pre-estimating the deobfuscation runtime within a reasonable amount of time is crucial for IC designers to optimize their defense. However, it is challenging due to (1) the complexity of graph-structured circuit; (2) the varying-size topology of obfuscated circuits; (3) requirement on efficiency for deobfuscation method. This study proposes a framework that predicts the deobfuscation runtime based on graph deep learning techniques to address the challenges mentioned above. A conjunctive normal form (CNF) bipartite graph is utilized to characterize the complexity of this SAT problem by analyzing the SAT attack method. Multi-order information of the graph matrix is designed to identify the essential features and reduce the computational cost. To overcome the difficulty in capturing the dynamic size of the CNF graph, an energy-based kernel is proposed to aggregate dynamic features into an identical vector space. Then, we designed a framework, Deep Survival Analysis with Graph (DSAG), which integrates energy-based layers and predicts runtime inspired by censored regression in survival analysis. Integrating uncensored data with censored data, the proposed model improves the standard regression significantly.
DSAG is an end-to-end framework that can automatically extract the determinant features for deobfuscation runtime. Extensive experiments on benchmarks demonstrate its effectiveness and efficiency.
- Determining Relative Airport Threats from News and Social Media
Khandpur, Rupinder P.; Ji, Taoran; Ning, Yue; Zhao, Liang; Lu, Chang-Tien; Smith, Erik R.; Adams, Christopher; Ramakrishnan, Naren (AAAI, 2017)
Airports are a prime target for terrorist organizations, drug traffickers, smugglers, and other nefarious groups. Traditional forms of security assessment are not real-time and often do not exist for each airport and port of entry. Thus, homeland security professionals must rely on measures of attractiveness of an airport as a target for attacks. We present an open source indicators approach, using news and social media, to conduct relative threat assessment, i.e., estimating if one airport is under greater threat than another. The three ingredients of our approach are a dynamic query expansion algorithm for tracking emerging threat-related chatter, news-Twitter reciprocity modeling for capturing interactions between social and traditional media, and a ranking scheme to provide an ordered assessment of airport threats. Case studies based on actual aviation incidents are presented.
- Fast and adaptive dynamics-on-graphs to dynamics-of-graphs translation
Zhang, Lei; Chen, Zhiqian; Lu, Chang-Tien; Zhao, Liang (Frontiers, 2023-11-17)
Numerous networks in the real world change with time, producing dynamic graphs such as human mobility networks and brain networks. Typically, the “dynamics on graphs” (e.g., changing node attribute values) are visible, and they may be connected to and suggestive of the “dynamics of graphs” (e.g., evolution of the graph topology). Due to two fundamental obstacles, modeling and mapping between them have not been thoroughly explored: (1) the difficulty of developing a highly adaptable model without solid hypotheses and (2) the ineffectiveness and slowness of processing data with varying granularity. To solve these issues, we offer a novel scalable deep echo-state graph dynamics encoder for networks with significant temporal duration and dimensions. A novel neural architecture search (NAS) technique is then proposed and tailored for the deep echo-state encoder to ensure strong learnability. Extensive experiments on synthetic and actual application data illustrate the proposed method's exceptional effectiveness and efficiency.
- A Framework for Discovering Evolving Domain Related Spatio-Temporal Patterns in Twitter
Shi, Yan; Deng, Min; Yang, Xuexi; Liu, Qiliang; Zhao, Liang; Lu, Chang-Tien (MDPI, 2016-10-18)
In massive Twitter datasets, tweets deriving from different domains, e.g., civil unrest, can be extracted to constitute spatio-temporal Twitter events for spatio-temporal distribution pattern detection. Existing algorithms generally employ scan statistics to detect spatio-temporal hotspots from Twitter events and do not consider the spatio-temporal evolving process of Twitter events. In this paper, a framework is proposed to discover evolving domain related spatio-temporal patterns from Twitter data. Given a target domain, a dynamic query expansion is employed to extract related tweets to form spatio-temporal Twitter events. The new spatial clustering approach proposed here is based on the use of multi-level constrained Delaunay triangulation to capture the spatial distribution patterns of Twitter events. An additional spatio-temporal clustering process is then performed to reveal spatio-temporal clusters and outliers that are evolving into spatial distribution patterns. Extensive experiments on Twitter datasets related to an outbreak of civil unrest in Mexico demonstrate the effectiveness and practicability of the new method. The proposed method will be helpful to accurately predict the spatio-temporal evolution process of Twitter events, which belongs to a deeper geographical analysis of spatio-temporal Big Data.
- Integrated Predictive Modeling and Analytics for Crisis Management
Alhamadani, Abdulaziz Abdulrhman (Virginia Tech, 2024-05-15)
The surge in the application of big data and predictive analytics in fields of crisis management, such as pandemics and epidemics, highlights the vital need for advanced research in these areas, particularly in the wake of the COVID-19 pandemic. Traditional methods, which typically rely on historical data to forecast future trends, fall short in addressing the complex and ever-changing nature of challenges like pandemics and public health crises. This inadequacy is further underscored by the pandemic's significant impact on various sectors, notably healthcare, government, and the hotel industry. Current models often overlook key factors such as static spatial elements, socioeconomic conditions, and the wealth of data available from social media, which are crucial for a comprehensive understanding and effective response to these multifaceted crises. This thesis employs spatial forecasting and predictive analytics to address crisis management in several distinct but interrelated contexts: the COVID-19 pandemic, the opioid crisis, and the impact of the pandemic on the hotel industry. The first part of the study focuses on using big data analytics to explore the relationship between socioeconomic factors and the spread of COVID-19 at the zip code level, aiming to predict high-risk areas for infection. The second part delves into the opioid crisis, utilizing semi-supervised deep learning techniques to monitor and categorize drug-related discussions on Reddit. The third part concentrates on developing spatial forecasting and providing explanations of the rising epidemic of drug overdose fatalities. The fourth part of the study extends to the realm of the hotel industry, aiming to optimize customer experience by analyzing online reviews and employing a localized Large Language Model to generate future customer trends and scenarios.
Across these studies, the thesis aims to provide actionable insights and comprehensive solutions for effectively managing these major crises. For the first work, the majority of current research in pandemic modeling primarily relies on historical data to predict dynamic trends such as COVID-19. This work makes the following contributions in spatial COVID-19 pandemic forecasting: 1) the development of a unique model solely employing a wide range of socioeconomic indicators to forecast areas most susceptible to COVID-19, using detailed static spatial analysis, 2) identification of the most and least influential socioeconomic variables affecting COVID-19 transmission within communities, 3) construction of a comprehensive dataset that merges state-level COVID-19 statistics with corresponding socioeconomic attributes, organized by zip code. For the second work, we make the following contributions in detecting the drug abuse crisis via social media: 1) enhancing the Dynamic Query Expansion (DQE) algorithm to dynamically detect and extract evolving drug names in Reddit comments, utilizing a list curated from government and healthcare agencies, 2) constructing a textual Graph Convolutional Network combined with word embeddings to achieve fine-grained drug abuse classification in Reddit comments, identifying seven specific drug classes for the first time, 3) conducting extensive experiments to validate the framework, outperforming six baseline models in drug abuse classification and demonstrating effectiveness across multiple types of embeddings. The third study focuses on developing spatial forecasting and providing explanations of the escalating epidemic of drug overdose fatalities. Current research in this field has shown a deficiency in comprehensive explanations of the crisis, spatial analyses, and predictions of high-risk zones for drug overdoses.
Addressing these gaps, this study contributes in several key areas: 1) Establishing a framework for spatially forecasting drug overdose fatalities predominantly affecting U.S. counties, 2) Proposing solutions for dealing with scarce and heterogeneous data sets, 3) Developing an algorithm that offers clear and actionable insights into the crisis, and 4) Conducting extensive experiments to validate the effectiveness of our proposed framework. In the fourth study, we address the profound impact of the pandemic on the hotel industry, focusing on the optimization of customer experience. Traditional methodologies in this realm have predominantly relied on survey data and limited segments of social media analytics. Those methods are informative but fall short of providing a full picture due to their inability to include diverse perspectives and broader customer feedback. Our study aims to make the following contributions: 1) the development of an integrated platform that distinguishes and extracts positive and negative Memorable Experiences (MEs) from online customer reviews within the hotel industry, 2) the incorporation of an advanced analytical module that performs temporal trend analysis of MEs, utilizing sophisticated data mining algorithms to dissect customer feedback on a monthly and yearly scale, 3) the implementation of an advanced tool that generates prospective and unexplored MEs by utilizing a localized Large Language Model (LLM) with keywords extracted from authentic customer experiences to aid hotel management in preparing for future customer trends and scenarios. Building on the integrated predictive modeling approaches developed in the earlier parts of this dissertation, this final section explores the significant impacts of the COVID-19 pandemic on the airline industry. The pandemic has precipitated substantial financial losses and operational disruptions, necessitating innovative crisis management strategies within this sector.
This study introduces a novel analytical framework, EAGLE (Enhancing Airline Groundtruth Labels and Review rating prediction), which utilizes Large Language Models (LLMs) to improve the accuracy and objectivity of customer sentiment analysis in strategic airline route planning. EAGLE leverages LLMs for zero-shot pseudo-labeling and zero-shot text classification to enhance the processing of customer reviews without the biases of manual labeling. This approach streamlines data analysis and refines decision-making processes, which allows airlines to align route expansions effectively with nuanced customer preferences and sentiments. The comprehensive application of LLMs in this context underscores the potential of predictive analytics to transform traditional crisis management strategies by providing deeper, more actionable insights.
- Multi-tissue interactions in an integrated three-tissue organ-on-a-chip platform
Skardal, Aleksander; Murphy, Sean V.; Devarasetty, Mahesh; Mead, Ivy; Kang, Hyun-Wook; Seol, Young-Joon; Zhang, Yu Shrike; Shin, Su-Ryon; Zhao, Liang; Aleman, Julio; Hall, Adam R.; Shupe, Thomas D.; Kleensang, Andre; Dokmeci, Mehmet R.; Lee, Sang Jin; Jackson, John D.; Yoo, James J.; Hartung, Thomas; Khademhosseini, Ali; Soker, Shay; Bishop, Colin E.; Atala, Anthony (Springer Nature, 2017-08-18)
Many drugs have progressed through preclinical and clinical trials and have been available - for years in some cases - before being recalled by the FDA for unanticipated toxicity in humans. One reason for such poor translation from drug candidate to successful use is a lack of model systems that accurately recapitulate normal tissue function of human organs and their response to drug compounds. Moreover, tissues in the body do not exist in isolation, but reside in a highly integrated and dynamically interactive environment, in which actions in one tissue can affect other downstream tissues. Few engineered model systems, including the growing variety of organoid and organ-on-a-chip platforms, have so far reflected the interactive nature of the human body. To address this challenge, we have developed an assortment of bioengineered tissue organoids and tissue constructs that are integrated in a closed circulatory perfusion system, facilitating inter-organ responses. We describe a three-tissue organ-on-a-chip system, comprised of liver, heart, and lung, and highlight examples of inter-organ responses to drug administration. We observe drug responses that depend on inter-tissue interaction, illustrating the value of multiple tissue integration for in vitro study of both the efficacy of and side effects associated with candidate drugs.
- On Modeling Dependency Dynamics of Sequential Data: Methods and Applications
Ji, Taoran (Virginia Tech, 2022-02-04)
Information mining and knowledge learning from sequential data is a field of growing importance in both industrial and academic fields. Sequential data, which is the natural representation format of the information flow in many applications, usually carries enormous information and is able to help researchers gain insights for various tasks such as airport threat detection, cyber-attack detection, recommender system, point-of-interest (POI) prediction, and citation forecasting. This dissertation focuses on developing the methods for sequential data-driven applications and evolutionary dynamics characterization for various topics such as transit service disruption detection, early event detection on social media, technology opportunity discovery, and traffic incident impact analysis. In particular, four specific applications are studied with four proposed novel methods, including a spatiotemporal feature learning framework for transit service disruption detection, a multi-task learning framework for cybersecurity event detection, citation dynamics modeling via multi-context attentional recurrent neural networks, and traffic incident impact forecasting via hierarchical spatiotemporal graph neural networks. For the first of these methods, the existing transit service disruption detection methods usually suffer from two significant shortcomings: 1) failing to modulate the sparsity of the social media feature domain, i.e., only a few important “particles” are indeed related to service disruption among the massive volume of data generated every day and 2) ignoring the real-world geographical connections of transit networks as well as the semantic consistency existing in the problem space.
This work makes three contributions: 1) developing a spatiotemporal learning framework for metro disruption detection using open-source data, 2) modeling semantic similarity and spatial connectivity among metro lines in feature space, and 3) developing an optimization algorithm for solving the multi-convex and non-smooth objective function efficiently. For the second of these methods, the conventional studies in cybersecurity detection suffer from the following shortcomings: 1) unable to capture weak signals generated by the cyber-attacks on small organizations or individual accounts, 2) lack of generalization of distinct types of security incidents, and 3) failing to consider the relatedness across different types of cyber-attacks in the feature domain. Three contributions are made in this work: 1) formulating the problem of social media-based cyber-attack detection into the multi-task learning framework, 2) modeling multi-type task relatedness in feature space, and 3) developing an efficient algorithm to solve the non-smooth model with inequality constraints. For the third of these methods, conventional citation forecasting methods are using the traditional temporal point process, which suffers from several drawbacks: 1) unable to predict the technological categories of citing documents and thus are incapable of technological diversity assessment, and 2) require prior domain knowledge and thus are hard to extend to different research areas. Two contributions are made in this work: 1) formulating a novel framework to provide long-term citation predictions in an end-to-end fashion by integrating the process of learning intensity function representations and the process of predicting future citations and 2) designing two novel temporal attention mechanisms to improve the model's ability to modulate complicated temporal dependencies and to allow the model to dynamically combine the observation and prediction sides during the learning process. 
For the fourth of these methods, the previous work treats the traffic sensor readings as the features and views the incident duration prediction as a feature-driven regression, which typically suffers from three drawbacks: 1) ignoring the existence of the road-sensor hierarchical structure in the real-world traffic network, 2) unable to learn and modulate the hidden temporal patterns in the sensor readings, and 3) lack of consideration of the spatial connectivity between arterial roads and traffic sensors. This work makes three significant contributions: 1) designing a hierarchical graph convolutional network architecture for modeling the road-sensor hierarchy, 2) proposing novel spatiotemporal attention mechanism on the sensor- and road-level features for representation learning, and 3) presenting a graph convolutional network-based method for incident representation learning via spatial connectivity modeling and traffic characteristics modulation.
- Spatio-temporal Event Detection and Forecasting in Social Media
Zhao, Liang (Virginia Tech, 2016-08-01)
Nowadays, knowledge discovery on social media is attracting growing interest. Social media has become more than a communication tool, effectively functioning as a social sensor for our society. This dissertation focuses on the development of methods for social media-based spatiotemporal event detection and forecasting for a variety of event topics and assumptions. Five methods are proposed, namely dynamic query expansion for event detection, a generative framework for event forecasting, multi-task learning for spatiotemporal event forecasting, multi-source spatiotemporal event forecasting, and deep learning based epidemic modeling for forecasting influenza outbreaks. For the first of these methods, existing solutions for spatiotemporal event detection are mostly supervised and lack the flexibility to handle the dynamic keywords used in social media. The contributions of this work are: (1) Develop an unsupervised framework; (2) Design a novel dynamic query expansion method; and (3) Propose an innovative local modularity spatial scan algorithm. For the second of these methods, traditional solutions are unable to capture the spatiotemporal context, model mixed-type observations, or utilize prior geographical knowledge. The contributions of this work include: (1) Propose a novel generative model for spatial event forecasting; (2) Design an effective algorithm for model parameter inference; and (3) Develop a new sequence likelihood calculation method. For the third method, traditional solutions cannot deal with spatial heterogeneity or handle the dynamics of social media data effectively. This work's contributions include: (1) Formulate a multi-task learning framework for event forecasting; (2) Simultaneously model static and dynamic terms; and (3) Develop efficient parameter optimization algorithms.
For the fourth method, traditional multi-source solutions typically fail to consider the geographical hierarchy or cope with incomplete data blocks among different sources. The contributions here are: (1) Design a framework for event forecasting based on hierarchical multi-source indicators; (2) Propose a robust model for geo-hierarchical feature selection; and (3) Develop an efficient algorithm for model parameter optimization. For the last method, existing work on epidemic modeling either cannot ensure timeliness, or cannot characterize the underlying epidemic propagation mechanisms. The contributions of this work include: (1) Propose a novel integrated framework for computational epidemiology and social media mining; (2) Develop a semi-supervised multilayer perceptron for mining epidemic features; and (3) Design an online training algorithm.
- Unsupervised Spatial Event Detection in Targeted Domains with Applications to Civil Unrest Modeling
Zhao, Liang; Chen, Feng; Dai, Jing; Hua, Ting; Lu, Chang-Tien; Ramakrishnan, Naren (PLOS, 2014-10-28)
Twitter has become a popular data source as a surrogate for monitoring and detecting events. Targeted domains such as crime, election, and social unrest require the creation of algorithms capable of detecting events pertinent to these domains. Due to the unstructured language, short-length messages, dynamics, and heterogeneity typical of Twitter data streams, it is technically difficult and labor-intensive to develop and maintain supervised learning systems. We present a novel unsupervised approach for detecting spatial events in targeted domains and illustrate this approach using one specific domain, viz. civil unrest modeling. Given a targeted domain, we propose a dynamic query expansion algorithm to iteratively expand domain-related terms, and generate a tweet homogeneous graph. An anomaly identification method is utilized to detect spatial events over this graph by jointly maximizing local modularity and spatial scan statistics. Extensive experiments conducted in 10 Latin American countries demonstrate the effectiveness of the proposed approach.