Browsing by Author "Chen, Feng"
- 2nd Workshop on Uncertainty Reasoning and Quantification in Decision Making
Zhao, Xujiang; Zhao, Chen; Chen, Feng; Cho, Jin-Hee; Chen, Haifeng (ACM, 2023-08-06)
Uncertainty reasoning and quantification play a critical role in decision making across various domains, prompting increased attention from both academia and industry. As real-world applications become more complex and data-driven, effectively handling uncertainty becomes paramount for accurate and reliable decision making. This workshop focuses on the critical topics of uncertainty reasoning and quantification in decision making. It provides a platform for experts and researchers from diverse backgrounds to exchange ideas on cutting-edge techniques and challenges in this field. The interdisciplinary nature of uncertainty reasoning and quantification, spanning artificial intelligence, machine learning, statistics, risk analysis, and decision science, will be explored. The workshop aims to address the need for robust and interpretable methods for modeling and quantifying uncertainty, fostering reasoning decision-making in various domains. Participants will have the opportunity to share research findings and practical experiences, promoting collaboration and advancing decision-making practices under uncertainty.
- Algorithms for Modeling Mass Movements and their Adoption in Social Networks
Jin, Fang (Virginia Tech, 2016-08-23)
Online social networks have become a staging ground for many modern movements, with the Arab Spring being the most prominent example. In an effort to understand and predict those movements, social media can be regarded as a valuable social sensor for disclosing underlying behaviors and patterns. To fully understand mass movement information propagation patterns in social networks, several problems need to be considered and addressed. Specifically, modeling mass movements that incorporate multiple spaces, a dynamic network structure, and misinformation propagation can be exceptionally useful in understanding information propagation in social media. This dissertation explores four research problems underlying efforts to identify and track the adoption of mass movements in social media. First, how do mass movements become mobilized on Twitter, especially in a specific geographic area? Second, can we detect protest activity in social networks by observing group anomalies in graphs? Third, how can we distinguish real movements from rumors or misinformation campaigns? Fourth, how can we infer the indicators of a specific type of protest, such as climate-related protests? A fundamental objective of this research has been to conduct a comprehensive study of how mass movement adoption functions in social networks. For example, it may cross multiple spaces, evolve with dynamic network structures, or consist of swift outbreaks or long-term, slowly evolving transmissions. In many cases, it may also be mixed with misinformation campaigns, either deliberate or in the form of rumors. Each of those issues requires the development of new mathematical models and algorithmic approaches such as those explored here. 
This work aims to facilitate advances in information propagation, group anomaly detection and misinformation distinction and, ultimately, help improve our understanding of mass movements and their adoption in social networks.
- Anomalous Information Detection in Social Media
Tao, Rongrong (Virginia Tech, 2021-03-10)
This dissertation focuses on identifying various types of anomalous information patterns in social media and news outlets. We focus on three types of anomalous information: (1) media censorship in news outlets, which is information that should be published but is actually missing, (2) fake news in social media, which is unreliable information shown to the public, and (3) media propaganda in news outlets, which is trustworthy information that is over-populated. For the first problem, existing approaches to censorship detection mostly rely on monitoring posts in social media. However, media censorship in news outlets has not received nearly as much attention, mostly because it is difficult to detect systematically. The contributions of our work include: (1) a hypothesis testing framework to identify and evaluate censored clusters of keywords, (2) a near-linear-time algorithm to identify the highest scoring clusters as indicators of censorship, and (3) extensive experiments on six Latin American countries for performance evaluation. For the second problem, existing approaches studying fake news in social media primarily focus on topic-level modeling or prediction based on a set of aggregated features from a collection of posts. However, the credibility of various information components within the same topic can be quite different. The contributions of our work in this space include: (1) a new benchmark dataset for fake news research, (2) a cluster-based approach to improve instance-level prediction of information credibility, and (3) extensive experiments for performance evaluation. For the last problem, existing approaches to media propaganda detection primarily focus on investigating the pattern of information shared over social media or on evaluation by domain experts. 
However, these approaches cannot be generalized to a large-scale analysis of media propaganda in news outlets. The contributions of our work include: (1) non-parametric scan statistics to identify clusters of over-populated keywords, (2) a near-linear-time algorithm to identify the highest scoring clusters as indicators of propaganda, and (3) extensive experiments on two Latin American countries for performance evaluation.
- Autonomous Cyber Defense for Resilient Cyber-Physical Systems
Zhang, Qisheng (Virginia Tech, 2024-01-09)
In this dissertation research, we design and analyze resilient cyber-physical systems (CPSs) under high network dynamics, adversarial attacks, and various uncertainties. We focus on three key system attributes to build resilient CPSs by developing a suite of autonomous cyber defense mechanisms. First, we consider network adaptability to achieve the resilience of a CPS. Network adaptability represents the network's ability to maintain its security and connectivity level when faced with incoming attacks. We address this by network topology adaptation. Network topology adaptation can contribute to quickly identifying and updating the network topology to confuse attacks by changing attack paths. We leverage deep reinforcement learning (DRL) to develop CPSs using network topology adaptation. Second, we consider the fault-tolerance of a CPS as another attribute to ensure system resilience. We aim to build a resilient CPS under severe resource constraints, adversarial attacks, and various uncertainties. We chose a solar sensor-based smart farm as one example of the CPS applications and developed a resource-aware monitoring system for smart farms. We leverage DRL and uncertainty quantification using a belief theory, called Subjective Logic, to optimize critical tradeoffs between system performance and security under contested CPS environments. Lastly, we study system resilience in terms of system recoverability. System recoverability refers to the system's ability to recover from performance degradation or failure. In this task, we mainly focus on developing an automated intrusion response system (IRS) for CPSs. We aim to design the IRS with effective and efficient responses by reducing the false alarm rate and defense cost, respectively. 
Specifically, we build a lightweight IRS for an in-vehicle controller area network (CAN) bus system operating with DRL-based autonomous driving.
- ‘Beating the news’ with EMBERS: Forecasting Civil Unrest using Open Source Indicators
Ramakrishnan, Naren; Butler, Patrick; Self, Nathan; Khandpur, Rupinder P.; Saraf, Parang; Wang, Wei; Cadena, Jose; Vullikanti, Anil Kumar S.; Korkmaz, Gizem; Kuhlman, Christopher J.; Marathe, Achla; Zhao, Liang; Ting, Hua; Huang, Bert; Srinivasan, Aravind; Trinh, Khoa; Getoor, Lise; Katz, Graham; Doyle, Andy; Ackermann, Chris; Zavorin, Ilya; Ford, Jim; Summers, Kristen; Fayed, Youssef; Arredondo, Jaime; Gupta, Dipak; Mares, David; Muthia, Sathappan; Chen, Feng; Lu, Chang-Tien (2014)
We describe the design, implementation, and evaluation of EMBERS, an automated, 24x7 continuous system for forecasting civil unrest across 10 countries of Latin America using open source indicators such as tweets, news sources, blogs, economic indicators, and other data sources. Unlike retrospective studies, EMBERS has been making forecasts into the future since Nov 2012 which have been (and continue to be) evaluated by an independent T&E team (MITRE). Of note, EMBERS has successfully forecast the uptick and downtick of incidents during the June 2013 protests in Brazil. We outline the system architecture of EMBERS, individual models that leverage specific data sources, and a fusion and suppression engine that supports trading off specific evaluation criteria. EMBERS also provides an audit trail interface that enables the investigation of why specific predictions were made along with the data utilized for forecasting. Through numerous evaluations, we demonstrate the superiority of EMBERS over baserate methods and its capability to forecast significant societal happenings.
- Biosynthesis and Emission of Stress-Induced Volatile Terpenes in Roots and Leaves of Switchgrass (Panicum virgatum L.)
Muchlinski, Andrew; Chen, Xinlu; Lovell, John T.; Köllner, Tobias G.; Pelot, Kyle A.; Zerbe, Philipp; Ruggiero, Meredith; Callaway, LeMar, III; Laliberte, Suzanne; Chen, Feng; Tholl, Dorothea (2019-09-19)
Switchgrass (Panicum virgatum L.), a perennial C4 grass, represents an important species in natural and anthropogenic grasslands of North America. Its resilience to abiotic and biotic stress has made switchgrass a preferred bioenergy crop. However, little is known about the mechanisms of resistance of switchgrass against pathogens and herbivores. Volatile compounds such as terpenes have important activities in plant direct and indirect defense. Here, we show that switchgrass leaves emit blends of monoterpenes and sesquiterpenes upon feeding by the generalist insect herbivore Spodoptera frugiperda (fall armyworm) and in a systemic response to the treatment of roots with defense hormones. Belowground application of methyl jasmonate also induced the release of volatile terpenes from roots. To correlate the emission of terpenes with the expression and activity of their corresponding biosynthetic genes, we identified a gene family of 44 monoterpene and sesquiterpene synthases (mono- and sesqui-TPSs) of the type-a, type-b, type-g, and type-e subfamilies, of which 32 TPSs were found to be functionally active in vitro. The TPS genes are distributed over the K and N subgenomes with clusters occurring on several chromosomes. Synteny analysis revealed syntenic networks for approximately 30-40% of the switchgrass TPS genes in the genomes of Panicum hallii, Setaria italica, and Sorghum bicolor, suggesting shared TPS ancestry in the common progenitor of these grass lineages. 
Eighteen switchgrass TPS genes were substantially induced upon insect and hormone treatment and the enzymatic products of nine of these genes correlated with compounds of the induced volatile blends. In accordance with the emission of volatiles, TPS gene expression was induced systemically in response to belowground treatment, whereas this response was not observed upon aboveground feeding of S. frugiperda. Our results demonstrate complex above and belowground responses of induced volatile terpene metabolism in switchgrass and provide a framework for more detailed investigations of the function of terpenes in stress resistance in this monocot crop.
- Bridging the Gap between Spatial and Spectral Domains: A Unified Framework for Graph Neural Networks
Chen, Zhiqian; Chen, Fanglan; Zhang, Lei; Ji, Taoran; Fu, Kaiqun; Zhao, Liang; Chen, Feng; Wu, Lingfei; Aggarwal, Charu; Lu, Chang-Tien (ACM, 2023-10)
Deep learning's performance has been extensively recognized recently. Graph neural networks (GNNs) are designed to deal with graph-structural data that classical deep learning does not easily manage. Since most GNNs were created using distinct theories, direct comparisons are impossible. Prior research has primarily concentrated on categorizing existing models, with little attention paid to their intrinsic connections. The purpose of this study is to establish a unified framework that integrates GNNs based on spectral graph and approximation theory. The framework incorporates a strong integration between spatial- and spectral-based GNNs while tightly associating approaches that exist within each respective domain.
- DISCRN: A Distributed Storytelling Framework for Intelligence Analysis
Shukla, Manu; Dos Santos, Ray; Chen, Feng; Lu, Chang-Tien (Department of Computer Science, Virginia Polytechnic Institute & State University, 2015)
Storytelling connects entities (people, locations, organizations) using their observed relationships to establish meaningful stories among them. Extending that, spatio-temporal storytelling incorporates spatial and graph computations to enhance coherence and meaning. These computations become a bottleneck when performed sequentially, as the massive number of entities makes space and time complexity untenable. This paper presents DISCRN, a distributed framework for performing spatio-temporal storytelling. The framework extracts entities from microblogs and event data, and links those entities to derive stories in a distributed fashion. Performing these operations at scale allows deeper and broader analysis of storylines. This work extends an existing technique based on ConceptGraph and ConceptRank, applying them in a distributed key-value pair paradigm. The novel parallelization techniques speed up the generation and filtering of storylines on massive datasets. Experiments with Twitter data and GDELT events show the effectiveness of the techniques in DISCRN.
- Efficient Algorithms for Mining Large Spatio-Temporal Data
Chen, Feng (Virginia Tech, 2013-01-21)
Knowledge discovery on spatio-temporal datasets has attracted
growing interest. Recent advances in remote sensing technology mean
that massive amounts of spatio-temporal data are being collected,
and its volume keeps increasing at an ever faster pace. It becomes
critical to design efficient algorithms for identifying novel and
meaningful patterns from massive spatio-temporal datasets. Different
from the other data sources, this data exhibits significant
space-time statistical dependence, and the assumption of i.i.d. is
no longer valid. Exact modeling of space-time dependence causes
model complexity to grow exponentially as the data size
increases. This research focuses on the construction of efficient
and effective approaches using approximate inference techniques for
three main mining tasks, including spatial outlier detection, robust
spatio-temporal prediction, and novel applications to real world
problems.
Spatial novelty patterns, or spatial outliers, are those data points
whose characteristics are markedly different from their spatial
neighbors. There are two major branches of spatial outlier detection
methodologies, which can be either global Kriging based or local
Laplacian smoothing based. The former approach requires the exact
modeling of spatial dependence, which is time-intensive; and the
latter approach requires the i.i.d. assumption of the smoothed
observations, which is not statistically sound. These two approaches
are constrained to numerical data, but in real world applications we
are often faced with a variety of non-numerical data types, such as
count, binary, nominal, and ordinal. To summarize, the main research
challenges are: 1) how much spatial dependence can be eliminated via
Laplace smoothing; 2) how to effectively and efficiently detect
outliers for large numerical spatial datasets; 3) how to generalize
numerical detection methods and develop a unified outlier detection
framework suitable for large non-numerical datasets; 4) how to
achieve accurate spatial prediction even when the training data has
been contaminated by outliers; 5) how to deal with spatio-temporal
data for the preceding problems.
To address the first and second challenges, we mathematically
validated the effectiveness of Laplacian smoothing on the
elimination of spatial autocorrelations. This work provides
fundamental support for existing Laplacian smoothing based methods.
We also discovered a nontrivial side-effect of Laplacian smoothing,
which injects additional spatial variations into the data due to
convolution effects. To capture this extra variability, we proposed
a generalized local statistical model, and designed two fast forward
and backward outlier detection methods that achieve a better balance
between computational efficiency and accuracy than most existing
methods, and are well suited to large numerical spatial datasets.
We addressed the third challenge by mapping non-numerical variables
to latent numerical variables via a link function, such as the logit
function used in logistic regression, and then utilizing
error-buffer artificial variables, which follow a Student-t
distribution, to capture the large valuations caused by outliers. We
proposed a unified statistical framework, which integrates the
advantages of spatial generalized linear mixed model, robust spatial
linear model, reduced-rank dimension reduction, and Bayesian
hierarchical model. A linear-time approximate inference algorithm
was designed to infer the posterior distribution of the error-buffer
artificial variables conditioned on observations. We demonstrated
that traditional numerical outlier detection methods can be directly
applied to the estimated artificial variables for outlier
detection. To the best of our knowledge, this is the first
linear-time outlier detection algorithm that supports a variety of
spatial attribute types, such as binary, count, ordinal, and
nominal.
To address the fourth and fifth challenges, we proposed a robust
version of the Spatio-Temporal Random Effects (STRE) model, namely
the Robust STRE (R-STRE) model. The regular STRE model is a recently
proposed statistical model for large spatio-temporal data that has a
linear order time complexity, but is not best suited for
non-Gaussian and contaminated datasets. This deficiency can be
systemically addressed by increasing the robustness of the model
using heavy-tailed distributions, such as the Huber, Laplace, or
Student-t distribution to model the measurement error, instead of
the traditional Gaussian. However, the resulting R-STRE model
becomes analytically intractable, and direct application of
approximate inference techniques still has a cubic order time
complexity. To address the computational challenge, we reformulated
the prediction problem as a maximum a posteriori (MAP) problem with a
non-smooth objective function, transformed it to an equivalent
quadratic programming problem, and developed an efficient
interior-point numerical algorithm with a near linear order
complexity. This work presents the first near linear time robust
prediction approach for large spatio-temporal datasets in both
offline and online cases.
- GLS-SOD: A Generalized Local Statistical Approach for Spatial Outlier Detection
Chen, Feng; Lu, Chang-Tien; Boedihardjo, Arnold P. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2010-03-01)
Local based approach is a major category of methods for spatial outlier detection (SOD). Currently, there is a lack of systematic analysis on the statistical properties of this framework. For example, most methods assume identical and independent normal distributions (i.i.d. normal) for the calculated local differences, but no justifications for this critical assumption have been presented. The methods’ detection performance on geostatistic data with linear or nonlinear trend is also not well studied. In addition, there is a lack of theoretical connections and empirical comparisons between local and global based SOD approaches. This paper discusses all these fundamental issues under the proposed generalized local statistical (GLS) framework. Furthermore, robust estimation and outlier detection methods are designed for the new GLS model. Extensive simulations demonstrated that the SOD method based on the GLS model significantly outperformed all existing approaches when the spatial data exhibits a linear or nonlinear trend.
- Graph Neural Networks: Techniques and Applications
Chen, Zhiqian (Virginia Tech, 2020-08-25)
Effective information analysis generally boils down to the geometry of the data represented by a graph. Typical applications include social networks, transportation networks, the spread of epidemic disease, the brain's neuronal networks, gene data on biological regulatory networks, telecommunication networks, and knowledge graphs, all of which lie on the non-Euclidean graph domain. To describe the geometric structures, graph matrices such as the adjacency matrix or graph Laplacian can be employed to reveal latent patterns. This thesis focuses on the theoretical analysis of graph neural networks and the development of methods for specific applications using graph representation. Four methods are proposed, including rational neural networks for jump graph signal estimation, RemezNet for robust attribute prediction in the graph, ICNet for integrated circuit security, and CNF-Net for dynamic circuit deobfuscation. For the first method, an important recent state-of-the-art approach is graph convolutional networks (GCNs), which integrate local vertex features and graph topology in the spectral domain. However, current studies suffer from drawbacks: graph CNNs rely on Chebyshev polynomial approximation, which results in oscillatory approximation at jump discontinuities, since Chebyshev polynomials require degree $\Omega(\mathrm{poly}(1/\epsilon))$ to approximate a jump signal such as $|x|$. To reduce complexity, RationalNet is proposed to integrate rational functions and neural networks for graph node level embeddings. The second method targets function approximation, where existing approaches suffer from several drawbacks: non-robustness and infeasibility issues; neural networks are incapable of extracting analytical representations; and no study has been reported that integrates the strengths of neural networks and Remez. This work proposes a novel neural network model to address the above issues. 
Specifically, our method utilizes the characterizations of Remez to design objective functions. To avoid the infeasibility issue and deal with the non-robustness, a set of constraints are imposed inspired by the equioscillation theorem of best rational approximation. The third method proposes an approach for circuit security. Circuit obfuscation is a recently proposed defense mechanism to protect digital integrated circuits (ICs) from reverse engineering. Estimating the deobfuscation runtime is a challenging task due to the complexity and heterogeneity of graph-structured circuit, and the unknown and sophisticated mechanisms of the attackers for deobfuscation. To address the above-mentioned challenges, this work proposes the first graph-based approach that predicts the deobfuscation runtime based on graph neural networks. The fourth method proposes a representation for dynamic size of circuit graph. By analyzing SAT attack method, a conjunctive normal form (CNF) bipartite graph is utilized to characterize the complexity of this SAT problem. To overcome the difficulty in capturing the dynamic size of the CNF graph, an energy-based kernel is proposed to aggregate dynamic features.
- Interactive Web-Based Visual Analysis on Network Traffic Data
Jeong, Dong Hyun; Cho, Jin-Hee; Chen, Feng; Kaplan, Lance; Jøsang, Audun; Ji, Soo-Yeon (MDPI, 2022-12-28)
Network traffic data analysis is important for securing our computing environment and data. However, analyzing network traffic data requires tremendous effort because of the complexity of continuously changing network traffic patterns. To assist the user in better understanding and analyzing the network traffic data, an interactive web-based visualization system is designed using multiple coordinated views, supporting a rich set of user interactions. For advancing the capability of analyzing network traffic data, feature extraction is considered along with uncertainty quantification to help the user make precise analyses. The system allows the user to perform a continuous visual analysis by requesting incrementally new subsets of data with updated visual representation. Case studies have been performed to determine the effectiveness of the system. The results from the case studies support that the system is well designed to understand network traffic data by identifying abnormal network traffic patterns.
- On Locally Linear Classification by Pairwise Coupling
Chen, Feng; Lu, Chang-Tien; Boedihardjo, Arnold P. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2008)
Locally linear classification by pairwise coupling addresses a nonlinear classification problem in three basic phases: decompose the classes of complex concepts into linearly separable subclasses, learn a linear classifier for each pair, and combine pairwise classifiers into a single classifier. A number of methods have been proposed in this framework. However, these methods have several deficiencies: 1) lack of a systematic evaluation of the framework, 2) naive application of general clustering algorithms to generate subclasses, and 3) no valid method to estimate an optimal number of subclasses. This paper proves the equivalence between three popular combination schemas under general settings, defines several global criterion functions for measuring the goodness of subclasses, and presents a supervised greedy clustering algorithm to minimize the proposed criterion functions. Extensive experiments have also been conducted on a set of benchmark data to validate the effectiveness of the proposed techniques.
- Relational Outlier Detection: Techniques and Applications
Lu, Yen-Cheng (Virginia Tech, 2021-06-10)
Nowadays, outlier detection has attracted growing interest. Unlike typical outlier detection problems, relational outlier detection focuses on detecting abnormal patterns in datasets that contain relational implications within each data point. Furthermore, different from the traditional outlier detection that focuses on only numerical data, modern outlier detection models must be able to handle data in various types and structures. Detecting relational outliers should consider (1) Dependencies among different data types, (2) Data types that are not continuous or do not have ordinal characteristics, such as binary, categorical or multi-label, and (3) Special structures in the data. This thesis focuses on the development of relational outlier detection methods and real-world applications in datasets that contain non-numerical, mixed-type, and special structure data in three tasks, namely (1) outlier detection in mixed-type data, (2) categorical outlier detection in music genre data, and (3) outlier detection in categorized time series data. For the first task, existing solutions for mixed-type data mostly focus on computational efficiency, and their strategies are mostly heuristic driven, lacking a statistical foundation. The proposed contributions of our work include: (1) Constructing a novel unsupervised framework based on a robust generalized linear model (GLM), (2) Developing a model that is capable of capturing large variances of outliers and dependencies among mixed-type observations, and designing an approach for approximating the analytically intractable Bayesian inference, and (3) Conducting extensive experiments to validate effectiveness and efficiency. For the second task, we extended and applied the modeling strategy to a real-world problem. 
The existing solutions to the specific task are mostly supervised, and the traditional outlier detection methods only focus on detecting outliers by the data distributions, ignoring the input-output relation between the genres and the extracted features. The proposed contributions of our work for this task include: (1) Proposing an unsupervised outlier detection framework for music genre data, (2) Extending the GLM based model in the first task to handle categorical responses and developing an approach to approximate the analytically intractable Bayesian inference, and (3) Conducting experiments to demonstrate that the proposed method outperforms the benchmark methods. For the third task, we focused on improving the outlier detection performance in the second task by proposing a novel framework and expanded the research scope to general categorized time-series data. Existing studies have suggested a large number of methods for automatic time series classification. However, there is a lack of research focusing on detecting outliers from manually categorized time series. The proposed contributions of our work for this task include: (1) Proposing a novel semi-supervised robust outlier detection framework for categorized time-series datasets, (2) Further extending the new framework to an active learning system that takes user insights into account, and (3) Conducting a comprehensive set of experiments to demonstrate the performance of the proposed method in real-world applications.
- Spatio-Temporal Storytelling on Twitter
Dos Santos Jr, Raimundo F.; Shah, Sumit; Chen, Feng; Boedihardjo, Arnold P.; Butler, Patrick; Lu, Chang-Tien; Ramakrishnan, Naren (Department of Computer Science, Virginia Polytechnic Institute & State University, 2013-12-16)
Social media, e.g., Twitter, have provided us an unprecedented opportunity to observe events unfolding in real time. The rapid pace at which situations play out on social media necessitates new tools for capturing and summarizing the spatio-temporal progression of events. This technical report describes methods for generating dynamic real-world storylines from Twitter sources and shares the results of related experiments.
- Temporal Focus and Analyst Scrutiny: Evidence from Earnings Conference Calls
Zhou, Mi (Virginia Tech, 2017-03-17)
Using the setting of earnings conference calls, this paper investigates the temporal focus of management presentation during those calls, i.e., the extent to which managers allocate their discussions to future firm prospects relative to past firm performance. I find a negative association between firms' past performance and the future focus of management presentation. Moreover, the association is less negative for firms with more long-term investors and is more negative for firms with high litigation risk. Additionally, I find that the temporal focus of management presentation is positively associated with that of analyst questions. I also find that managers' future focus is positively associated with the number of analysts following the firm but negatively associated with forecast quality of analyst reports (lower accuracy and higher dispersion). Finally, I find that the future-oriented discussion in management presentation is positively associated with the time that analysts took to release the next quarter's forecasts.
- Uncertainty-Aware Reward-based Deep Reinforcement Learning for Intent Analysis of Social Media Information
Guo, Zhen; Zhang, Qi; An, Xinwei; Zhang, Qisheng; Josang, Audun; Kaplan, Lance M.; Chen, Feng; Jeong, Dong H.; Cho, Jin-Hee (2023-02-13)
Due to the various and serious adverse impacts of spreading fake news, it is often assumed that only people with malicious intent would propagate fake news. However, this is not necessarily true according to social science studies. Distinguishing the types of fake news spreaders based on their intent is critical because it will effectively guide how to intervene to mitigate the spread of fake news with different approaches. To this end, we propose an intent classification framework that can best identify the correct intent of fake news. We leverage deep reinforcement learning (DRL) that can optimize the structural representation of each tweet by removing noisy words from the input sequence when appending an actor to the long short-term memory (LSTM) intent classifier. A policy gradient DRL model (e.g., REINFORCE) can lead the actor to a higher delayed reward. We also devise a new uncertainty-aware immediate reward using a subjective opinion that can explicitly deal with multidimensional uncertainty for effective decision-making. Via 600K training episodes from a fake news tweets dataset with an annotated intent class, we evaluate the performance of uncertainty-aware reward in DRL. Evaluation results demonstrate that our proposed framework efficiently reduces the number of selected words to maintain a high 95% multi-class accuracy.
- Unsupervised Spatial Event Detection in Targeted Domains with Applications to Civil Unrest Modeling
Zhao, Liang; Chen, Feng; Dai, Jing; Hua, Ting; Lu, Chang-Tien; Ramakrishnan, Naren (PLOS, 2014-10-28)
Twitter has become a popular data source as a surrogate for monitoring and detecting events. Targeted domains such as crime, election, and social unrest require the creation of algorithms capable of detecting events pertinent to these domains. Due to the unstructured language, short-length messages, dynamics, and heterogeneity typical of Twitter data streams, it is technically difficult and labor-intensive to develop and maintain supervised learning systems. We present a novel unsupervised approach for detecting spatial events in targeted domains and illustrate this approach using one specific domain, viz. civil unrest modeling. Given a targeted domain, we propose a dynamic query expansion algorithm to iteratively expand domain-related terms, and generate a tweet homogeneous graph. An anomaly identification method is utilized to detect spatial events over this graph by jointly maximizing local modularity and spatial scan statistics. Extensive experiments conducted in 10 Latin American countries demonstrate the effectiveness of the proposed approach.