Browsing by Author "Tabassum, Anika"
- Collection Management Tweets Project Fall 2017
Khaghani, Farnaz; Zeng, Junkai; Bhuiyan, Momen; Tabassum, Anika; Bandyopadhyay, Payel (Virginia Tech, 2018-01-17)
The report included in this submission documents the work of the Collection Management Tweets (CMT) team, part of the larger CS5604 effort to build a state-of-the-art information retrieval and analysis system for the IDEAL (Integrated Digital Event Archiving and Library) and GETAR (Global Event and Trend Archive Research) projects. The CMT team's mission had two parts: 1) cleaning 6.2 million tweets from two 2017 event collections, "Solar Eclipse" and "Las Vegas Shooting", and loading them into HBase, an open-source, non-relational, distributed database that runs on the Hadoop Distributed File System, for further use; and 2) building and storing a social network for the tweet data using a triple-store. For the first part, our work included: A) building on the previous year's class group's incremental-update work to speed up development of the data collection and storage process; B) improving the performance of last year's pipeline. Previously, the cleaning step (e.g., removing profane words and extracting hashtags and mentions) was written in Python, which became very slow as the dataset scaled up. We parallelized tweet cleaning with Scala on the Hadoop cluster and used several Natural Language Processing libraries for stop-word and profanity removal; C) in addition to cleaning the tweets, we identified and stored Named Entity Recognition (NER) entries and part-of-speech (POS) tags with the tweets, which the previous team had not done. The cleaned data in HBase is provided to the Classification team for spam detection and to the Clustering and Topic Analysis team for topic analysis.
The Collection Management Webpage team uses the URLs extracted from the tweets for further processing. Finally, after the SOLR team indexes the data, the Front-End team visualizes the tweets for users and provides access for searching and browsing. In addition to these tasks, we were responsible for building a network of tweets. This required researching which types of database are appropriate for such a graph. To store the network, we used a triple-store database that records the different types of edges and relationships in the graph. We also researched methods for ascribing importance to nodes and edges in the constructed social networks and analyzed our networks using these techniques.
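The cleaning and network-building steps described above can be illustrated with a small, single-machine Python sketch. It is not the team's Scala/Hadoop implementation: the stop-word and profanity lists are toy placeholders (the project used NLP libraries for this), and the helper names (`clean_tweet`, `tweet_triples`, `in_degree`), the predicate names (`postedBy`, `mentions`, `hasHashtag`) and the sample tweets are all hypothetical. The sketch extracts URLs, hashtags and mentions, strips stop words and flagged words, emits subject-predicate-object triples of the kind a triple-store holds, and uses in-degree as one simple node-importance score.

```python
import re
from collections import Counter

# Toy word lists for illustration only (hypothetical; not the project's lists).
STOP_WORDS = {"the", "a", "an", "is", "at", "in", "of", "to", "and"}
PROFANITY = {"darn"}

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def clean_tweet(text):
    """Return (clean_tokens, hashtags, mentions, urls) for one tweet."""
    urls = URL_RE.findall(text)
    # Remove URLs first so a '#fragment' inside a link is not taken as a hashtag.
    no_urls = URL_RE.sub(" ", text)
    hashtags = [h.lower() for h in HASHTAG_RE.findall(no_urls)]
    mentions = [m.lower() for m in MENTION_RE.findall(no_urls)]
    # Drop hashtags and mentions, then tokenize and filter the remaining words.
    stripped = MENTION_RE.sub(" ", HASHTAG_RE.sub(" ", no_urls))
    tokens = re.findall(r"[a-z']+", stripped.lower())
    clean = [t for t in tokens if t not in STOP_WORDS and t not in PROFANITY]
    return clean, hashtags, mentions, urls


def tweet_triples(tweet_id, author, text):
    """Emit (subject, predicate, object) triples using a hypothetical schema."""
    _, hashtags, mentions, _ = clean_tweet(text)
    triples = [(tweet_id, "postedBy", "user:" + author.lower())]
    triples += [(tweet_id, "mentions", "user:" + m) for m in mentions]
    triples += [(tweet_id, "hasHashtag", "tag:" + h) for h in hashtags]
    return triples


def in_degree(triples):
    """Count incoming edges (of any predicate) per object node."""
    return Counter(obj for _, _, obj in triples)


# Hypothetical sample tweets: (tweet_id, author, text).
TWEETS = [
    ("t1", "alice", "Watching the #Eclipse with @bob https://example.com"),
    ("t2", "bob", "Darn clouds at the #eclipse @alice"),
    ("t3", "carol", "Totality! #Eclipse2017 @bob"),
]

if __name__ == "__main__":
    all_triples = [tr for tid, author, text in TWEETS
                   for tr in tweet_triples(tid, author, text)]
    for triple in all_triples:
        print(triple)
    print(in_degree(all_triples).most_common(1))  # [('user:bob', 3)]
```

In-degree is only the simplest importance measure; centrality scores such as PageRank could be computed over the same triples once they are loaded as a graph.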
- Explainable and Network-based Approaches for Decision-making in Emergency Management
Tabassum, Anika (Virginia Tech, 2021-10-19)
Critical Infrastructures (CIs), such as power, transportation, and healthcare, are the systems, facilities, technologies, and networks vital to national security, public health, and the socio-economic well-being of people. CIs play a crucial role in emergency management. For example, Hurricane Ida, the Texas winter storm, and the Colonial Pipeline cyber-attack, all of which occurred in the US in 2021, show that CIs are highly interdependent, with complex interactions: power system failures and shutdowns of natural gas pipelines in turn led to debilitating impacts on communication, waste systems, public health, etc. Consider power failures during a disaster such as a hurricane. Subject Matter Experts (SMEs), such as emergency management authorities, may be interested in several decision-making tasks. Can we identify disaster phases, in terms of damage severity, by analyzing changes in power failures? Can we tell the SMEs which power grids or regions are most affected during each disaster phase and need immediate action to recover? Answering these questions can help SMEs respond quickly and send resources for fast recovery. Can we systematically show how the failure of different power grids may impact the whole system of CIs due to interdependencies? This can help SMEs better prepare and mitigate risks by improving system resiliency. In this thesis, we explore problems in efficiently performing decision-making tasks for emergency management authorities during a disaster. Our research has two primary directions: guiding decision-making in resource allocation, and planning to improve system resiliency. Our work is done in collaboration with Oak Ridge National Laboratory to contribute impactful research on real-life CIs and disaster power-failure data.
1. Explainable resource allocation: In contrast to current interpretable or explainable models, which provide answers that help users understand a model's output, we view explanations as answers that guide resource-allocation decision-making. In this thesis, we focus on developing a novel model and algorithm to identify disaster phases from changes in power failures and to pinpoint the regions likely to be most affected in each disaster phase, so that SMEs can send resources for fast recovery.
2. Networks for improving system resiliency: We view CIs as a large heterogeneous network with infrastructure components as nodes and dependencies as edges. Our goal is to construct a visual analytic tool and develop a domain-inspired model to identify the important components and connections that SMEs need to focus on, so they can better prepare for and mitigate the risk of a disaster.
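To make the idea of interdependent failures concrete, the Python sketch below scores each component of a small CI network by how many other components would lose service if it failed. It assumes, purely for illustration, a worst-case rule in which a component fails as soon as any component it depends on fails, so failures propagate along every dependency edge (found with a breadth-first search). This naive cascade count is a stand-in, not the domain-inspired model or visual analytic tool described in the thesis, and the `DEPENDS_ON` graph and the component names are hypothetical.

```python
from collections import deque

# Hypothetical heterogeneous CI network: each component maps to the
# components it depends on (power, gas, communication, water, health).
DEPENDS_ON = {
    "substation_A": [],
    "substation_B": ["gas_power_plant"],
    "gas_pipeline": ["substation_A"],
    "gas_power_plant": ["gas_pipeline"],
    "cell_tower": ["substation_A"],
    "water_pump": ["substation_B"],
    "hospital": ["substation_B", "cell_tower"],
}


def build_dependents(depends_on):
    """Invert the dependency map: component -> components that rely on it."""
    dependents = {node: [] for node in depends_on}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)
    return dependents


def cascade(start, dependents):
    """Return the components that fail after `start` fails (BFS, worst case)."""
    failed = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in failed:  # visited check also handles cycles
                failed.add(child)
                queue.append(child)
    failed.discard(start)
    return failed


def rank_components(depends_on):
    """Rank components by cascade size (ties broken alphabetically)."""
    dependents = build_dependents(depends_on)
    scores = {node: len(cascade(node, dependents)) for node in dependents}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(rank_components(DEPENDS_ON)[:3])
    # [('substation_A', 6), ('gas_pipeline', 4), ('gas_power_plant', 3)]
```

The worst-case "any dependency fails" rule overstates real cascades; a more realistic model would weight edges or require several supporting components to fail, which is where domain knowledge from SMEs would enter.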