Browsing by Author "Prakash, B. Aditya"
Now showing 1 - 20 of 40
- Addressing Challenges of Modern News Agencies via Predictive Modeling, Deep Learning, and Transfer Learning
Keneshloo, Yaser (Virginia Tech, 2019-07-22)
Today's news agencies are moving from traditional journalism, where publishing just a few news articles per day was sufficient, to modern content generation mechanisms that produce thousands of news pieces every day. With the growth of these modern news agencies comes the arduous task of properly handling the massive amount of data generated for each news article. News agencies are therefore constantly seeking solutions that facilitate and automate some of the tasks previously done by humans. In this dissertation, we focus on two broad problems whose solutions help a news agency not only gain a wider view of reader behavior around an article but also provide automated tools that ease the editors' job of summarizing news articles. Together, these two problems aim to improve the reading experience by helping content generators monitor and address poorly performing content while promoting well-performing content. We first focus on predicting the popularity of news articles via a combination of regression, classification, and clustering models. We then focus on generating automated summaries of long news articles using deep learning models. The first problem helps content developers understand how a news article performs over the long run, while the second provides automated tools for generating a summary of each news article.
- Algorithms for regulatory network inference and experiment planning in systems biology
Pratapa, Aditya (Virginia Tech, 2020-07-17)
I present novel solutions to two different classes of computational problems that arise in the study of complex cellular processes. The first problem arises in the context of planning large-scale genetic cross experiments that can be used to validate predictions of multigenic perturbations made by mathematical models. (i) I present CrossPlan, a novel methodology for systematically planning genetic crosses to make a set of target mutants from a set of source mutants. CrossPlan is based on a generic experimental workflow used in performing genetic crosses in budding yeast. CrossPlan uses an integer linear program (ILP) to maximize the number of target mutants that we can make under certain experimental constraints. I apply it to a comprehensive mathematical model of the protein regulatory network controlling cell division in budding yeast. (ii) I formulate several natural problems related to efficient synthesis of a target mutant from source mutants. These formulations capture experimentally useful notions of verifiability (e.g., the need to confirm that a mutant contains mutations in the desired genes) and permissibility (e.g., the requirement that no intermediate mutant in the synthesis be inviable). I present several polynomial-time or fixed-parameter tractable algorithms for optimal synthesis of a target mutant for special cases of the problem that arise in practice. The second problem I address is inferring gene regulatory networks (GRNs) from single-cell transcriptomic (scRNA-seq) data. These GRNs can serve as starting points for building mathematical models. (iii) I present BEELINE, a comprehensive evaluation of state-of-the-art algorithms for inferring GRNs from single-cell gene expression data.
The evaluations from BEELINE suggest that the area under the precision-recall curve and the early precision of these algorithms are moderate. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, I present recommendations to end users of GRN inference methods. BEELINE will aid the development of gene regulatory network inference algorithms. (iv) Based on the insights gained from BEELINE, I propose a novel graph convolutional neural network (GCN) based supervised algorithm for GRN inference from single-cell gene expression data. This GCN-based model achieves considerably better accuracy than existing supervised learning algorithms for GRN inference from scRNA-seq data and can infer cell-type-specific regulatory networks.
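One of the evaluation measures named above, early precision, is easy to illustrate. The sketch below computes it as the precision among the top-k predicted edges, with k equal to the number of ground-truth edges; this is a common convention, and BEELINE's exact formulation may differ in detail, so treat the function and the toy edge lists as illustrative assumptions.

```python
def early_precision(ranked_edges, true_edges):
    """Precision among the top-k predicted edges, where k is the
    number of edges in the ground-truth network (a common definition;
    the exact formulation used in BEELINE may differ in detail)."""
    k = len(true_edges)
    top_k = ranked_edges[:k]
    hits = sum(1 for e in top_k if e in true_edges)
    return hits / k if k else 0.0

# Toy example: 4 true regulatory edges and a ranked prediction list.
truth = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF3", "G4")}
ranking = [("TF1", "G1"), ("TF2", "G3"), ("TF1", "G9"),
           ("TF3", "G4"), ("TF1", "G2")]
print(early_precision(ranking, truth))  # 3 of the top 4 are true -> 0.75
```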
- Analysis of Moving Events Using Tweets
Patil, Supritha Basavaraj (Virginia Tech, 2019-07-02)
The Digital Library Research Laboratory (DLRL) has collected over 3.5 billion tweets on different events for the Coordinated, Behaviorally-Aware Recovery for Transportation and Power Disruptions (CBAR-tpd), Integrated Digital Event Archiving and Library (IDEAL), and Global Event Trend Archive Research (GETAR) projects. The tweet collection topics include heart attacks, solar eclipses, terrorism, etc. There are several collections on naturally occurring events such as hurricanes, floods, and solar eclipses; such events are distributed across space and time. It would benefit researchers if we could perform a spatial-temporal analysis to test hypotheses and to find any trends that tweets reveal about such events. I apply an existing algorithm to detect locations in tweets, modifying it to work better with the types of datasets I use. For the temporal analysis, I use the time captured in tweets and also identify the tense of the sentences in tweets, building a rule-based model to obtain the tense of a tweet. The results from these two algorithms are merged to analyze naturally occurring moving events such as solar eclipses and hurricanes. Using the spatial-temporal information from tweets, I study whether tweets can be a relevant source of information for understanding the movement of an event. I create visualizations to compare the actual path of the event with the information extracted by my algorithms. After examining the results of the analysis, I note that Twitter can be a reliable source for identifying places affected by moving events almost immediately, and the locations obtained are at a more detailed level than in newswires. We can also identify, by date, the time at which an event affected a particular region.
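The rule-based tense model described in the abstract above can be sketched as ordered pattern matching over surface cues. The rules below (a couple of regular expressions and a priority order) are hypothetical stand-ins, not the thesis's actual rule set:

```python
import re

# Hypothetical rule set: the thesis's actual rules are more elaborate,
# but the idea is ordered pattern matching over surface cues.
FUTURE = re.compile(r"\b(will|gonna|going to|about to)\b", re.I)
PAST = re.compile(r"\b(was|were|had|did)\b|\b\w+ed\b", re.I)

def tense_of(tweet):
    """Classify a tweet as 'future', 'past', or 'present' by the
    first matching rule (future cues take priority over past)."""
    if FUTURE.search(tweet):
        return "future"
    if PAST.search(tweet):
        return "past"
    return "present"

print(tense_of("The eclipse will pass over Oregon at 10am"))  # future
print(tense_of("The hurricane flooded the whole street"))     # past
print(tense_of("Power is out across downtown"))               # present
```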
- Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics
Mahendiran, Aravindan (Virginia Tech, 2014-02-12)
Twitter has become a popular data source over the past decade and has garnered a significant amount of attention as a surrogate data source for many important forecasting problems. Strong correlations have been observed between Twitter indicators and real-world trends spanning elections, stock markets, book sales, and flu outbreaks. A key ingredient in all methods that use Twitter for forecasting is agreeing on a domain-specific vocabulary to track the pertinent tweets, which is typically provided by subject matter experts (SMEs). The language used on Twitter drastically differs from other forms of online discourse, such as news articles and blogs, and it constantly evolves over time as users adopt popular hashtags to express their opinions. Thus, the vocabulary used by forecasting algorithms needs to be dynamic in nature and should capture emerging trends over time. This thesis proposes a novel unsupervised learning algorithm that builds a dynamic vocabulary using Probabilistic Soft Logic (PSL), a framework for probabilistic reasoning over relational domains. Using eight presidential elections from Latin America, we show how our query expansion methodology improves the performance of traditional election forecasting algorithms. Through this approach we demonstrate close to a two-fold increase in the number of tweets retrieved for predictions and a 36.90% reduction in prediction error.
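To make the query-expansion idea above concrete, the sketch below grows a seed vocabulary by promoting hashtags that frequently co-occur with seed terms. The thesis does this with PSL-based probabilistic reasoning; this plain co-occurrence count is only a simplified stand-in, and the tweets and seed terms are invented for illustration:

```python
from collections import Counter

def expand_vocabulary(tweets, seeds, top_n=2):
    """Toy query expansion: rank hashtags that co-occur with any
    seed term and add the most frequent ones to the vocabulary.
    (The thesis does this with Probabilistic Soft Logic; this
    co-occurrence count is only a simplified stand-in.)"""
    cooc = Counter()
    for text in tweets:
        tokens = text.lower().split()
        if any(s in tokens for s in seeds):
            cooc.update(t for t in tokens
                        if t.startswith("#") and t not in seeds)
    return set(seeds) | {tag for tag, _ in cooc.most_common(top_n)}

tweets = [
    "vote #eleccion2012 for capriles",
    "capriles rally today #eleccion2012 #venezuela",
    "chavez speech #venezuela",
    "great weather #beach",
]
print(expand_vocabulary(tweets, {"capriles"}))
```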
- Coping isn't for the Faint of Heart: An Investigation into the Development of Coping Strategies for Incoming Police Recruits
Clifton, Stacey Anne Moore (Virginia Tech, 2020-06-18)
Policing in America has lost more officers to suicide than to line-of-duty deaths over the past four years. As officers are the gatekeepers to the criminal justice system, their well-being is critical: officers who use poor coping strategies to handle their stress can create a multitude of negative consequences for the communities they serve, their departments, their fellow officers, and themselves. While the technology of policing is quickly advancing, the routine duties of officers remain stressful. This stress requires officers to use effective coping strategies, but the traditional subculture of policing promotes maladaptive, rather than adaptive, coping strategies. To understand how the subculture influences police and the coping strategies they use, researchers must understand the socialization process of recruits entering the job. The current research seeks to understand how police recruits are socialized into the police subculture and how this affects the coping strategies they use to deal with the stressors they will confront on the job. The research analyzes how the network position of recruits influences their adoption of the police subculture and how this, in turn, affects their development of coping strategies. Recruits were surveyed three times during their academy training to examine the transition and socialization that occur throughout the police academy. Results reveal that networks affect recruits' adoption of the police subculture and that this socialization process impacts their development of coping strategies. Findings highlight the need for future work to continue the longitudinal research approach by examining how the networks change once recruits complete their field training and probationary period.
- Credential Theft Powered Unauthorized Login Detection through Spatial Augmentation
Burch, Zachary Campbell (Virginia Tech, 2018-10-29)
Credential theft is a network intrusion vector that subverts traditional defenses of a campus network, with a malicious login being the act of an attacker using those stolen credentials to access the target network. Historically, this approach is simple for an attacker to conduct and hard for a defender to detect. Alternative mitigation strategies require an in-depth view of the network hosts, an untenable proposition in a campus network. We introduce a method of spatially augmenting login events, creating a user and source-IP trajectory for each event. These location mappings, built using user wireless activity and network state information, provide the features needed for login classification. From this, we design and build a real-time data collection, augmentation, and classification system for generating alerts on malicious events. With a relational database for data processing and a trained weighted random forests ensemble classifier, generated alerts are both timely and few enough to allow human analyst review of all generated events. We evaluate this design for three levels of attacker ability under a defined threat model, using a proof-of-concept system on weeks of live data collected from the Virginia Tech campus under an IRB-approved research protocol.
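The spatial augmentation step described above can be sketched as a join between a login event and two location mappings. The field names and location tables below are illustrative assumptions, not the system's actual schema:

```python
def augment_login(login, wifi_sessions, ip_locations):
    """Attach spatial features to a login event: the user's last
    known campus location (from wireless activity) and the source
    IP's mapped location, plus whether the two agree.
    Field names here are illustrative, not the system's schema."""
    user_loc = wifi_sessions.get(login["user"], "unknown")
    ip_loc = ip_locations.get(login["src_ip"], "unknown")
    return {**login,
            "user_location": user_loc,
            "ip_location": ip_loc,
            "location_match": user_loc == ip_loc != "unknown"}

wifi = {"alice": "Torgersen Hall"}
ips = {"198.51.100.7": "Torgersen Hall", "203.0.113.9": "off-campus"}

ok = augment_login({"user": "alice", "src_ip": "198.51.100.7"}, wifi, ips)
bad = augment_login({"user": "alice", "src_ip": "203.0.113.9"}, wifi, ips)
print(ok["location_match"], bad["location_match"])  # True False
```

Features like `location_match` would then feed the trained classifier alongside the other event attributes.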
- Deep Learning for Enhancing Precision Medicine
Oh, Min (Virginia Tech, 2021-06-07)
Most medical treatments have been developed to achieve the best average efficacy across large populations, resulting in treatments that are successful for some patients but not for others. This necessitates precision medicine, which tailors medical treatment to individual patients. Omics data holds comprehensive genetic information on individual variability at the molecular level and hence has the potential to be translated into personalized therapy. However, attempts to transform omics data-driven insights into clinically actionable models for individual patients have been limited. Meanwhile, advances in deep learning, one of the most promising branches of artificial intelligence, have produced unprecedented performance in various fields. Although several deep learning-based methods have been proposed to predict individual phenotypes, they have not become established practice, due to the instability of selected or learned features derived from extremely high-dimensional data with low sample sizes, which often results in overfitted models with high variance. To overcome the limitations of omics data, recent advances in deep learning models, including representation learning models, generative models, and interpretable models, can be considered. The goal of the proposed work is to develop deep learning models that overcome the limitations of omics data to enhance the prediction of personalized medical decisions. To achieve this, three key challenges should be addressed: 1) effectively reducing the dimensionality of omics data, 2) systematically augmenting omics data, and 3) improving the interpretability of omics data.
- Detecting and Mitigating Rumors in Social Media
Islam, Mohammad Raihanul (Virginia Tech, 2020-06-19)
The penetration of social media today enables the rapid spread of breaking news and other developments to millions of people across the globe within hours. However, such pervasive use of social media by the general masses to receive and consume news is not without attendant negative consequences, as it also opens opportunities for nefarious elements to spread rumors or misinformation. A rumor generally refers to an interesting piece of information that is widely disseminated through a social network and whose credibility cannot be easily substantiated; a rumor can later turn out to be true or false, or remain unverified. The spread of misinformation and fake news can have deleterious effects on users and society. The objective of the proposed research is to develop a range of machine learning methods that effectively detect and characterize rumor veracity in social media. Since users are the primary protagonists on social media, analyzing the characteristics of information spread with respect to users can be effective for our purpose. For our first problem, we propose a method for computing user embeddings from underlying social networks. For our second problem, we propose a long short-term memory (LSTM) based model that classifies whether a story discussed in a thread is a false, true, or unverified rumor; we demonstrate the utility of the user features computed in the first problem for this task. For our third problem, we propose a method that uses user profile information to detect rumor veracity; this method has the advantage of not requiring the underlying social network, which can be tedious to compute. For the last problem, we investigate a rumor mitigation technique that recommends fact-checking URLs to rumor debunkers, i.e., social network users who are passionate about disseminating true news.
Here, we incorporate the influence of other users on rumor debunkers in addition to their previous URL sharing history to recommend relevant fact-checking URLs.
- Distinguishing Dynamical Kinds: An Approach for Automating Scientific Discovery
Shea-Blymyer, Colin (Virginia Tech, 2019-07-02)
The automation of scientific discovery has been an active research topic for many years. The promise of a formalized approach to developing and testing scientific hypotheses has attracted researchers from the sciences, machine learning, and philosophy alike. Leveraging the concept of dynamical symmetries, a new paradigm is proposed for the collection of scientific knowledge, and algorithms are presented for the development of EUGENE, an automated scientific discovery tool-set. These algorithms have direct applications in model validation, time series analysis, and system identification. Further, the EUGENE tool-set provides a novel metric of dynamical similarity that allows a system to be clustered into its dynamical regimes. This dynamical distance is sensitive to the presence of chaos, effective order, and nonlinearity. I discuss the history and background of these algorithms, provide examples of their behavior, and present their use for exploring system dynamics.
- Domain-based Frameworks and Embeddings for Dynamics over Networks
Adhikari, Bijaya (Virginia Tech, 2020-06-01)
Broadly, this thesis looks into network and time-series mining problems pertaining to dynamics over networks in various domains. Which locations and staff should we monitor in order to detect C. difficile outbreaks in hospitals? How do we predict the peak intensity of influenza incidence in an interpretable fashion? How do we infer the states of all nodes in a critical infrastructure network where failures have occurred? Leveraging domain-based information should make it possible to answer these questions. However, several new challenges arise. (a) Presence of more complex dynamics: the dynamics over networks that we consider are complex. For example, C. difficile spreads via both person-to-person and surface-to-person interactions, and correlations between failures in critical infrastructures go beyond the network structure and depend on geography as well. Traditional approaches either rely on models like Susceptible-Infectious (SI) and Independent Cascade (IC), which are too restrictive because they focus only on single pathways, or do not incorporate a model at all, resulting in sub-optimality. (b) Data sparsity: it is difficult to collect the exact state of each node in the network, as it is high-dimensional and difficult to directly sample from. (c) Mismatch between data and process: in many situations, the underlying dynamical process is unknown or depends on a mixture of several models, so there is a mismatch between the data collected and the model representing the dynamics.
For example, the weighted influenza-like illness (wILI) count released by the CDC, which is meant to represent the raw fraction of the total population infected by influenza, actually depends on multiple factors, such as the number of health-care providers reporting and the public's tendency to seek medical advice. In such cases, methods that generalize well to unobserved (or unknown) models are required. Current approaches often fail to tackle these challenges, as they rely on restrictive models, require large volumes of data, and/or work only for predefined models. In this thesis, we propose to leverage domain-based frameworks, which include novel models and analysis techniques, and domain-based low-dimensional representation learning to tackle the challenges mentioned above for network and time-series mining tasks. By developing novel frameworks, we can capture the complex dynamics accurately and analyze them more efficiently. For example, to detect C. difficile outbreaks in a hospital setting, we use a two-mode disease model to capture multiple outbreak pathways and a discrete lattice-based optimization framework. Similarly, we propose an information-theoretic framework, which accounts for geographically correlated failures in critical infrastructure networks, to infer the status of network components. Moreover, since we use more realistic frameworks to accurately capture and analyze the mechanistic processes themselves, our approaches are effective even with sparse data. At the same time, learning low-dimensional domain-aware embeddings captures domain-specific properties (like incidence-based similarity between historical influenza seasons) more efficiently from sparse data, which is useful for subsequent tasks. And since the domain-aware embeddings capture the model information directly from the data without any modeling assumptions, they generalize better to new models.
Our domain-aware frameworks and embeddings enable many applications in critical domains. For example, our domain-aware framework for C. difficile allows different monitoring rates for people and locations, detecting more than 95% of outbreaks. Our framework for product recommendation in e-commerce, for queries with sparse engagement data, resulted in a 34% improvement over the current Walmart.com search engine. Our novel framework also leads to near-optimal algorithms, with an additive approximation guarantee, for inferring network states given a partial observation of failures. Additionally, by exploiting domain-aware embeddings, we outperform non-trivial competitors by up to 40% on influenza forecasting, and domain-aware representations of subgraphs helped us outperform non-trivial baselines by up to 68% on the graph classification task. We believe our techniques will be useful for a variety of other applications in areas like social networks, urban computing, and so on.
- Dynamical Processes on Large Networks (CS Seminar Lecture Series)
Prakash, B. Aditya (2012-03-23)
How do contagions spread in population networks? Which group should we market to, for maximizing product penetration? Will a given YouTube video go viral? Who are the best people to vaccinate? What happens when two products compete? Any insights into these problems, involving dynamical processes on networks, promise great scientific as well as commercial value. In this talk, we present a multi-pronged attack on such research questions, which includes: (a) theoretical results on the tipping-point behavior of fundamental models; (b) scalable algorithms for changing the behavior of these processes, e.g., for immunization and marketing; and (c) empirical studies on terabytes of data for developing more realistic information-diffusion models. The problems we focus on are central in surprisingly diverse areas: from cyber-security, epidemiology and public health, and viral marketing to the spreading of hashtags on Twitter and the propagation of memes on blogs. B. Aditya Prakash (http://www.cs.cmu.edu/~badityap) is a Ph.D. student in the Computer Science Department at Carnegie Mellon University. He received his B.Tech (in CS) from the Indian Institute of Technology (IIT) Bombay. He has published 14 refereed papers in major venues and holds two U.S. patents. His interests include data mining, applied machine learning, and databases, with emphasis on large real-world networks and time series. Some of the interdisciplinary questions he investigates deal with identifying the precise role of networks in the diffusion of contagion (like viruses, products, and ideas). The mission of his research is to enable us to understand and eventually influence such processes for our benefit. The Computer Science Seminar Lecture Series is a collection of weekly lectures about topics at the forefront of contemporary computer science research, given by speakers knowledgeable in their fields of study.
These speakers come from a variety of different technical and geographic backgrounds, with many of them traveling from other universities across the globe to come here and share their knowledge. These weekly lectures were recorded with an HD video camera, edited with Apple Final Cut Pro X, and outputted in such a way that the resulting .mp4 video files were economical to store and stream utilizing the university's limited bandwidth and disk space resources.
- Efficient Spatio-Temporal Network Analytics in Epidemiological Studies using Distributed Databases
Khan, Mohammed Saquib Akmal (Virginia Tech, 2015-01-26)
Real-time spatio-temporal analytics has become an integral part of epidemiological studies. The size of spatio-temporal data has been increasing tremendously over the years, gradually evolving into Big Data. The processing in such domains is highly data- and compute-intensive, and high-performance computing resources are actively being used to handle such workloads over massive datasets. This confluence of high-performance computing and datasets with Big Data characteristics poses great challenges for data handling and processing. The resource management of supercomputers is in conflict with the data-intensive nature of spatio-temporal analytics, which is further exacerbated by the fact that data management is decoupled from the computing resources. Problems of this nature have provided great opportunities for the growth and development of tools and concepts centered around MapReduce-based solutions. However, we believe that advanced relational concepts can still be employed to provide an effective solution to these issues and challenges. In this study, we explore distributed databases to efficiently handle spatio-temporal Big Data for epidemiological studies. We propose DiceX (Data Intensive Computational Epidemiology using supercomputers), which couples high-performance, Big Data, and relational computing by embedding distributed data storage and processing engines within the supercomputer. It is characterized by scalable strategies for data ingestion, a unified framework to set up and configure various processing engines, and the ability to pause, materialize, and restore images of a data session. In addition, we have successfully configured DiceX to support approximation algorithms from the MADlib Analytics Library [54], primarily the Count-Min Sketch, or CM Sketch [33][34][35].
DiceX enables a new style of Big Data processing, which is centered around the use of clustered databases and exploits supercomputing resources. It can effectively exploit the cores, memory and compute nodes of supercomputers to scale processing of spatio-temporal queries on datasets of large volume. Thus, it provides a scalable and efficient tool for data management and processing of spatio-temporal data. Although DiceX has been designed for computational epidemiology, it can be easily extended to different data-intensive domains facing similar issues and challenges. We thank our external collaborators and members of the Network Dynamics and Simulation Science Laboratory (NDSSL) for their suggestions and comments. This work has been partially supported by DTRA CNIMS Contract HDTRA1-11-D-0016-0001, DTRA Validation Grant HDTRA1-11-1-0016, NSF - Network Science and Engineering Grant CNS-1011769, NIH and NIGMS - Models of Infectious Disease Agent Study Grant 5U01GM070694-11. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government.
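The Count-Min Sketch mentioned above is a standard streaming data structure, and a minimal version is short enough to sketch. The class below is a textbook implementation, not the MADlib one used by DiceX, and the width, depth, and toy counts are invented for illustration:

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch: approximate frequency counts in
    sub-linear space. Width/depth here are small for illustration;
    real deployments size them from target error and confidence."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item, row):
        # One independent hash per row, derived by salting with the row.
        h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._hash(item, row)] += count

    def query(self, item):
        # Never under-estimates; collisions can only inflate counts.
        return min(self.table[row][self._hash(item, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for county, n in [("Montgomery", 5), ("Fairfax", 3), ("Montgomery", 2)]:
    cms.add(county, n)
print(cms.query("Montgomery"))  # >= 7 (exactly 7 barring collisions)
```

The one-sided error (over-estimation only) is what makes the sketch safe for approximate spatio-temporal aggregation queries.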
- Enhancing Fault Localization with Cost Awareness
Nachimuthu Nallasamy, Kanagaraj (Virginia Tech, 2019-06-24)
Debugging is a challenging and time-consuming process in the software life-cycle. The focus of this thesis is to improve the accuracy of existing fault localization (FL) techniques. We experimented with several source-code line-level features, such as line commit size, line recency, and line length, to arrive at a new fault localization technique. Based on our experiments, we propose a novel enhanced cost-aware fault localization (ECFL) technique that combines line length with selected existing baseline fault localization techniques. ECFL improves the accuracy of DStar (Baseline 1), CombineFastestFL (Baseline 2), and CombineFL (Baseline 3) by locating 81%, 58%, and 30% more real faults, respectively, in the Top-1 evaluation metric. In comparison with the baseline techniques, ECFL requires marginal additional time (on average, 5 seconds per bug) and data while providing a significant improvement in accuracy. The source-code line features also improve the baseline fault localization techniques when a ''learning to rank'' SVM machine learning approach is used to combine the features. We also provide an infrastructure to facilitate future research on combining new source-code line features with other fault localization techniques.
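The combination idea behind ECFL can be illustrated with a toy re-ranking step: blend a baseline suspiciousness score with a normalized line-length feature. The additive rule and fixed weight below are illustrative assumptions, not the thesis's actual combination (which can also be learned, e.g. via the ''learning to rank'' SVM mentioned above):

```python
def rerank_with_line_length(lines, weight=0.1):
    """Re-rank suspicious lines by combining a baseline fault-
    localization score with a normalized line-length feature.
    The additive rule and weight are illustrative assumptions."""
    max_len = max(l["length"] for l in lines) or 1
    scored = [(l["score"] + weight * l["length"] / max_len, l["id"])
              for l in lines]
    return [line_id for _, line_id in sorted(scored, reverse=True)]

# Hypothetical suspiciousness scores from a baseline FL technique.
lines = [
    {"id": "Foo.java:42", "score": 0.50, "length": 80},
    {"id": "Foo.java:43", "score": 0.50, "length": 10},
    {"id": "Bar.java:7",  "score": 0.30, "length": 60},
]
print(rerank_with_line_length(lines))  # line length breaks the 0.50 tie
```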
- Enhancing Learning of Recursion
Hamouda, Sally Mohamed Fathy Mo (Virginia Tech, 2015-11-24)
Recursion is one of the most important and hardest topics in lower-division computer science courses. As it is an advanced programming skill, the best way to learn it is through targeted practice exercises, but the best practice problems are hard to grade. As a consequence, students experience only a small number of problems. The dearth of feedback to students regarding whether they understand the material compounds the difficulty of teaching and learning CS2 topics. We present a new way of teaching such programming skills. Students view examples and visualizations, then practice a wide variety of automatically assessed, small-scale programming exercises that address the sub-skills required to learn recursion. The basic recursion tutorial (RecurTutor) teaches material typically encountered in CS2 courses. The advanced recursion in binary trees tutorial (BTRecurTutor) covers advanced recursion techniques most often encountered after CS2. It provides detailed feedback on students' programming exercise answers by performing semantic code analysis on the student's code. Experiments showed that RecurTutor supports recursion learning for CS2-level students. Students who used RecurTutor had statistically significantly better grades on recursion exam questions than students who received typical instruction. Students who experienced RecurTutor spent statistically significantly more time solving programming exercises than students who experienced typical instruction, and came away with statistically significantly higher confidence levels. As part of our effort to enhance recursion learning, we analyzed about 8000 CS2 exam responses to basic recursion questions. From those we discovered a collection of frequently repeated misconceptions, which allowed us to create a draft concept inventory that can be used to measure students' learning of basic recursion skills.
We analyzed about 600 binary tree recursion programming exercises from CS3 exam responses, from which we found frequently recurring misconceptions. The main goal of this work is to enhance the learning of recursion. On one hand, the recursion tutorials aim to enhance student learning of this topic by addressing the main misconceptions and allowing students sufficient practice. On the other hand, the recursion concept inventory independently assesses student learning of recursion regardless of the instructional methods.
- Evaluating, Understanding, and Mitigating Unfairness in Recommender Systems
Yao, Sirui (Virginia Tech, 2021-06-10)
Recommender systems are information filtering tools that discover potential matchings between users and items and benefit both parties. This benefit can be considered a social resource that should be equitably allocated across users and items, especially in critical domains such as education and employment. Biases and unfairness in recommendations raise both ethical and legal concerns. In this dissertation, we investigate the concept of unfairness in the context of recommender systems. In particular, we study appropriate unfairness evaluation metrics, examine the relation between bias in recommender models and inequality in the underlying population, and propose effective unfairness mitigation approaches. We start by exploring the implications of fairness in recommendation and formulating unfairness evaluation metrics, focusing on the task of rating prediction. We identify the insufficiency of demographic parity for scenarios where the target variable is justifiably dependent on demographic features, and propose an alternative set of unfairness metrics based on how much the average predicted ratings deviate from the average true ratings. We also reduce this unfairness in matrix factorization (MF) models by explicitly adding the metrics as penalty terms to the learning objectives. Next, we target a form of unfairness in matrix factorization models observed as disparate model performance across user groups. We identify four types of biases in the training data that contribute to higher subpopulation error, and propose personalized regularization learning (PRL), which learns personalized regularization parameters that directly address the data biases. PRL poses the hyperparameter search problem as a secondary learning task.
It enables back-propagation to learn the personalized regularization parameters by leveraging the closed-form solutions of alternating least squares (ALS) for solving MF. Furthermore, the learned parameters are interpretable and provide insights into how fairness is improved. Third, we conduct a theoretical analysis of the long-term dynamics of inequality in the underlying population, in terms of the fit between users and items. We view the task of recommendation as solving a set of classification problems through threshold policies. We mathematically formulate the transition dynamics of user-item fit in one step of recommendation. We then prove that a system with the formulated dynamics always has at least one equilibrium, and we provide sufficient conditions for the equilibrium to be unique. We also show that, depending on the item-category relationships and the recommendation policies, recommendations in one item category can reshape the user-item fit in another item category. To summarize, in this research, we examine different fairness criteria in rating prediction and recommendation, study the dynamics of interactions between recommender systems and users, and propose mitigation methods to promote fairness and equality.
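The deviation-based metrics described in this entry can be illustrated with a small sketch. This is a hedged, minimal illustration of the stated idea (average predicted minus average true ratings, compared across user groups), not the dissertation's exact definitions; the function name and the two-group assumption are ours.

```python
import numpy as np

def group_deviation_unfairness(pred, true, groups):
    """Gap between groups in how far average predicted ratings
    deviate from average true ratings (illustrative sketch)."""
    pred, true, groups = map(np.asarray, (pred, true, groups))
    deviations = []
    for g in np.unique(groups):
        mask = groups == g
        deviations.append(pred[mask].mean() - true[mask].mean())
    # two-group case: unfairness is the absolute gap in deviations
    return abs(deviations[0] - deviations[1])
```

For example, if one group's ratings are predicted accurately while another group's are systematically under-predicted by two stars, the metric is 2.0; added as a penalty term to an MF objective, such a quantity pushes the model to equalize the groups' deviations.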
- Explainable and Network-based Approaches for Decision-making in Emergency ManagementTabassum, Anika (Virginia Tech, 2021-10-19)Critical Infrastructures (CIs), such as power, transportation, and healthcare, refer to systems, facilities, technologies, and networks vital to national security, public health, and the socio-economic well-being of people. CIs play a crucial role in emergency management. For example, Hurricane Ida, the Texas winter storm, and the Colonial Pipeline cyber-attack, all of which occurred in the US during 2021, show that CIs are highly inter-dependent, with complex interactions: power system failures and the shutdown of natural gas pipelines, in turn, led to debilitating impacts on communication, waste systems, public health, and more. Consider power failures during a disaster such as a hurricane. Subject Matter Experts (SMEs), such as emergency management authorities, may be interested in several decision-making tasks. Can we identify disaster phases in terms of the severity of damage by analyzing changes in power failures? Can we tell SMEs which power grids or regions are the most affected during each disaster phase and need immediate action to recover? Answering these questions can help SMEs respond quickly and send resources for fast recovery from damage. Can we systematically assess how the failure of different power grids may impact the whole set of CIs due to inter-dependencies? This can help SMEs better prepare and mitigate risks by improving system resiliency. In this thesis, we explore problems in efficiently supporting decision-making tasks during a disaster for emergency management authorities. Our research has two primary directions: guiding decision-making in resource allocation, and planning to improve system resiliency. Our work is done in collaboration with the Oak Ridge National Laboratory to contribute impactful research on real-life CIs and disaster power-failure data. 1.
Explainable resource allocation: In contrast to current interpretable or explainable models, which provide answers to help understand a model's output, we view explanations as answers that guide resource-allocation decision-making. In this thesis, we develop a novel model and algorithm to identify disaster phases from changes in power failures, and to pinpoint the regions most affected at each disaster phase so that SMEs can send resources for fast recovery. 2. Networks for improving system resiliency: We view CIs as a large heterogeneous network with infrastructure components as nodes and dependencies as edges. Our goal is to construct a visual-analytics tool and develop a domain-inspired model to identify the important components and connections on which SMEs should focus in order to better prepare for and mitigate the risk of a disaster.
- Forecasting the Flu: Designing Social Network Sensors for EpidemicsShao, Huijuan; Hossain, K.S.M. Tozammel; Wu, Hao; Khan, Maleq; Vullikanti, Anil Kumar S.; Prakash, B. Aditya; Marathe, Madhav V.; Ramakrishnan, Naren (Virginia Tech, 2016-03-08)Early detection and modeling of a contagious epidemic can provide important guidance about quelling the contagion, controlling its spread, or the effective design of countermeasures. A topic of recent interest has been to design social network sensors, i.e., to identify a small set of people who can be monitored to provide insight into the emergence of an epidemic in a larger population. We formally pose the problem of designing social network sensors for flu epidemics and identify two different objectives that could be targeted in such sensor design problems. Using the graph-theoretic notion of dominators, we develop an efficient and effective heuristic for forecasting epidemics with lead time. Using six city-scale datasets generated by extensive microscopic epidemiological simulations involving millions of individuals, we illustrate the practical applicability of our methods and show significant benefits (up to twenty-two days more lead time) compared to other competitors. Most importantly, we demonstrate the use of surrogates or proxies for policy makers in designing social network sensors whose requirements range from nonintrusive knowledge about people to richer information on the relationships among them. The results show that the more intrusive the information we obtain, the longer the lead time for predicting the flu outbreak, up to nine days.
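The dominator notion this entry relies on can be made concrete: in a directed contagion graph, node d dominates node v if every path from the source to v passes through d, so nodes that dominate many others are natural monitoring candidates. The iterative dataflow computation below is a generic textbook formulation, not the paper's heuristic; the graph encoding and names are ours.

```python
def dominators(graph, source):
    """Dominator sets via iterative dataflow: dom[v] is the set of
    nodes appearing on every path from `source` to v (including v).
    `graph` maps each node to a list of its successors."""
    nodes = set(graph)
    dom = {v: set(nodes) for v in nodes}  # start from "everything dominates v"
    dom[source] = {source}
    preds = {v: [u for u in nodes if v in graph[u]] for v in nodes}
    changed = True
    while changed:
        changed = False
        for v in nodes - {source}:
            if not preds[v]:
                continue  # unreachable from source; leave as-is
            new = {v} | set.intersection(*(dom[p] for p in preds[v]))
            if new != dom[v]:
                dom[v], changed = new, True
    return dom
```

On a small diamond graph s→{a,b}→c, every path to c passes through s but through neither a nor b alone, so s dominates c and is the single node worth monitoring to cover it.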
- Frames of Digital Blackness in the Racialized Palimpsest City: Chicago, Illinois and Johannesburg, South AfricaWoodard, Davon Teremus Trevino (Virginia Tech, 2021-08-16)The United States and South Africa, exemplars of "archsegregation," have been constituted within an arc of historical racialized delineations which began with the centering, and subsequent overrepresentation, of European maleness and whiteness as the sole definition of Man. Globally present and persistent, these racialized delineations have been localized and spatially embedded through the tools of urban planning. This arc of racialized otherness, ineffectively erased, continues to inform the racially differentiated geospatial, health, social, and economic outcomes in contemporary urban form and functions for Black communities. It is within this historical arc, and against these differentiated outcomes, that contemporary urban discourse and contestation between individuals and institutions are situated. This historical othering provides not just a racialized geo-historical contextualization, but also works to preclude the recognition of some of the most vulnerable urban community members. As urbanists and advocates strive to co-create urban space and place with municipalities, meeting the needs of these residents is imperative. In order to meet these needs, their lived experiences and voices must be fully recognized and engaged in the processes and programs of urban co-creation, including in digital spaces and forums. To achieve this recognition, it is necessary to acknowledge and situate contemporary digital discourses between local municipalities, Black residents, and Black networks within this historically racialized arc.
In doing so, this research explores if, and how, race, specifically Blackness, is enacted in municipal digital discourse; whether these enactments serve to advance or impede resident recognition and participation; and how Black users, as residents and social network curators, engage and respond to these municipal discursive enactments. This exploratory research is a geographically and digitally multi-sited incorporated comparison of Chicago, Illinois, and Johannesburg, South Africa. Using Twitter and ethnographic data collected between December 1, 2019, and March 31, 2020, this research layers digital ethnographic mixed methods and qualitative mixed methods, including traditional ethnography, digital ethnography, grounded theory, social change and discourse analysis, and frame analysis, to pursue three research goals. First, it explores the digital discursive practices and frames employed by municipalities to inform, communicate with, and engage Black communities, and whether, and how, these frames are situated within a historically racialized arc. Second, it identifies the ways in which Black residents, in dual discursive engagements with local municipalities and their own social networks, interact and engage with the municipal frames centering on Blackness. Third, through ethnographic narratives, it acknowledges the marginalized residents of the Central Business District of Johannesburg, South Africa as "agents of knowledge," with critical and valuable knowledge claims arising from their lived experiences anchored within racialized place and space. In doing so, it supports the efforts of these residents in recentering the validity of their knowledge claims in the co-creation of urban place and space. Additionally, in situating the city within a historically racialized arc, it develops novel frameworks, the racialized palimpsest city and syndemic segregation, through which to explore contemporary urban interactions and engagements.
- Got the Flu (or Mumps)? Check the Eigenvalue!Prakash, B. Aditya; Chakrabarti, Deepayan; Faloutsos, Michalis; Valler, Nicholas; Faloutsos, Christos (Virginia Tech, 2010-03-30)For a given, arbitrary graph, what is the epidemic threshold? That is, under what conditions will a virus result in an epidemic? We provide the super-model theorem, which generalizes older results in two important, orthogonal dimensions. The theorem shows that (a) for a wide range of virus propagation models (VPMs), including all virus propagation models in the standard literature (say, [8][5]), and (b) for any contact graph, the answer always depends on the first eigenvalue of the connectivity matrix. We give the proof of the theorem and arithmetic examples for popular VPMs, like flu (SIS), mumps (SIR), SIRS, and more. We also show the implications of our discovery: easy (although sometimes counter-intuitive) answers to ‘what-if’ questions; easier design and evaluation of immunization policies; and significantly faster agent-based simulations.
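For the SIS (flu-like) case, the eigenvalue condition can be stated numerically: with per-edge infection rate β and recovery rate δ, the infection dies out when the effective strength s = λ₁·β/δ falls below 1, where λ₁ is the first (largest) eigenvalue of the adjacency matrix. A minimal numeric sketch (the function name and example graph are ours):

```python
import numpy as np

def sis_effective_strength(adj, beta, delta):
    """Effective strength s = lambda_1 * beta / delta for SIS:
    the epidemic dies out below the threshold s < 1."""
    lam1 = np.linalg.eigvalsh(np.asarray(adj, dtype=float)).max()
    return lam1 * beta / delta

# star graph on 4 nodes: lambda_1 = sqrt(3) ~ 1.73
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
s = sis_effective_strength(star, beta=0.2, delta=0.5)  # below threshold
```

Note the "counter-intuitive" flavor mentioned in the abstract: the condition depends on λ₁ of the contact graph, not directly on node count or average degree, so two graphs of the same size can sit on opposite sides of the threshold.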
- Greedy Inference Algorithms for Structured and Neural ModelsSun, Qing (Virginia Tech, 2018-01-18)A number of problems in Computer Vision, Natural Language Processing, and Machine Learning produce structured outputs in high-dimensional spaces, which makes searching for the globally optimal solution extremely expensive. Thus, greedy algorithms, which trade off precision for efficiency, are widely used. Unfortunately, they generally lack theoretical guarantees. In this thesis, we prove that greedy algorithms are effective and efficient for searching for multiple top-scoring hypotheses from structured (neural) models: 1) Entropy estimation. We aim to find deterministic samples that are representative of a Gibbs distribution via a greedy strategy. 2) Searching for a set of diverse and high-quality bounding boxes. We formulate this problem as the constrained maximization of a monotone submodular function, for which a greedy algorithm has a near-optimal guarantee. 3) Fill-in-the-blank. The goal is to generate missing words conditioned on the surrounding context, given an image. We extend beam search, a greedy algorithm applicable to unidirectional expansion, to bidirectional neural models in which both past and future information must be considered. We test our proposed approaches on a series of Computer Vision and Natural Language Processing benchmarks and show that they are effective and efficient.
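The unidirectional beam search that point 3 extends can be sketched in a few lines. This is the standard greedy procedure, not the thesis's bidirectional variant; the function and the toy scorer are illustrative.

```python
import heapq

def beam_search(start, expand, score, beam_width, steps):
    """Keep only the `beam_width` best partial sequences at each step.
    `expand(seq)` yields candidate next tokens; `score(seq)` ranks
    partial sequences (e.g. a sum of token log-probabilities)."""
    beam = [start]
    for _ in range(steps):
        candidates = [seq + [tok] for seq in beam for tok in expand(seq)]
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return beam

# toy example: tokens are numbers, the score is their sum
best = beam_search([], lambda seq: [1, 2, 3], sum, beam_width=2, steps=2)
```

Because only the top `beam_width` hypotheses survive each step, the search is greedy: a globally optimal sequence can be pruned early, which is exactly the precision/efficiency trade-off the thesis studies.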