Browsing by Author "Marathe, Madhav Vishnu"
Now showing 1 - 20 of 66
- An activity-based energy demand modeling framework for buildings: A bottom-up approach. Subbiah, Rajesh (Virginia Tech, 2013-05-23). Energy consumption by buildings, due to factors such as temperature regulation and lighting, poses a threat to our environment and energy resources. In the United States, statistics reveal that commercial and residential buildings combined contribute about 40 percent of overall energy consumption, and this figure is expected to increase. In order to manage the growing demand for energy, there is a need for energy system optimization, which would require a realistic, high-resolution energy-demand model. In this work, we investigate and model the energy consumption of buildings by taking into account physical, structural, economic, and social factors that influence energy use. We propose a novel activity-based modeling framework that generates an energy demand profile at regular intervals over a given nominal day. We use this information to generate a building-level energy demand profile at a highly disaggregated level. We then investigate possible uses of the generated demand profiles in what-if scenarios such as urban-area planning, demand-side management, and demand-sensitive pricing. We also provide a novel way to resolve correlation and consistency problems in the generation of individual-level and building-level "shared" activities, which occur due to individuals' interactions.
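To make the bottom-up idea concrete, here is a minimal, hypothetical sketch (not the framework described in the thesis) of how per-person activity schedules might be mapped to appliance loads and aggregated into a building-level hourly demand profile; the activity labels and load values are illustrative assumptions only.

```python
import numpy as np

# Hypothetical appliance loads (kW) associated with activities; values are
# illustrative, not taken from the thesis.
ACTIVITY_LOAD_KW = {"sleep": 0.1, "cook": 1.5, "watch_tv": 0.2, "laundry": 0.8, "away": 0.0}

def person_demand(schedule):
    """schedule: list of 24 hourly activity labels -> hourly kW profile."""
    return np.array([ACTIVITY_LOAD_KW[a] for a in schedule])

def building_demand(schedules, base_load_kw=0.3):
    """Aggregate individual activity-driven profiles into a building-level
    hourly demand profile; base_load_kw models always-on equipment."""
    total = sum(person_demand(s) for s in schedules)
    return total + base_load_kw

# Example: two residents of one building
resident_a = ["sleep"] * 7 + ["cook"] + ["away"] * 9 + ["cook", "watch_tv", "watch_tv"] + ["sleep"] * 4
resident_b = ["sleep"] * 8 + ["away"] * 8 + ["laundry", "cook", "watch_tv"] + ["sleep"] * 5
profile = building_demand([resident_a, resident_b])
print(profile.round(2))
```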
- An Algorithm for Influence Maximization and Target Set Selection for the Deterministic Linear Threshold Model. Swaminathan, Anand (Virginia Tech, 2014-07-03). The problem of influence maximization has been studied extensively with applications that include viral marketing, recommendations, and feed ranking. The optimization problem, first formulated by Kempe, Kleinberg and Tardos, is known to be NP-hard. Thus, several heuristics have been proposed to solve this problem. This thesis studies the problem of influence maximization under the deterministic linear threshold model and presents a novel heuristic for finding influential nodes in a graph with the goal of maximizing contagion spread that emanates from these influential nodes. Inputs to our algorithm include edge weights and vertex thresholds. The threshold difference greedy algorithm presented in this thesis takes into account both the edge weights as well as vertex thresholds in computing influence of a node. The threshold difference greedy algorithm is evaluated on 14 real-world networks. Results demonstrate that the new algorithm performs consistently better than the seven other heuristics that we evaluated in terms of final spread size. The threshold difference greedy algorithm has tuneable parameters which can make the algorithm run faster. As a part of the approach, the algorithm also computes the infected nodes in the graph. This eliminates the need for running simulations to determine the spread size from the influential nodes. We also study the target set selection problem with our algorithm. In this problem, the final spread size is specified and a seed (or influential) set is computed that will generate the required spread size.
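As background, the deterministic linear threshold model itself is easy to state: a node becomes active once the total weight of its active in-neighbors reaches its threshold, and activation proceeds in rounds until no node changes. The sketch below simulates that spread from a seed set; it illustrates only the standard model, not the thesis's threshold difference greedy heuristic.

```python
def linear_threshold_spread(edges, thresholds, seeds):
    """Deterministic linear threshold diffusion.

    edges: dict mapping (u, v) -> weight of directed influence from u to v
    thresholds: dict mapping node -> activation threshold
    seeds: iterable of initially active nodes
    Returns the final set of active nodes.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, theta in thresholds.items():
            if v in active:
                continue
            incoming = sum(w for (u, x), w in edges.items() if x == v and u in active)
            if incoming >= theta:  # activates once accumulated weight meets its threshold
                active.add(v)
                changed = True
    return active

# Toy example: seeding node "a" eventually activates "b" and "c"
edges = {("a", "b"): 0.6, ("a", "c"): 0.3, ("b", "c"): 0.4}
thresholds = {"a": 0.5, "b": 0.5, "c": 0.6}
print(sorted(linear_threshold_spread(edges, thresholds, {"a"})))  # ['a', 'b', 'c']
```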
- Behavior Modeling and Analytics for Urban Computing: A Synthetic Information-based Approach. Parikh, Nidhi Kiranbhai (Virginia Tech, 2017-03-15). The rapid increase in urbanization poses challenges in diverse areas such as energy, transportation, pandemic planning, and disaster response. Planning for urbanization is a big challenge because cities are complex systems consisting of human populations, infrastructures, and interactions and interdependence among them. This dissertation focuses on a synthetic information-based approach for modeling human activities and behaviors for two urban science applications, epidemiology and disaster planning, along with associated analytics. Synthetic information is a data-driven approach to create a detailed, high-fidelity representation of human populations, infrastructural systems, and their behavioral and interaction aspects. It is used in developing large-scale simulations to model what-if scenarios and for policy making. Big cities have a large number of visitors every day. They often visit crowded areas in the city and come into contact with each other and the area residents. However, most epidemiological studies have ignored their role in spreading epidemics. We extend the synthetic population model of the Washington DC metro area to include transient populations, consisting of tourists and business travelers, along with their demographics and activities, by combining data from multiple sources. We evaluate the effect of including this population in epidemic forecasts, and the potential benefits of multiple interventions that target transients. In the next study, we model human behavior in the aftermath of the detonation of an improvised nuclear device in Washington DC. Previous studies of this scenario have mostly focused on modeling physical impact and simple behaviors like sheltering and evacuation. However, these models have focused on optimal behavior, not naturalistic behavior. In other words, prior work is focused on whether it is better to shelter-in-place or evacuate, but has not been informed by the literature on what people actually do in the aftermath of disasters. Natural human behaviors in disasters, such as looking for family members or seeking healthcare, are supported by infrastructures such as cell-phone communication and transportation systems. We model a range of behaviors such as looking for family members, evacuation, sheltering, healthcare-seeking, worry, and search and rescue, and their interactions with infrastructural systems. Large-scale and complex agent-based simulations generate a large amount of data in each run of the simulation, making it hard to make sense of results. This leads us to formulate two new problems in simulation analytics. First, we develop algorithms to summarize simulation results by extracting causally-relevant state sequences - state sequences that have a measurable effect on the outcome of interest. Second, in order to develop effective interventions, it is important to understand which behaviors lead to positive and negative outcomes. It may happen that the same behavior leads to different outcomes, depending upon the context. Hence, we develop an algorithm for contextual behavior ranking. In addition to the context mentioned in the query, our algorithm also identifies any additional context that may affect the behavioral ranking.
- A Cognitively Inspired Architecture for Wireless Sensor Networks: A Web Service Oriented Middleware for a Traffic Monitoring System. Tupe, Sameer Vijay (Virginia Tech, 2006-06-08). We describe CoSMo, a Cognitively Inspired Service and Model Architecture for situational awareness and monitoring of vehicular traffic in urban transportation systems using a network of wireless sensors. The system architecture combines (i) a cognitively inspired internal representation for analyzing and answering queries concerning the observed system and (ii) a service oriented architecture that facilitates interaction among individual modules of the internal representation, the observed system and the user. The cognitively inspired model architecture allows one to effectively respond to deductive as well as inductive queries by combining simulation based dynamic models with traditional relational databases. On the other hand, the service oriented design of interaction allows one to build flexible, extensible and scalable systems that can be deployed in practical settings. To illustrate our concepts and the novel features of our architecture, we have recently completed a prototype implementation of CoSMo. The prototype illustrates advantages of our approach over other traditional approaches for designing scalable software for situational awareness in large complex systems. The basic architecture and its prototype implementation are generic and can be applied for monitoring other complex systems. CoSMo's architecture has a number of features that distinguish cognitive systems, including dynamic internal models of the observed system, inductive and deductive learning and reasoning, perception, memory and adaptation. This thesis describes the service oriented model and the associated prototype implementation. Two important contributions of this thesis include the following: The Generic Service Architecture - CoSMo's service architecture is generic and can be applied to many other application domains without much change in underlying infrastructure. Integration of emerging web technologies - Use of Web Services, UPnP, UDDI and many other emerging technologies has taken CoSMo beyond a prototype implementation and towards a real production system.
- Cognitively-inspired Architecture for Wireless Sensor Networks: A Model Driven Approach for Data Integration in a Traffic Monitoring System. Phalak, Kashmira (Virginia Tech, 2006-06-08). We describe CoSMo, a Cognitively Inspired Service and Model Architecture for situational awareness and monitoring of vehicular traffic in urban transportation systems using a network of wireless sensors. The system architecture combines (i) a cognitively inspired internal representation for analyzing and answering queries concerning the observed system and (ii) a service oriented architecture that facilitates interaction among individual modules of the internal representation, the observed system and the user. The cognitively inspired model architecture allows effective deductive as well as inductive reasoning by combining simulation based dynamic models for planning with traditional relational databases for knowledge and data representation. On the other hand, the service oriented design of interaction allows one to build flexible, extensible and scalable systems that can be deployed in practical settings. To illustrate our concepts and the novel features of our architecture, we have recently completed a prototype implementation of CoSMo. The prototype illustrates advantages of our approach over other traditional approaches for designing scalable software for situational awareness in large complex systems. The basic architecture and its prototype implementation are generic and can be applied for monitoring other complex systems. This thesis describes the design of the cognitively-inspired model architecture and its corresponding prototype. Two important contributions include the following: • The cognitively-inspired architecture: In contrast to earlier work in model driven architecture, CoSMo contains a number of cognitively inspired features, including perception, memory and learning. Apart from illustrating interesting trade-offs between computational cost (e.g. access time, memory) and correctness available to a user, it also allows user-specified deductive and inductive queries. • Distributed Data Integration and Fusion: In keeping with the cognitively-inspired model-driven approach, the system allows for efficient data fusion from heterogeneous sensors, simulation based dynamic models and databases that are continually updated with real world and simulated data. It is capable of supporting a rich class of queries.
- Computational Cost Analysis of Large-Scale Agent-Based Epidemic Simulations. Kamal, Tariq (Virginia Tech, 2016-09-21). Agent-based epidemic simulation (ABES) is a powerful and realistic approach for studying the impacts of disease dynamics and complex interventions on the spread of an infection in the population. Among many ABES systems, EpiSimdemics comes closest to the popular agent-based epidemic simulation systems developed by Eubank, Longini, Ferguson, and Parker. EpiSimdemics is a general framework that can model many reaction-diffusion processes besides the Susceptible-Exposed-Infectious-Recovered (SEIR) models. This model allows the study of complex systems as they interact, thus enabling researchers to model and observe socio-technical trends and forces. Pandemic planning at the world level requires simulation of over 6 billion agents, where each agent has a unique set of demographics, daily activities, and behaviors. Moreover, the stochastic nature of epidemic models, the uncertainty in the initial conditions, and the variability of reactions require the computation of several replicates of a simulation for a meaningful study. Given the hard timelines to respond, running many replicates (15-25) of several configurations (10-100) of these compute-heavy simulations is only possible on high-performance computing (HPC) clusters. These agent-based epidemic simulations are irregular and show poor execution performance on high-performance clusters due to the evolutionary nature of their workload, large irregular communication and load imbalance. For increased utilization of HPC clusters, the simulation needs to be scalable. Many challenges arise when improving the performance of agent-based epidemic simulations on high-performance clusters. Firstly, large-scale graph-structured computation is central to the processing of these simulations, where the star-motif quality nodes (natural graphs) create large computational imbalances and communication hotspots. Secondly, the computation is performed by classes of tasks that are separated by global synchronization. The non-overlapping computations cause idle times, which introduce the load balancing and cost estimation challenges. Thirdly, the computation is overlapped with communication, which is difficult to measure using simple methods, thus making the cost estimation very challenging. Finally, the simulations are iterative and the workload (computation and communication) may change through iterations, as a result introducing load imbalances. This dissertation focuses on developing a cost estimation model and load balancing schemes to increase the runtime efficiency of agent-based epidemic simulations on high-performance clusters. While developing the cost model and load balancing schemes, we perform static and dynamic load analyses of such simulations. We also statically quantified the computational and communication workloads in EpiSimdemics. We designed, developed and evaluated a cost model for estimating the execution cost of large-scale parallel agent-based epidemic simulations (and more generally for all constrained producer-consumer parallel algorithms). This cost model uses computational imbalances and communication latencies, and enables the cost estimation of those applications where the computation is performed by classes of tasks, separated by synchronization. It enables the performance analysis of parallel applications by computing their execution times on a number of partitions.
Our evaluations show that the model is helpful in performance prediction, resource allocation and evaluation of load balancing schemes. As part of the load balancing algorithms, we adopted the Metis library for partitioning bipartite graphs. We have also developed lower-overhead custom schemes called Colocation and MetColoc. We performed an evaluation of Metis, Colocation, and MetColoc. Our analysis showed that the MetColoc scheme gives performance similar to Metis, but with half the partitioning overhead (runtime and memory). On the other hand, the Colocation scheme achieves performance similar to Metis on a larger number of partitions, but at a much lower partitioning overhead. Moreover, the memory requirements of the Colocation scheme do not increase as we create more partitions. We have also performed a dynamic load analysis of agent-based epidemic simulations. For this, we studied the individual and joint effects of three disease parameters (transmissibility, infection period and incubation period). We quantified the effects using an analytical equation with separate constants for the SIS, SIR and SI disease models. The metric that we have developed in this work is useful for cost estimation of constrained producer-consumer algorithms; however, it has some limitations. The applicability of the metric is application-, machine- and data-specific. In the future, we plan to extend the metric to increase its applicability to a larger set of machine architectures, applications, and datasets.
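As a rough illustration of the kind of cost model described above (a sketch under assumed parameters, not the dissertation's actual estimator), the per-iteration cost of a bulk-synchronous, producer-consumer computation can be approximated by the slowest partition's compute time plus a communication term, summed over iterations:

```python
def estimate_execution_cost(compute_times, comm_volumes, bandwidth, latency):
    """Estimate total cost of an iterative, globally synchronized computation.

    compute_times: list of iterations, each a list of per-partition compute times (s)
    comm_volumes:  list of iterations, each a list of per-partition bytes sent
    bandwidth:     assumed network bandwidth in bytes/s
    latency:       assumed per-message latency in seconds
    """
    total = 0.0
    for comp, comm in zip(compute_times, comm_volumes):
        # Global synchronization: the iteration finishes when the slowest partition does.
        compute_cost = max(comp)
        comm_cost = max(latency + v / bandwidth for v in comm)
        total += compute_cost + comm_cost
    return total

# Two iterations on four partitions; numbers are made up for illustration.
comp = [[1.0, 1.2, 0.9, 1.1], [1.3, 1.0, 1.0, 0.8]]
comm = [[2e6, 3e6, 1e6, 2e6], [2e6, 2e6, 4e6, 1e6]]
print(round(estimate_execution_cost(comp, comm, bandwidth=1e8, latency=1e-3), 3))
```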
- Computational Framework for Uncertainty Quantification, Sensitivity Analysis and Experimental Design of Network-based Computer Simulation Models. Wu, Sichao (Virginia Tech, 2017-08-29). When capturing a real-world, networked system using a simulation model, features are usually omitted or represented by probability distributions. Verification and validation (V and V) of such models is an inherent and fundamental challenge. Central to V and V, but also to model analysis and prediction, are uncertainty quantification (UQ), sensitivity analysis (SA) and design of experiments (DOE). In addition, network-based computer simulation models, as compared with models based on ordinary and partial differential equations (ODE and PDE), typically involve a significantly larger volume of more complex data. Efficient use of such models is challenging since it requires a broad set of skills ranging from domain expertise to in-depth knowledge including modeling, programming, algorithmics, high-performance computing, statistical analysis, and optimization. On top of this, the need to support reproducible experiments necessitates complete data tracking and management. Finally, the lack of standardization of simulation model configuration formats presents an extra challenge when developing technology intended to work across models. While there are tools and frameworks that address parts of the challenges above, to the best of our knowledge, none of them accomplishes all this in a model-independent and scientifically reproducible manner. In this dissertation, we present a computational framework called GENEUS that addresses these challenges. Specifically, it incorporates (i) a standardized model configuration format, (ii) a data flow management system with digital library functions helping to ensure scientific reproducibility, and (iii) a model-independent, expandable plugin-type library for efficiently conducting UQ/SA/DOE for network-based simulation models. This framework has been applied to systems ranging from fundamental graph dynamical systems (GDSs) to large-scale socio-technical simulation models with a broad range of analyses such as UQ and parameter studies for various scenarios. Graph dynamical systems provide a theoretical framework for network-based simulation models and have been studied theoretically in this dissertation. This includes a broad range of stability and sensitivity analyses offering insights into how GDSs respond to perturbations of their key components. This stability-focused, structure-to-function theory was a motivator for the design and implementation of GENEUS. GENEUS, rooted in the framework of GDS, provides modelers, experimentalists, and research groups access to a variety of UQ/SA/DOE methods with robust and tested implementations without requiring them to necessarily have the detailed expertise in statistics, data management and computing. Even for research teams having all the skills, GENEUS can significantly increase research productivity.
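For a flavor of what a model-independent DOE step might look like, the following sketch (an illustration under assumptions, not GENEUS code) uses SciPy's Latin hypercube sampler to generate a space-filling design over three hypothetical simulation parameters, which could then be written out as configuration files for separate simulation runs.

```python
from scipy.stats import qmc

# Hypothetical parameter ranges for a network-based simulation (illustrative only).
param_names = ["transmissibility", "initial_infections", "intervention_day"]
lower = [0.01, 5, 1]
upper = [0.10, 500, 60]

sampler = qmc.LatinHypercube(d=len(param_names), seed=42)
unit_design = sampler.random(n=8)                  # 8 design points in [0, 1]^3
design = qmc.scale(unit_design, lower, upper)      # rescale to parameter ranges

for i, point in enumerate(design):
    config = dict(zip(param_names, point))
    print(f"run_{i:02d}:", {k: round(v, 3) for k, v in config.items()})
```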
- Containing Cascading Failures in Networks: Applications to Epidemics and Cybersecurity. Saha, Sudip (Virginia Tech, 2016-10-05). Many real-world networks exhibit cascading phenomena, e.g., disease outbreaks in social contact networks, malware propagation in computer networks, failures in cyber-physical systems such as power grids. As they grow in size and complexity, their security becomes increasingly important. In this thesis, we address the problems of controlling cascading failures in various network settings. We address cascading phenomena which are either natural (e.g., disease outbreaks) or malicious (e.g., cyber attacks). We consider the nodes of a network as being individually or collectively controlled by self-interested autonomous agents and study their strategic decisions in the presence of these failure cascades. There are many models of cascading failures which specify how a node would fail when some neighbors have failed, such as: (i) epidemic spread models in which the cascading can be viewed as a natural and stochastic process and (ii) cyber attack models where the cascade is driven by malicious intents. We present our analyses and algorithms for these models in two parts. Part I focuses on problems of controlling epidemic spread. Epidemic outbreaks are generally modeled as stochastic diffusion processes. In particular, we consider the SIS model on networks. There exist heuristic centralized approaches in the literature for containing epidemic spread in SIS/SIR models; however, no rigorous performance bounds are known for these approaches. We develop algorithms with provable approximation guarantees that involve either protective intervention (e.g., vaccination) or link removal (e.g., unfriending). Our approach relies on the characterization of the SIS model in terms of the spectral radius of the network. The centralized approaches, however, are sometimes not feasible in practice. For example, targeted vaccination is often not feasible because of limited compliance to directives. This issue has been addressed in the literature by formulating game theoretic models for the containment of epidemic spread. However, they generally assume simplistic propagation models or homogeneous network structures. We develop novel game formulations which rely on the spectral characterization of the SIS model. In these formulations, the failures start from a random set of nodes and propagate through the network links. Each node acts as a self-interested agent and makes strategic intervention decisions (e.g., taking vaccination). Each agent decides its strategy to optimize its payoff (modeled by some payoff function). We analyze the complexity of finding Nash equilibria (NE) and study the structure of NE for different networks in these game settings. Part II focuses on malware spread in networks. In the cybersecurity literature, malware spread is often studied in the framework of "attack graph" models. In these models, a node represents either a physical computing unit or a network configuration and an edge represents a physical or logical vulnerability dependency. A node gets compromised if a certain set of its neighbors are compromised. Attack graphs describe explicit scenarios in which a single vulnerability exploitation cascades further into the network exploiting inherent dependencies among the network components.
Attack graphs are used for studying cascading effects in many cybersecurity applications, e.g., component failure in enterprise networks, botnet spreads, advanced persistent attacks. One distinct feature of cyber attack cascades is the stealthy nature of the attack moves. Also, cyber attacks are generally repeated. How to control stealthy and repeated attack cascades is an interesting problem. van Dijk et al. (2013) first proposed a game framework called "FlipIt" for reasoning about the stealthy interaction between a defender and an attacker over the control of a system resource. However, in cybersecurity applications, systems generally consist of multiple resources connected by a network. Therefore it is imperative to study stealthy attack and defense in networked systems. We develop a generalized framework called "FlipNet" which extends the work of van Dijk et al. to networks. We present analyses and algorithms for different problems in this framework. On the other hand, if the security of a system is limited to the vulnerabilities and exploitations that are known to the security community, often the objective of the system owner is to take cost-effective steps to minimize potential damage in the network. This problem has been formulated in the cybersecurity literature as hardening attack graphs. Several heuristic approaches have been proposed in the literature so far, but no algorithmic analysis has been given. We analyze the inherent vulnerability of the network and present approximation hardening algorithms.
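The spectral characterization mentioned in this entry refers to the standard result that, for the SIS model on a network, the epidemic tends to die out quickly when the effective transmission ratio times the spectral radius of the adjacency matrix is below one. Below is a small NumPy sketch of that threshold check (a textbook illustration, not the thesis's vaccination or link-removal algorithms):

```python
import numpy as np

def sis_threshold(adj, beta, delta):
    """Return the spectral radius of adj and whether (beta/delta) * rho < 1,
    the standard SIS epidemic-threshold condition (beta: infection rate per
    edge, delta: recovery rate)."""
    eigenvalues = np.linalg.eigvals(adj)
    rho = max(abs(eigenvalues))
    return rho, (beta / delta) * rho < 1.0

# Toy 4-node network: a triangle plus a pendant node.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rho, dies_out = sis_threshold(A, beta=0.2, delta=0.5)
print(f"spectral radius = {rho:.3f}, subthreshold = {dies_out}")
```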
- Critical Substation Risk Assessment and Mitigation. Delport, Jacques (Virginia Tech, 2018-06-01). Substations are joints in the power system that represent nodes vital to stable and reliable operation of the power system. They contrast with the rest of the power system in that they are a dense combination of critical components, all of which are simultaneously vulnerable to one isolated incident: weather, attack, or other common failure modes. Undoubtedly, the loss of these vital links will have a severe impact on the power grid to varying degrees. This work creates a cascading model based on protection system misoperations to estimate system risk from loss-of-substation events in order to assess each substation's criticality. A continuation power flow method is utilized for estimating voltage collapse during cascades. Transient stability is included through the use of a supervised machine learning algorithm called random forests. These forests allow for fast, robust and accurate prediction of transient stability during loss-of-substation initiated cascades. Substation risk indices are incorporated into a preventative optimal power flow (OPF) to reduce the risk of critical substations. This risk-based dispatch represents an easily scalable, robust algorithm for reducing risk associated with substation losses. The new dispatch allows operators to operate at a higher-cost operating point for short periods in which substations may likely be lost, such as large weather events or likely attacks, and significantly reduces system risk associated with those losses. System risk is then studied considering the interaction of a power grid utility trying to protect its critical substations under a constrained budget and a potential attacker with insider information on critical substations. This is studied under a zero-sum game theoretic framework in which the utility is trying to confuse the attacker. A model is then developed to analyze how a utility may create a robust strategy of protection that cannot be heavily exploited while taking advantage of any mistakes potential attackers may make.
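Since this abstract leans on a random forest classifier for transient stability, here is a minimal scikit-learn sketch of that general idea (synthetic features and labels, purely illustrative; the thesis's actual features, data, and model settings are not reproduced here).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic pre-contingency operating points: columns might represent quantities
# such as total load, key line flows, and generator outputs (all made up here).
X = rng.normal(size=(500, 6))
# Synthetic "stable (1) / unstable (0)" labels from an arbitrary rule plus noise.
y = ((X[:, 0] + 0.5 * X[:, 1] - X[:, 2] + rng.normal(scale=0.3, size=500)) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", round(clf.score(X_test, y_test), 3))
```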
- Data Integration Methodologies and Services for Evaluation and Forecasting of Epidemics. Deodhar, Suruchi (Virginia Tech, 2016-05-31). Most epidemiological systems described in the literature are built for evaluation and analysis of specific diseases, such as influenza-like illness. The modeling environments that support these systems are implemented for specific diseases and epidemiological models. Hence they are not reusable or extendable. This thesis focuses on the design and development of an integrated analytical environment with flexible data integration methodologies and multi-level web services for evaluation and forecasting of various epidemics in different regions of the world. The environment supports analysis of epidemics based on any combination of disease, surveillance sources, epidemiological models, geographic regions and demographic factors. The environment also supports evaluation and forecasting of epidemics when various policy-level and behavioral interventions are applied that may inhibit the spread of an epidemic. First, we describe data integration methodologies and schema design for flexible experiment design, storage and query retrieval mechanisms related to large-scale epidemic data. We describe novel techniques for data transformation, optimization, pre-computation and automation that enable the flexibility, extendibility and efficiency required in different categories of query processing. Second, we describe the design and engineering of adaptable middleware platforms based on service-oriented paradigms for interactive workflow, communication, and decoupled integration. This supports large-scale multi-user applications with provision for online analysis of interventions as well as analytical processing of forecast computations. Using a service-oriented architecture, we have provided a platform-as-a-service representation for evaluation and forecasting of epidemics. We demonstrate the applicability of our integrated environment through development of the applications DISIMS and EpiCaster. DISIMS is an interactive web-based system for evaluating the effects of dynamic intervention strategies on epidemic propagation. EpiCaster is a situation assessment and forecasting tool for projecting the state of evolving epidemics such as flu and Ebola in different regions of the world. We discuss how our platform uses existing technologies to solve a novel problem in epidemiology, and provides a unique solution on which different applications can be built for analyzing epidemic containment strategies.
- Data-Driven Methods for Modeling and Predicting Multivariate Time Series using Surrogates. Chakraborty, Prithwish (Virginia Tech, 2016-07-05). Modeling and predicting multivariate time series data has been of prime interest to researchers for many decades. Traditionally, time series prediction models have focused on finding attributes that have consistent correlations with target variable(s). However, diverse surrogate signals, such as News data and Twitter chatter, are increasingly available and can provide real-time information, albeit with inconsistent correlations. Intelligent use of such sources can lead to early and real-time warning systems such as Google Flu Trends. Furthermore, the target variables of interest, such as public health surveillance, can be noisy. Thus models built for such data sources should be flexible as well as adaptable to changing correlation patterns. In this thesis we explore various methods of using surrogates to generate more reliable and timely forecasts for noisy target signals. We primarily investigate three key components of the forecasting problem, viz. (i) short-term forecasting, where surrogates can be employed in a now-casting framework, (ii) the long-term forecasting problem, where surrogates act as forcing parameters to model system dynamics, and (iii) robust drift models that detect and exploit 'changepoints' in the surrogate-target relationship to produce robust models. We explore various 'physical' and 'social' surrogate sources to study these sub-problems, primarily to generate real-time forecasts for endemic diseases. On the modeling side, we employed matrix factorization and generalized linear models to detect short-term trends and explored various Bayesian sequential analysis methods to model long-term effects. Our research indicates that, in general, a combination of surrogates can lead to more robust models. Interestingly, our findings indicate that under specific scenarios, particular surrogates can decrease overall forecasting accuracy - thus providing an argument towards the use of 'Good data' against 'Big data'.
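To illustrate the now-casting use of surrogates in the simplest possible terms (a sketch with simulated data, not the models or data sources used in the thesis), one can regress a noisy count-valued target on surrogate signals with a Poisson generalized linear model:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(1)
weeks = 200

# Simulated surrogate signals (e.g., search-query and social-media volumes, made up here)
search_volume = rng.gamma(shape=2.0, scale=5.0, size=weeks)
tweet_volume = rng.gamma(shape=2.0, scale=3.0, size=weeks)
# Simulated noisy disease counts loosely driven by the surrogates
true_rate = 2.0 + 0.4 * search_volume + 0.2 * tweet_volume
cases = rng.poisson(true_rate)

X = np.column_stack([search_volume, tweet_volume])
model = PoissonRegressor(alpha=1e-3).fit(X[:-4], cases[:-4])   # train on all but last 4 weeks
nowcast = model.predict(X[-4:])                                # "now-cast" the held-out weeks
print(np.round(nowcast, 1), cases[-4:])
```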
- A Database Supported Modeling Environment for Pandemic Planning and Course of Action Analysis. Ma, Yifei (Virginia Tech, 2013-06-24). Pandemics can significantly impact public health and society, for instance, the 2009 H1N1
and the 2003 SARS. In addition to analyzing the historic epidemic data, computational simulation of epidemic propagation processes and disease control strategies can help us understand the spatio-temporal dynamics of epidemics in the laboratory. Consequently, the public can be better prepared and the government can control future epidemic outbreaks more effectively. Recently, epidemic propagation simulation systems, which use high performance computing technology, have been proposed and developed to understand disease propagation processes. However, run-time infection situation assessment and intervention adjustment, two important steps in modeling disease propagation, are not well supported in these simulation systems. In addition, these simulation systems are computationally efficient in their simulations, but most of them have
limited capabilities in terms of modeling interventions in realistic scenarios.
In this dissertation, we focus on building a modeling and simulation environment for epidemic propagation and propagation control strategy. The objective of this work is to
design a modeling environment that both supports the previously missing functions and performs well in terms of expected features such as modeling fidelity, computational efficiency, modeling capability, etc. Our proposed methodologies to build
such a modeling environment are: 1) decoupled and co-evolving models for disease propagation, situation assessment, and propagation control strategy, and 2) assessing situations and simulating control strategies using relational databases. Our motivation for exploring these methodologies is as follows: 1) a decoupled and co-evolving model allows us to design modules for each function separately and makes this complex modeling system design simpler, and 2) simulating propagation control strategies using relational databases improves the modeling capability and human productivity of using this modeling environment. To evaluate our proposed methodologies, we have designed and built a loosely coupled and database-supported epidemic modeling and simulation environment. With detailed experimental results and realistic case studies, we demonstrate that our modeling environment provides the missing functions and greatly enhances many expected features, such as modeling capability, without significantly sacrificing computational efficiency and scalability.
- Development of Person-Person Network and Interacting PTTS in EpiSimdemics. Mishra, Gaurav (Virginia Tech, 2014-05-23). Communications over social media, telephone, email, text, etc. have emerged as an integral part of modern society and are popularly used to express anger, anxiety, fear, agitation and opinion. People's social interactions tend to increase dramatically during periods of epidemics, protests and calamities. Therefore, the communication channels mentioned above play an important role in the spread of infectious phenomena such as rumors, fads and effects. These infectious phenomena alter people's behavior during a disease epidemic [1][2]. Social contact networks and epidemics co-evolve [1][2]. The spread of a disease influences people's behavior, which in turn changes their social contact network, thereby altering the disease spread itself. As a result, there is a need for modeling the spread of these infectious phenomena that lead to changes in behavior. Their propagation among the population primarily depends on the social contact network. The spread of a social contagion is very similar to the spread of any infectious disease, as both are contagious in nature. Spreading a contagious disease requires direct exposure to an infectious agent, whereas social contagions can spread through various communication media like social networking forums, phones, emails and tweets. EpiSimdemics is an individual-based modeling environment. It uses a people-location bipartite graph as the underlying network [3]. In its current form, EpiSimdemics requires two people to interact at a location to model simulations. Thus, it cannot simulate the spread of social contagions that do not necessarily require the meeting of two agents at a location. We enhance EpiSimdemics by incorporating a Person-Person network, which can model communications between people that are not contact based, such as communications over email, phone, text and tweet. This Person-Person network is used to model effects (social contagion) which induce behavioral changes in the population and thus impact the disease spread. The disease spread is modeled on the Person-Location network. This leads to the scenario of two interacting networks: the Person-Person network modeling social contagion and the Person-Location network modeling disease. Theoretically, there can be multiple such networks modeling various interacting phenomena.
We demonstrate the usefulness of this network by modeling and simulating two interacting PTTSs (probabilistic timed transition systems). To model disease epidemics, we have defined a Disease Model, and to model effects (social contagion), we have defined a Fear Model. We show how these models influence each other by performing simulations on EpiSimdemics with the interacting Disease and Fear Models. Therefore, a model that does not include affect adaptations and their influence on disease epidemics, and vice versa, fails to reflect the actual behavior of a society during an epidemic. The addition of the Person-Person network to EpiSimdemics allows for a better understanding of these affect adaptations, which can include behavior changes in society during an epidemic outbreak. This would lead to more effective interventions and help to better understand the dynamics of disease epidemics.
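The flavor of two interacting contagion processes can be conveyed with a tiny simulation sketch: a disease process and a fear process coupled so that fearful individuals reduce their contacts. This is an illustrative toy with made-up parameters and a fully mixed population, not the Disease and Fear PTTSs defined in the thesis.

```python
import random

random.seed(7)
N = 1000
disease = ["S"] * N   # S -> I -> R
fear = ["calm"] * N   # calm -> afraid
disease[0] = "I"

BETA, RECOVER, FEAR_SPREAD = 0.3, 0.1, 0.2   # illustrative rates only
CONTACTS_CALM, CONTACTS_AFRAID = 10, 3       # fear reduces contacts

for day in range(60):
    infected = [i for i in range(N) if disease[i] == "I"]
    # Fear spreads over a (here: fully mixed) person-person channel, driven by case counts
    for i in range(N):
        if fear[i] == "calm" and random.random() < FEAR_SPREAD * len(infected) / N:
            fear[i] = "afraid"
    # Disease spreads over person-location contacts, modulated by the fear state
    for i in infected:
        contacts = CONTACTS_AFRAID if fear[i] == "afraid" else CONTACTS_CALM
        for j in random.sample(range(N), contacts):
            if disease[j] == "S" and fear[j] == "calm" and random.random() < BETA:
                disease[j] = "I"
            elif disease[j] == "S" and random.random() < BETA * 0.3:  # afraid contacts are more cautious
                disease[j] = "I"
    for i in infected:
        if random.random() < RECOVER:
            disease[i] = "R"

print("final recovered:", disease.count("R"), "ever afraid:", fear.count("afraid"))
```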
- Discrete Event Simulation of Mobility and Spatio-Temporal Spectrum Demand. Chandan, Shridhar (Virginia Tech, 2014-02-05). Realistic mobility and cellular traffic modeling is key to various wireless networking applications and has a significant impact on network performance. Planning and design, network resource allocation and performance evaluation in cellular networks require realistic traffic modeling. We propose a Discrete Event Simulation framework, Diamond (Discrete Event Simulation of Mobility and Spatio-Temporal Spectrum Demand), to model and analyze realistic activity-based mobility and spectrum demand patterns. The framework can be used for spatio-temporal estimation of load, deciding the location of a new base station, contingency planning, and estimating the resilience of the existing infrastructure. The novelty of this framework lies in its ability to capture a variety of complex, realistic and dynamically changing events effectively. Our initial results show that the framework can be instrumental in contingency planning and dynamic spectrum allocation.
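At its core, a discrete event simulation of mobility and demand is a priority queue of timestamped events. A minimal, generic sketch of that engine follows (hypothetical event types and handlers; not the Diamond framework itself):

```python
import heapq

def run_simulation(initial_events, handlers, horizon):
    """Generic discrete event loop: pop events in time order, let handlers
    schedule follow-up events, stop at the time horizon."""
    queue = list(initial_events)
    heapq.heapify(queue)
    while queue:
        time, event_type, payload = heapq.heappop(queue)
        if time > horizon:
            break
        for new_event in handlers[event_type](time, payload):
            heapq.heappush(queue, new_event)

def handle_arrival(time, cell):
    print(f"t={time:5.1f}  user arrives in cell {cell}, spectrum demand +1")
    # Schedule this user's departure and (hypothetically) a move to a neighbor cell.
    return [(time + 8.0, "departure", cell), (time + 3.0, "arrival", cell + 1)]

def handle_departure(time, cell):
    print(f"t={time:5.1f}  user leaves cell {cell}, spectrum demand -1")
    return []

handlers = {"arrival": handle_arrival, "departure": handle_departure}
run_simulation([(0.0, "arrival", 1)], handlers, horizon=10.0)
```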
- A Distributed Approach to EpiFast using Apache Spark. Kannan, Vijayasarathy (Virginia Tech, 2015-08-04). EpiFast is a parallel algorithm for large-scale epidemic simulations, based on an interpretation of stochastic disease propagation in a contact network. The original EpiFast implementation is based on a master-slave computation model with a focus on distributed memory using the message passing interface (MPI). However, it suffers from a few shortcomings with respect to the scale of networks being studied. This thesis addresses these shortcomings and provides two different implementations: Spark-EpiFast based on the Apache Spark big data processing engine and Charm-EpiFast based on the Charm++ parallel programming framework. The study focuses on exploiting features of both systems that we believe could potentially benefit performance and scalability. We present models of EpiFast specific to each system and relate algorithm specifics to several optimization techniques. We also provide a detailed analysis of these optimizations through a range of experiments that consider the scale of networks and the environment settings we used. Our analysis shows that the Spark-based version is more efficient than the Charm++ and MPI-based counterparts. To the best of our knowledge, ours is one of the preliminary efforts of using Apache Spark for epidemic simulations. We believe that our proposed model could act as a reference for similar large-scale epidemiological simulations exploring non-MPI or MapReduce-like approaches.
- Distributed Scheduling and Delay-Throughput Optimization in Wireless Networks under the Physical Interference Model. Pei, Guanhong (Virginia Tech, 2013-01-21). We investigate diverse aspects of the performance of wireless networks, including throughput, delay and distributed complexity.
One of the main challenges for optimizing them arises from radio interference, an inherent factor in wireless networks.
Graph-based interference models represent a large class of interference models widely used for the study of wireless networks,
and suffer from the weakness of over-simplifying the interference caused by wireless signals in a local and binary way.
A more sophisticated interference model, the physical interference model, based on SINR constraints,
is considered more realistic but is more challenging to study (because of its non-linear form and non-local property).
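For reference, the SINR constraint takes the following standard form: a transmission from sender $u$ to receiver $v$ succeeds only if the received signal power dominates noise plus the aggregate interference from concurrently transmitting nodes (the notation here is the textbook convention, not necessarily the exact formulation used in the dissertation):

$$\mathrm{SINR}(u,v) \;=\; \frac{P_u \, d(u,v)^{-\alpha}}{N_0 + \sum_{w \in T \setminus \{u\}} P_w \, d(w,v)^{-\alpha}} \;\ge\; \beta,$$

where $P_u$ is the transmission power of $u$, $d(\cdot,\cdot)$ the distance, $\alpha$ the path-loss exponent, $N_0$ the ambient noise, $T$ the set of simultaneously transmitting nodes, and $\beta$ the decoding threshold.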
In this dissertation, we study the connections between the two types of interference models -- graph-based and physical interference models --
and tackle a set of fundamental problems under the physical interference model;
previously, some of the problems were still open even under the graph-based interference model, and to those we have provided solutions under both types of interference models.
The underlying interference models affect scheduling and power control -- essential building blocks in the operation of wireless networks -- that directly deal with the wireless medium; the physical interference model (compared to graph-based interference model) compounds the problem of efficient scheduling and power control by making it non-local and non-linear.
The system performance optimization and tradeoffs with respect to throughput and delay require a "global" view across
transport, network, media access control (MAC), physical layers (referred to as cross-layer optimization)
to take advantage of the control planes in different levels of the wireless network protocol stack.
This can be achieved by regulating traffic rates, finding traffic flow paths for end-to-end sessions,
controlling the access to the wireless medium (or channels),
assigning the transmission power, and handling signal reception under interference.
The theme of the dissertation is
distributed algorithms and optimization of QoS objectives under the physical interference model.
We start by developing the first low-complexity distributed scheduling and power control algorithms for maximizing the efficiency ratio for different interference models;
we derive end-to-end per-flow delay upper-bounds for our scheduling algorithms and our delay upper-bounds are the first network-size-independent result known for multihop traffic.
Based on that, we design the first cross-layer multi-commodity optimization frameworks for delay-constrained throughput maximization by incorporating the routing and traffic control into the problem scope.
Scheduling and power control are also inherent to distributed computing of "global problems", e.g., the maximum independent set problems in terms of transmitting links and local broadcasts respectively, and the minimum spanning tree problems.
Under the physical interference model, we provide the first sub-linear time distributed solutions to the maximum independent set problems, and also solve the minimum spanning tree problems efficiently.
We develop new techniques and algorithms and exploit the availability of technologies (full-/half-duplex radios, fixed/software-defined power control) to further improve our algorithms.
We highlight our main technical contributions, which might be of independent interest to the design and analysis of optimization algorithms.
Our techniques involve the use of linear and mixed integer programs in delay-constrained throughput maximization. This demonstrates the combined use of different kinds of combinatorial optimization approaches for multi-criteria optimization.
We have developed techniques for queueing analysis under general stochastic traffic to analyze network throughput and delay properties.
We use randomized algorithms with rigorously analyzed performance guarantees to overcome the distributed nature of wireless data/control communications.
We factor in the availability of emerging radio technologies for performance improvements of our algorithms.
Some of our algorithmic techniques that would be of broader use in algorithms for the physical interference model include:
formal development of the distributed computing model in the SINR model, and reductions between models of different technological capabilities, the redefinition of interference sets in the setting of SINR constraints, and our techniques for distributed computation of rulings (informally, nodes or links which are well-separated covers).
- Domain-based Frameworks and Embeddings for Dynamics over Networks. Adhikari, Bijaya (Virginia Tech, 2020-06-01). Broadly, this thesis looks into network and time-series mining problems pertaining to dynamics over networks in various domains. Which locations and staff should we monitor in order to detect C. difficile outbreaks in hospitals? How do we predict the peak intensity of influenza incidence in an interpretable fashion? How do we infer the states of all nodes in a critical infrastructure network where failures have occurred? Leveraging domain-based information should make it possible to answer these questions. However, several new challenges arise, such as (a) presence of more complex dynamics. The dynamics over networks that we consider are complex. For example, C. difficile spreads via both people-to-people and surface-to-people interactions, and correlations between failures in critical infrastructures go beyond the network structure and depend on the geography as well. Traditional approaches either rely on models like Susceptible-Infectious (SI) and Independent Cascade (IC), which are too restrictive because they focus only on single pathways, or do not incorporate the model at all, resulting in sub-optimality. (b) data sparsity. Additionally, data sparsity still persists in this space. Specifically, it is difficult to collect the exact state of each node in the network as it is high-dimensional and difficult to directly sample from. (c) mismatch between data and process. In many situations, the underlying dynamical process is unknown or depends on a mixture of several models. In such cases, there is a mismatch between the data collected and the model representing the dynamics. For example, the weighted influenza-like illness (wILI) count released by the CDC, which is meant to represent the raw fraction of the total population infected by influenza, actually depends on multiple factors like the number of health-care providers reporting and the public tendency to seek medical advice. In such cases, methods which generalize well to unobserved (or unknown) models are required. Current approaches often fail in tackling these challenges as they either rely on restrictive models, require large volumes of data, and/or work only for predefined models. In this thesis, we propose to leverage domain-based frameworks, which include novel models and analysis techniques, and domain-based low-dimensional representation learning to tackle the challenges mentioned above for network and time-series mining tasks. By developing novel frameworks, we can capture the complex dynamics accurately and analyze them more efficiently. For example, to detect C. difficile outbreaks in a hospital setting, we use a two-mode disease model to capture multiple pathways of outbreaks and a discrete lattice-based optimization framework. Similarly, we propose an information-theoretic framework which includes geographically correlated failures in critical infrastructure networks to infer the status of the network components.
Moreover, as we use more realistic frameworks to accurately capture and analyze the mechanistic processes themselves, our approaches are effective even with sparse data. At the same time, learning low-dimensional domain-aware embeddings captures domain-specific properties (like incidence-based similarity between historical influenza seasons) more efficiently from sparse data, which is useful for subsequent tasks. Similarly, since the domain-aware embeddings capture the model information directly from the data without any modeling assumptions, they generalize better to new models. Our domain-aware frameworks and embeddings enable many applications in critical domains. For example, our domain-aware framework for C. difficile allows different monitoring rates for people and locations, thus detecting more than 95% of outbreaks. Similarly, our framework for product recommendation in e-commerce for queries with sparse engagement data resulted in a 34% improvement over the current Walmart.com search engine. Similarly, our novel framework leads to near-optimal algorithms, with an additive approximation guarantee, for inferring network states given a partial observation of the failures in networks. Additionally, by exploiting domain-aware embeddings, we outperform non-trivial competitors by up to 40% for influenza forecasting. Similarly, domain-aware representations of subgraphs helped us outperform non-trivial baselines by up to 68% in the graph classification task. We believe our techniques will be useful for a variety of other applications in many areas like social networks, urban computing, and so on.
- Dynamic Behavior Visualizer: A Dynamic Visual Analytics Framework for Understanding Complex Networked Models. Maloo, Akshay (Virginia Tech, 2014-02-04). Dynamic Behavior Visualizer (DBV) is a visual analytics environment to visualize the spatial and temporal movements and behavioral changes of an individual or a group, e.g., a family, within a realistic urban environment. DBV is specifically designed to visualize adaptive behavioral changes, as they pertain to interactions with multiple inter-dependent infrastructures, in the aftermath of a large crisis, e.g., a hurricane or the detonation of an improvised nuclear device. DBV is web-enabled and thus easily accessible to any user with access to a web browser. A novel aspect of the system is its scale and fidelity. The goal of DBV is to synthesize information and derive insight from it; detect the expected and discover the unexpected; and provide timely and easily understandable assessment and the ability to piece together all this information.
- A dynamic middleware to integrate multiple cloud infrastructures with remote applications. Bhattacharjee, Tirtha Pratim (Virginia Tech, 2014-12-04). In an era with a compelling need for greater computation power, the aggregation of software system components is becoming more challenging and diverse. New-generation scientific applications are a growing hub of complex and intense computation performed on huge, exponentially growing data sets. With the development of parallel algorithms, the design of multi-user web applications and frequent changes in software architecture, there is a bigger challenge lying in front of research institutes and organizations. Network science is an interesting field posing extreme computation demands to sustain complex large-scale networks. Several static or dynamic network analyses have to be performed through algorithms implementing complex graph theories, statistical mechanics, data mining and visualization. Similarly, high-performance computation infrastructures are taking on multiple forms and expanding in an unprecedented way. In this age, it is mandatory for all software solutions to migrate to scalable platforms and integrate cloud-enabled data center clusters for higher computation needs. So, with the aggressive adoption of cloud infrastructures and resource-intensive web applications, there is a pressing need for a dynamic middleware to bridge the gap and effectively coordinate the integrated system. Such a heterogeneous environment encourages the devising of a transparent, portable and flexible solution stack. In this project, we propose the adoption of a Virtual Machine aware Portable Batch System Cluster (VM-aware PBS Cluster), a self-initiating and self-regulating cluster of virtual machines (VMs) capable of operating and scaling on any cloud infrastructure. This is a unique but simple solution for large-scale software to migrate to cloud infrastructures while retaining most of the application stack intact. In this project, we have also designed and implemented the Cloud Integrator Framework, a dynamic implementation of cloud-aware middleware for the proposed VM-aware PBS cluster. This framework regulates job distribution in an aggregate of VMs and optimizes resource consumption through on-demand VM initialization and termination. The model was integrated into the CINET system, a network science application, and has enabled CINET to mediate large-scale network analysis and simulation tasks across varied cloud platforms such as OpenStack and Amazon EC2 for its computation requirements.
- Efficient Spatio-Temporal Network Analytics in Epidemiological Studies using Distributed Databases. Khan, Mohammed Saquib Akmal (Virginia Tech, 2015-01-26). Real-time spatio-temporal analytics has become an integral part of epidemiological studies. The size of spatio-temporal data has been increasing tremendously over the years, gradually evolving into Big Data. The processing in such domains is highly data- and compute-intensive. High-performance computing resources are actively being used to handle such workloads over massive datasets. This confluence of high-performance computing and datasets with Big Data characteristics poses great challenges pertaining to data handling and processing. The resource management of supercomputers is in conflict with the data-intensive nature of spatio-temporal analytics. This is further exacerbated by the fact that data management is decoupled from the computing resources. Problems of this nature have provided great opportunities in the growth and development of tools and concepts centered around MapReduce-based solutions. However, we believe that advanced relational concepts can still be employed to provide an effective solution to handle these issues and challenges. In this study, we explore distributed databases to efficiently handle spatio-temporal Big Data for epidemiological studies. We propose DiceX (Data Intensive Computational Epidemiology using supercomputers), which couples high-performance, Big Data and relational computing by embedding distributed data storage and processing engines within the supercomputer. It is characterized by scalable strategies for data ingestion, a unified framework to set up and configure various processing engines, and the ability to pause, materialize and restore images of a data session. In addition, we have successfully configured DiceX to support approximation algorithms from the MADlib Analytics Library [54], primarily the Count-Min Sketch or CM Sketch [33][34][35]. DiceX enables a new style of Big Data processing, which is centered around the use of clustered databases and exploits supercomputing resources. It can effectively exploit the cores, memory and compute nodes of supercomputers to scale the processing of spatio-temporal queries on datasets of large volume. Thus, it provides a scalable and efficient tool for data management and processing of spatio-temporal data. Although DiceX has been designed for computational epidemiology, it can be easily extended to different data-intensive domains facing similar issues and challenges. We thank our external collaborators and members of the Network Dynamics and Simulation Science Laboratory (NDSSL) for their suggestions and comments. This work has been partially supported by DTRA CNIMS Contract HDTRA1-11-D-0016-0001, DTRA Validation Grant HDTRA1-11-1-0016, NSF - Network Science and Engineering Grant CNS-1011769, NIH and NIGMS - Models of Infectious Disease Agent Study Grant 5U01GM070694-11. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government.
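Since this abstract singles out the Count-Min Sketch, here is a compact, self-contained Python version of that data structure for approximate counting (a generic textbook implementation, not the MADlib one used in DiceX):

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counts: estimates never undercount, and overcount
    by at most a small additive error with high probability."""

    def __init__(self, width=2000, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One independent-ish hash function per row, derived from SHA-256.
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

sketch = CountMinSketch()
for county, cases in [("51059", 120), ("51013", 45), ("51059", 30)]:  # made-up county counts
    sketch.add(county, cases)
print(sketch.estimate("51059"), sketch.estimate("51013"), sketch.estimate("99999"))
```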