Browsing by Author "Vullikanti, Anil Kumar S."
Now showing 1 - 20 of 75
- An activity-based energy demand modeling framework for buildings: A bottom-up approachSubbiah, Rajesh (Virginia Tech, 2013-05-23)Energy consumption by buildings, due to various factors such as temperature regulation and lighting, poses a threat to our environment and energy resources. In the United States, statistics reveal that commercial and residential buildings combined contribute about 40 percent of the overall energy consumption, and this figure is expected to increase. In order to manage the growing demand for energy, there is a need for energy system optimization, which would require a realistic, high-resolution energy-demand model. In this work, we investigate and model the energy consumption of buildings by taking into account physical, structural, economic, and social factors that influence energy use. We propose a novel activity-based modeling framework that generates an energy demand profile on a regular basis for a given nominal day. We use this information to generate a building-level energy demand profile at a highly disaggregated level. We then investigate different possible uses of the generated demand profiles in different what-if scenarios, such as urban-area planning, demand-side management, and demand-sensitive pricing. We also provide a novel way to resolve correlational and consistency problems in the generation of individual-level and building-level "shared" activities which occur due to individuals' interactions.
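The bottom-up aggregation step described in this entry, rolling individual activity schedules up into a building-level load curve, can be sketched in a few lines. The activity set, wattages, and occupant schedules below are purely hypothetical stand-ins, not the thesis's actual model.

```python
# Each person's day is a list of (start_hour, end_hour, activity); the power draw per
# activity is a hypothetical appliance/lighting load in watts.
ACTIVITY_WATTS = {"sleep": 50, "cooking": 1200, "tv": 150, "laundry": 500, "away": 20}

def building_demand_profile(schedules):
    """Sum individual activity loads into an hourly building-level profile (kW)."""
    profile = [0.0] * 24
    for person in schedules:
        for start, end, activity in person:
            for hour in range(start, end):
                profile[hour] += ACTIVITY_WATTS[activity] / 1000.0
    return profile

occupant_a = [(0, 7, "sleep"), (7, 8, "cooking"), (8, 18, "away"), (18, 22, "tv"), (22, 24, "sleep")]
occupant_b = [(0, 6, "sleep"), (6, 7, "laundry"), (7, 17, "away"), (17, 23, "tv"), (23, 24, "sleep")]
print(building_demand_profile([occupant_a, occupant_b]))
```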
- Adaptive Radio Resource Management in Cognitive Radio Communications using Fuzzy ReasoningShatila, Hazem Sarwat (Virginia Tech, 2012-03-20)As wireless technologies evolve, novel innovations and concepts are required to dynamically and automatically alter various radio parameters in accordance with the radio environment. These innovations open the door for cognitive radio (CR), a new concept in telecommunications. CR makes its decisions using an inference engine, which can learn and adapt to changes in radio conditions. Fuzzy logic (FL) is the proposed decision-making algorithm for controlling the CR's inference engine. Fuzzy logic is well-suited for vague environments in which incomplete and heterogeneous information is present. In our proposed approach, FL is used to alter various radio parameters according to experience gained from different environmental conditions. FL requires a set of decision-making rules, which can vary according to radio conditions, but anomalies arise among these rules, causing degradation in the CR's performance. In such cases, the CR requires a method for eliminating such anomalies. In our model, we used a method based on the Dempster-Shafer (DS) theory of belief to accomplish this task. Extensive simulation results and case studies show that the use of the DS theory indeed improves the CR's decision-making capability. Using FL and the DS theory of belief is considered a vital module in the automation of various radio parameters for coping with the dynamic wireless environment. To demonstrate the FL inference engine, we propose a CR version of WiMAX, which we call CogMAX, to control different radio resources. Physical layer parameters that can be altered for better results and performance include the channel estimation technique, the number of subcarriers used for channel estimation, the modulation technique, and the code rate.
- An Algorithm for Influence Maximization and Target Set Selection for the Deterministic Linear Threshold ModelSwaminathan, Anand (Virginia Tech, 2014-07-03)The problem of influence maximization has been studied extensively with applications that include viral marketing, recommendations, and feed ranking. The optimization problem, first formulated by Kempe, Kleinberg and Tardos, is known to be NP-hard. Thus, several heuristics have been proposed to solve this problem. This thesis studies the problem of influence maximization under the deterministic linear threshold model and presents a novel heuristic for finding influential nodes in a graph with the goal of maximizing contagion spread that emanates from these influential nodes. Inputs to our algorithm include edge weights and vertex thresholds. The threshold difference greedy algorithm presented in this thesis takes into account both the edge weights as well as vertex thresholds in computing influence of a node. The threshold difference greedy algorithm is evaluated on 14 real-world networks. Results demonstrate that the new algorithm performs consistently better than the seven other heuristics that we evaluated in terms of final spread size. The threshold difference greedy algorithm has tuneable parameters which can make the algorithm run faster. As a part of the approach, the algorithm also computes the infected nodes in the graph. This eliminates the need for running simulations to determine the spread size from the influential nodes. We also study the target set selection problem with our algorithm. In this problem, the final spread size is specified and a seed (or influential) set is computed that will generate the required spread size.
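The deterministic linear threshold model described in this entry activates a node once the summed weights of its active in-neighbors reach that node's threshold. The minimal sketch below simulates that spread process only; it is not the thesis's threshold difference greedy heuristic, and the edge weights and thresholds are toy values.

```python
def linear_threshold_spread(weights, thresholds, seeds):
    """Deterministic linear threshold diffusion.

    weights:    dict mapping (u, v) -> weight of u's influence on v
    thresholds: dict mapping v -> activation threshold
    seeds:      iterable of initially active (influential) nodes
    Returns the final set of active nodes, i.e., the spread from the seeds.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in thresholds:
            if v in active:
                continue
            influence = sum(w for (u, x), w in weights.items()
                            if x == v and u in active)
            if influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active

# Toy example: node 'c' activates only after both 'a' and 'b' are active.
weights = {('a', 'c'): 0.4, ('b', 'c'): 0.4, ('c', 'd'): 0.9}
thresholds = {'a': 0.1, 'b': 0.1, 'c': 0.7, 'd': 0.8}
print(linear_threshold_spread(weights, thresholds, {'a', 'b'}))  # {'a', 'b', 'c', 'd'}
```

Because the model is deterministic, a single pass like this also yields the infected-node set directly, which is the property the abstract notes removes the need for repeated spread simulations.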
- Algorithms for Reconstructing and Reasoning about Chemical Reaction NetworksCho, Yong Ju (Virginia Tech, 2013-01-24)Recent advances in systems biology have uncovered detailed mechanisms of biological processes such as the cell cycle, circadian rhythms, and signaling pathways. These mechanisms are modeled by chemical reaction networks (CRNs) which are typically simulated by converting to ordinary differential equations (ODEs), so that the goal is to closely reproduce the observed quantitative and qualitative behaviors of the modeled process. This thesis proposes two algorithmic problems related to the construction and comprehension of CRN models. The first problem focuses on reconstructing CRNs from given time series. Given multivariate time course data obtained by perturbing a given CRN, how can we systematically deduce the interconnections between the species of the network? We demonstrate how this problem can be modeled as, first, one of uncovering conditional independence relationships using buffering experiments and, second, of determining the properties of the individual chemical reactions. Experimental results demonstrate the effectiveness of our approach on both synthetic and real CRNs. The second problem this work focuses on is to aid in network comprehension, i.e., to understand the motifs underlying complex dynamical behaviors of CRNs. Specifically, we focus on bistability---an important dynamical property of a CRN---and propose algorithms to identify the core structures responsible for conferring bistability. The approach we take is to systematically infer the instability causing structures (ICSs) of a CRN and use machine learning techniques to relate properties of the CRN to the presence of such ICSs. This work has the potential to aid in not just network comprehension but also model simplification, by helping reduce the complexity of known bistable systems.
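As this entry notes, CRNs are typically simulated by converting them to ODEs. A minimal mass-action example for a toy reversible reaction (not one of the networks studied in the thesis) might look like the following, using SciPy's ODE integrator.

```python
import numpy as np
from scipy.integrate import odeint

# Toy CRN:  A + B -> C  (rate k1),   C -> A + B  (rate k2), modeled with mass-action ODEs.
def crn_odes(y, t, k1, k2):
    a, b, c = y
    forward = k1 * a * b
    reverse = k2 * c
    return [reverse - forward, reverse - forward, forward - reverse]

t = np.linspace(0, 10, 200)
trajectory = odeint(crn_odes, y0=[1.0, 0.8, 0.0], t=t, args=(2.0, 0.5))
print(trajectory[-1])  # near-equilibrium concentrations of A, B, C
```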
- Automatic Reconstruction of the Building Blocks of Molecular Interaction NetworksRivera, Corban G. (Virginia Tech, 2008-08-11)High-throughput whole-genome biological assays are highly intricate and difficult to interpret. The molecular interaction networks generated from evaluation of those experiments suggest that cellular functions are carried out by modules of interacting molecules. Reverse-engineering the modular structure of cellular interaction networks has the promise of significantly easing their analysis. We hypothesize that: • cellular wiring diagrams can be decomposed into overlapping modules, where each module is a set of coherently-interacting molecules and • a cell responds to a stress or a stimulus by appropriately modulating the activities of a subset of these modules. Motivated by these hypotheses, we develop models and algorithms that can reverse-engineer molecular modules from large-scale functional genomic data. We address two major problems: 1. Given a wiring diagram and genome-wide gene expression data measured after the application of a stress or in a disease state, compute the active network of molecular interactions perturbed by the stress or the disease. 2. Given the active networks for multiple stresses, stimuli, or diseases, compute a set of network legos, which are molecular modules with the property that each active network can be expressed as an appropriate combination of a subset of modules. To address the first problem, we propose an approach that computes the most-perturbed subgraph of a curated pathway of molecular interactions in a disease state. Our method is based on a novel score for pathway perturbation that incorporates both differential gene expression and the interaction structure of the pathway. We apply our method to a compendium of cancer types. We show that the significance of the most perturbed sub-pathway is frequently larger than that of the entire pathway. We identify an association that suggests that IL-2 infusion may have a similar therapeutic effect in bladder cancer as it does in melanoma. We propose two models to address the second problem. First, we formulate a Boolean model for constructing network legos from a set of active networks. We reduce the problem of computing network legos to that of constructing closed biclusters in a binary matrix. Applying this method to a compendium of 13 stresses on human cells, we automatically detect that about four to six hours after treatment with chemicals that cause endoplasmic reticulum stress, fibroblasts shut down the cell cycle far more aggressively than fibroblasts or HeLa cells do in response to other treatments. Our second model represents each active network as an additive combination of network legos. We formulate the problem as one of computing network legos that can be used to recover active networks in an optimal manner. We use existing methods for non-negative matrix approximation to solve this problem. We apply our method to a human cancer dataset including 190 samples from 18 cancers. We identify a network lego that associates integrins and matrix metalloproteinases in ovarian adenoma and other cancers and a network lego including the retinoblastoma pathway associated with multiple leukemias.
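The second model above expresses each active network as an additive combination of network legos via non-negative matrix approximation. A small illustration of that kind of decomposition, using scikit-learn's NMF on a toy edge-by-condition matrix (hypothetical data; the thesis uses its own formulation and real cancer datasets):

```python
import numpy as np
from sklearn.decomposition import NMF

# Rows = interactions (edges), columns = active networks; 1 means the edge is present
# in that condition's active network. Toy data standing in for real compendia.
active_networks = np.array([
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)

model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(active_networks)  # edges x legos: edge membership in each "lego"
H = model.components_                     # legos x networks: how strongly each lego is used

print(np.round(W, 2))
print(np.round(H, 2))
```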
- ‘Beating the news’ with EMBERS: Forecasting Civil Unrest using Open Source IndicatorsRamakrishnan, Naren; Butler, Patrick; Self, Nathan; Khandpur, Rupinder P.; Saraf, Parang; Wang, Wei; Cadena, Jose; Vullikanti, Anil Kumar S.; Korkmaz, Gizem; Kuhlman, Christopher J.; Marathe, Achla; Zhao, Liang; Ting, Hua; Huang, Bert; Srinivasan, Aravind; Trinh, Khoa; Getoor, Lise; Katz, Graham; Doyle, Andy; Ackermann, Chris; Zavorin, Ilya; Ford, Jim; Summers, Kristen; Fayed, Youssef; Arredondo, Jaime; Gupta, Dipak; Mares, David; Muthia, Sathappan; Chen, Feng; Lu, Chang-Tien (2014)We describe the design, implementation, and evaluation of EMBERS, an automated, 24x7 continuous system for forecasting civil unrest across 10 countries of Latin America using open source indicators such as tweets, news sources, blogs, economic indicators, and other data sources. Unlike retrospective studies, EMBERS has been making forecasts into the future since Nov 2012 which have been (and continue to be) evaluated by an independent T&E team (MITRE). Of note, EMBERS has successfully forecast the uptick and downtick of incidents during the June 2013 protests in Brazil. We outline the system architecture of EMBERS, individual models that leverage specific data sources, and a fusion and suppression engine that supports trading off specific evaluation criteria. EMBERS also provides an audit trail interface that enables the investigation of why specific predictions were made along with the data utilized for forecasting. Through numerous evaluations, we demonstrate the superiority of EMBERS over baserate methods and its capability to forecast significant societal happenings.
- Behavior Modeling and Analytics for Urban Computing: A Synthetic Information-based ApproachParikh, Nidhi Kiranbhai (Virginia Tech, 2017-03-15)The rapid increase in urbanization poses challenges in diverse areas such as energy, transportation, pandemic planning, and disaster response. Planning for urbanization is a big challenge because cities are complex systems consisting of human populations, infrastructures, and interactions and interdependence among them. This dissertation focuses on a synthetic information-based approach for modeling human activities and behaviors for two urban science applications, epidemiology and disaster planning, and with associated analytics. Synthetic information is a data-driven approach to create a detailed, high fidelity representation of human populations, infrastructural systems and their behavioral and interaction aspects. It is used in developing large-scale simulations to model what-if scenarios and for policy making. Big cities have a large number of visitors visiting them every day. They often visit crowded areas in the city and come into contact with each other and the area residents. However, most epidemiological studies have ignored their role in spreading epidemics. We extend the synthetic population model of the Washington DC metro area to include transient populations, consisting of tourists and business travelers, along with their demographics and activities, by combining data from multiple sources. We evaluate the effect of including this population in epidemic forecasts, and the potential benefits of multiple interventions that target transients. In the next study, we model human behavior in the aftermath of the detonation of an improvised nuclear device in Washington DC. Previous studies of this scenario have mostly focused on modeling physical impact and simple behaviors like sheltering and evacuation. However, these models have focused on optimal behavior, not naturalistic behavior. In other words, prior work is focused on whether it is better to shelter-in-place or evacuate, but has not been informed by the literature on what people actually do in the aftermath of disasters. Natural human behaviors in disasters, such as looking for family members or seeking healthcare, are supported by infrastructures such as cell-phone communication and transportation systems. We model a range of behaviors such as looking for family members, evacuation, sheltering, healthcare-seeking, worry, and search and rescue and their interactions with infrastructural systems. Large-scale and complex agent-based simulations generate a large amount of data in each run of the simulation, making it hard to make sense of results. This leads us to formulate two new problems in simulation analytics. First, we develop algorithms to summarize simulation results by extracting causally-relevant state sequences - state sequences that have a measurable effect on the outcome of interest. Second, in order to develop effective interventions, it is important to understand which behaviors lead to positive and negative outcomes. It may happen that the same behavior may lead to different outcomes, depending upon the context. Hence, we develop an algorithm for contextual behavior ranking. In addition to the context mentioned in the query, our algorithm also identifies any additional context that may affect the behavioral ranking.
- The Betweenness Centrality Of Biological NetworksNarayanan, Shivaram (Virginia Tech, 2005-09-16)In the last few years, large-scale experiments have generated genome-wide protein interaction networks for many organisms including Saccharomyces cerevisiae (baker's yeast), Caenorhabditis elegans (worm) and Drosophila melanogaster (fruit fly). In this thesis, we examine the vertex and edge betweenness centrality measures of these graphs. These measures capture how "central" a vertex or an edge is in the graph by considering the fraction of shortest paths that pass through that vertex or edge. Our primary observation is that the distribution of the vertex betweenness centrality follows a power law, but the distribution of the edge betweenness centrality has a Poisson-like distribution with a very sharp spike. To investigate this phenomenon, we generated random networks with degree distribution identical to those of the protein interaction networks. To our surprise, we found out that the random networks and the protein interaction networks had almost identical distribution of edge betweenness. We conjecture that the "Poisson-like" distribution of the edge betweenness centrality is the property of any graph whose degree distribution satisfies power law.
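Vertex and edge betweenness centrality, the two measures examined in this thesis, can be computed directly with networkx. The graph below is a random power-law stand-in rather than an actual protein interaction network, so the output is only illustrative of the workflow.

```python
import networkx as nx

# Stand-in for a protein interaction network; real data would come from curated
# interaction files for yeast, worm, or fly.
G = nx.barabasi_albert_graph(n=200, m=2, seed=42)  # power-law degree distribution

vertex_bc = nx.betweenness_centrality(G)       # fraction of shortest paths through each node
edge_bc = nx.edge_betweenness_centrality(G)    # fraction of shortest paths through each edge

top_nodes = sorted(vertex_bc, key=vertex_bc.get, reverse=True)[:5]
print("most central vertices:", top_nodes)
print("max edge betweenness:", max(edge_bc.values()))
```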
- Bridging Methodological Gaps in Network-Based Systems BiologyPoirel, Christopher L. (Virginia Tech, 2013-10-16)Functioning of the living cell is controlled by a complex network of interactions among genes, proteins, and other molecules. A major goal of systems biology is to understand and explain the mechanisms by which these interactions govern the cell's response to various conditions. Molecular interaction networks have proven to be a powerful representation for studying cellular behavior. Numerous algorithms have been developed to unravel the complexity of these networks. Our work addresses the drawbacks of existing techniques. This thesis includes three related research efforts that introduce network-based approaches to bridge current methodological gaps in systems biology. i. Functional enrichment methods provide a summary of biological functions that are overrepresented in an interesting collection of genes (e.g., highly differentially expressed genes between a diseased cell and a healthy cell). Standard functional enrichment algorithms ignore the known interactions among proteins. We propose a novel network-based approach to functional enrichment that explicitly accounts for these underlying molecular interactions. Through this work, we close the gap between set-based functional enrichment and topological analysis of molecular interaction networks. ii. Many techniques have been developed to compute the response network of a cell. A recent trend in this area is to compute response networks of small size, with the rationale that only part of a pathway is often changed by disease and that interpreting small subnetworks is easier than interpreting larger ones. However, these methods may not uncover the spectrum of pathways perturbed in a particular experiment or disease. To avoid these difficulties, we propose to use algorithms that reconcile case-control DNA microarray data with a molecular interaction network by modifying per-gene differential expression p-values such that two genes connected by an interaction show similar changes in their gene expression values. iii. Top-down analyses in systems biology can automatically find correlations among genes and proteins in large-scale datasets. However, it is often difficult to design experiments from these results. In contrast, bottom-up approaches painstakingly craft detailed models of cellular processes. However, developing the models is a manual process that can take many years. These approaches have largely been developed independently. We present Linker, an efficient and automated data-driven method that analyzes molecular interactomes. Linker combines teleporting random walks and k-shortest path computations to discover connections from a set of source proteins to a set of target proteins. We demonstrate the efficacy of Linker through two applications: proposing extensions to an existing model of cell cycle regulation in budding yeast and automated reconstruction of human signaling pathways. Linker achieves superior precision and recall compared to state-of-the-art algorithms from the literature.
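Linker, as described above, combines teleporting random walks with k-shortest path computations to connect sources to targets. The sketch below is only a rough analogue of that combination using networkx's personalized PageRank (a teleporting random walk) and simple-path enumeration; it is not the published Linker implementation, and the graph, sources, and targets are placeholders.

```python
import itertools
import networkx as nx

def connect_sources_to_targets(G, sources, targets, k=3, restart=0.5):
    """Rank nodes by a teleporting random walk from the sources, then collect up to k
    short paths from each source to each target (a rough analogue of the Linker idea)."""
    personalization = {n: (1.0 if n in sources else 0.0) for n in G}
    visit_prob = nx.pagerank(G, alpha=1 - restart, personalization=personalization)

    paths = []
    for s, t in itertools.product(sources, targets):
        if nx.has_path(G, s, t):
            gen = nx.shortest_simple_paths(G, s, t)
            paths.extend(itertools.islice(gen, k))
    return visit_prob, paths

G = nx.karate_club_graph()  # stand-in for a molecular interactome
probs, paths = connect_sources_to_targets(G, sources={0}, targets={33}, k=2)
print(paths)
```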
- Capacity Characterization of Multi-Hop Wireless Networks- A Cross Layer ApproachChafekar, Deepti Ramesh (Virginia Tech, 2009-03-11)A fundamental problem in multi-hop wireless networks is to estimate their throughput capacity. The problem can be informally stated as follows: given a multi-hop wireless network and a set of source destination pairs, determine the maximum rate r at which data can be transmitted between each source destination pair. Estimating the capacity of a multi-hop wireless network is practically useful --- it yields insights into the fundamental performance limits of the wireless network and at the same time aids the development of protocols that can utilize the network close to this limit. A goal of this dissertation is to develop rigorous mathematical foundations to compute the capacity of any given multi-hop wireless network with known source-destination pairs. An important factor that affects the capacity of multi-hop wireless networks is radio interference. As a result, researchers have proposed increasingly realistic interference models that aim to capture the physical characteristics of radio signals. Some of the commonly used simple models that capture radio interference are based on geometric disk-graphs. The simplicity of these models facilitates the development of provable and often conceptually simple methods for estimating the capacity of wireless networks. A potential weakness of this class of models is that they oversimplify the physical process by assuming that the signal ends abruptly at the boundary of a geometric region (a disk for omni-directional antennas). A more sophisticated interference model is the physical interference model, also known as the Signal to Interference Plus Noise Ratio (SINR) model. This model is more realistic than disk-graph models as it captures the effects of signal fading and ambient noise. This work considers both disk-graph and SINR interference models. In addition to radio interference, the throughput capacity of a multi-hop wireless network also depends on other factors, including the specific paths selected to route the packets between the source destination pairs (routing), the time at which packets are transmitted (scheduling), the power with which nodes transmit (power control) and the rate at which packets are injected (rate control). In this dissertation, we consider three different problems related to estimating network capacity. We propose an algorithmic approach for solving these problems. We first consider the problem of maximizing throughput with the SINR interference model by jointly considering the effects of routing and scheduling constraints. Second, we consider the problem of maximizing throughput by performing adaptive power control, scheduling and routing for disk-graph interference models. Finally, we examine the problem of minimizing end-to-end latency by performing joint routing, scheduling and power control using the SINR interference model. Recent results have shown that traditional layered networking principles lead to inefficient utilization of resources in multi-hop wireless networks. Motivated by these observations, recent papers have begun investigating cross-layer design approaches. Although our work does not develop new cross-layered protocols, it yields new insights that could contribute to the development of such protocols in the future.
Our approach for solving these multi-objective optimization problems is based on combining mathematical programming with randomized rounding to obtain polynomial time approximation algorithms with provable worst case performance ratios. For the problems considered in this work, our results provide the best analytical performance guarantees currently known in the literature. We complement our rigorous theoretical and algorithmic analysis with simulation-based experimental analysis. Our experimental results help us understand the limitations of our approach and assist in identifying certain parameters for improving the performance of our techniques.
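Under the SINR (physical interference) model used in this dissertation, a set of simultaneous transmissions is feasible only if each receiver's signal-to-interference-plus-noise ratio clears a threshold. The minimal feasibility check below illustrates that condition; the path-loss exponent, noise floor, threshold, and node positions are chosen purely for illustration and are not parameters from the thesis.

```python
import math

def sinr_feasible(links, positions, power, alpha=3.0, noise=1e-9, beta=2.0):
    """Check whether all links can be active in the same slot under the SINR model.

    links:     list of (tx, rx) node-id pairs transmitting simultaneously
    positions: dict node-id -> (x, y) coordinates
    power:     dict tx node-id -> transmit power
    """
    def received_power(u, v):
        d = math.dist(positions[u], positions[v])
        return power[u] / (d ** alpha)           # simple path-loss model, unit gains

    for tx, rx in links:
        signal = received_power(tx, rx)
        interference = sum(received_power(other_tx, rx)
                           for other_tx, _ in links if other_tx != tx)
        if signal / (noise + interference) < beta:
            return False
    return True

positions = {0: (0, 0), 1: (10, 0), 2: (100, 0), 3: (110, 0)}
power = {0: 1.0, 2: 1.0}
print(sinr_feasible([(0, 1), (2, 3)], positions, power))  # True: links are far apart
```

A scheduler built on this model would only activate, in any time slot, a set of links for which such a check passes, which is what couples the scheduling and power-control decisions mentioned above.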
- Combining Participatory Influenza Surveillance with Modeling and Forecasting: Three Alternative ApproachesBrownstein, John S.; Marathe, Achla (JMIR Publications, 2017)Background: Influenza outbreaks affect millions of people every year and their surveillance is usually carried out in developed countries through a network of sentinel doctors who report the weekly number of Influenza-like Illness cases observed among the visited patients. Monitoring and forecasting the evolution of these outbreaks supports decision makers in designing effective interventions and allocating resources to mitigate their impact. Objective: Describe the existing participatory surveillance approaches that have been used for modeling and forecasting of the seasonal influenza epidemic, and how they can help strengthen real-time epidemic science and provide a more rigorous understanding of epidemic conditions. Methods: We describe three different participatory surveillance systems, WISDM (Widely Internet Sourced Distributed Monitoring), Influenzanet and Flu Near You (FNY), and show how modeling and simulation can be or has been combined with participatory disease surveillance to: i) measure the non-response bias in a participatory surveillance sample using WISDM; and ii) nowcast and forecast influenza activity in different parts of the world (using Influenzanet and Flu Near You). Results: WISDM-based results measure the participatory and sample bias for three epidemic metrics, i.e., attack rate, peak infection rate, and time-to-peak, and find the participatory bias to be the largest component of the total bias. The Influenzanet platform shows that digital participatory surveillance data combined with a realistic data-driven epidemiological model can provide both short-term and long-term forecasts of epidemic intensities, and the ground truth data lie within the 95 percent confidence intervals for most weeks. The statistical accuracy of the ensemble forecasts increases as the season progresses. The Flu Near You platform shows that participatory surveillance data provide accurate short-term flu activity forecasts and influenza activity predictions. The correlation of the HealthMap Flu Trends estimates with the observed CDC ILI rates is 0.99 for 2013-2015. Additional data sources lead to an error reduction of about 40% when compared to the estimates of the model that only incorporates CDC historical information. Conclusions: While the advantages of participatory surveillance, compared to traditional surveillance, include its timeliness, lower costs, and broader reach, it is limited by a lack of control over the characteristics of the population sample. Modeling and simulation can help overcome this limitation as well as provide real-time and long-term forecasting of influenza activity in data-poor parts of the world.
- Computational Approaches to Predict Effect of Epigenetic Modifications on Transcriptional Regulation of Gene ExpressionBanerjee, Sharmi (Virginia Tech, 2019-10-07)This dissertation presents applications of machine learning and statistical approaches to infer protein-DNA bindings in the presence of epigenetic modifications. Epigenetic modifications are alterations to the DNA resulting in gene expression regulation where the structure of the DNA remains unaltered. It is a heritable and reversible modification and often involves addition or deletion of certain chemical compounds to the DNA. Histone modification is an epigenetic change that involves alteration of the histone proteins – thus changing the chromatin (DNA wound around histone proteins) structure – or addition of methyl-groups to the Cytosine base adjacent to a Guanine base. Epigenetic factors often interfere in gene expression regulation by promoting or inhibiting protein-DNA bindings. Such proteins are known as transcription factors. Transcription is the first step of gene expression where a particular segment of DNA is copied into the messenger-RNA (mRNA). Transcription factors orchestrate gene activity and are crucial for normal cell function in any organism. For example, deletion/mutation of certain transcription factors such as MEF2 have been associated with neurological disorders such as autism and schizophrenia. In this dissertation, different computational pipelines are described that use mathematical models to explain how the protein-DNA bindings are mediated by histone modifications and DNA-methylation affecting different regions of the brain at different stages of development. Multi-layer Markov models, Inhomogeneous Poisson analyses are used on data from brain to show the impact of epigenetic factors on protein-DNA bindings. Such data driven approaches reinforce the importance of epigenetic factors in governing brain cell differentiation into different neuron types, regulation of memory and promotion of normal brain development at the early stages of life.
- Computational Cost Analysis of Large-Scale Agent-Based Epidemic SimulationsKamal, Tariq (Virginia Tech, 2016-09-21)Agent-based epidemic simulation (ABES) is a powerful and realistic approach for studying the impacts of disease dynamics and complex interventions on the spread of an infection in the population. Among many ABES systems, EpiSimdemics comes closest to the popular agent-based epidemic simulation systems developed by Eubank, Longini, Ferguson, and Parker. EpiSimdemics is a general framework that can model many reaction-diffusion processes besides the Susceptible-Exposed-Infectious-Recovered (SEIR) models. This model allows the study of complex systems as they interact, thus enabling researchers to model and observe the socio-technical trends and forces. Pandemic planning at the world level requires simulation of over 6 billion agents, where each agent has a unique set of demographics, daily activities, and behaviors. Moreover, the stochastic nature of epidemic models, the uncertainty in the initial conditions, and the variability of reactions require the computation of several replicates of a simulation for a meaningful study. Given the hard timelines to respond, running many replicates (15-25) of several configurations (10-100) (of these compute-heavy simulations) can only be possible on high-performance clusters (HPC). These agent-based epidemic simulations are irregular and show poor execution performance on high-performance clusters due to the evolutionary nature of their workload, large irregular communication and load imbalance. For increased utilization of HPC clusters, the simulation needs to be scalable. Many challenges arise when improving the performance of agent-based epidemic simulations on high-performance clusters. Firstly, large-scale graph-structured computation is central to the processing of these simulations, where the star-motif quality nodes (natural graphs) create large computational imbalances and communication hotspots. Secondly, the computation is performed by classes of tasks that are separated by global synchronization. The non-overlapping computations cause idle times, which introduce the load balancing and cost estimation challenges. Thirdly, the computation is overlapped with communication, which is difficult to measure using simple methods, thus making the cost estimation very challenging. Finally, the simulations are iterative and the workload (computation and communication) may change through iterations, as a result introducing load imbalances. This dissertation focuses on developing a cost estimation model and load balancing schemes to increase the runtime efficiency of agent-based epidemic simulations on high-performance clusters. While developing the cost model and load balancing schemes, we perform the static and dynamic load analysis of such simulations. We also statically quantified the computational and communication workloads in EpiSimdemics. We designed, developed and evaluated a cost model for estimating the execution cost of large-scale parallel agent-based epidemic simulations (and more generally for all constrained producer-consumer parallel algorithms). This cost model uses computational imbalances and communication latencies, and enables the cost estimation of those applications where the computation is performed by classes of tasks, separated by synchronization. It enables the performance analysis of parallel applications by computing its execution times on a number of partitions. 
Our evaluations show that the model is helpful in performance prediction, resource allocation and evaluation of load balancing schemes. As part of load balancing algorithms, we adopted the Metis library for partitioning bipartite graphs. We have also developed lower-overhead custom schemes called Colocation and MetColoc. We performed an evaluation of Metis, Colocation, and MetColoc. Our analysis showed that the MetColoc scheme gives a performance similar to Metis, but with half the partitioning overhead (runtime and memory). On the other hand, the Colocation scheme achieves a similar performance to Metis on a larger number of partitions, but at a much lower partitioning overhead. Moreover, the memory requirements of the Colocation scheme do not increase as we create more partitions. We have also performed the dynamic load analysis of agent-based epidemic simulations. For this, we studied the individual and joint effects of three disease parameters (transmissibility, infection period and incubation period). We quantified the effects using an analytical equation with separate constants for SIS, SIR and SI disease models. The metric that we have developed in this work is useful for cost estimation of constrained producer-consumer algorithms; however, it has some limitations. The applicability of the metric is application, machine and data-specific. In the future, we plan to extend the metric to increase its applicability to a larger set of machine architectures, applications, and datasets.
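The cost model described above combines per-phase computational imbalance with communication latency for computation classes separated by global synchronization. The sketch below is a rough bulk-synchronous rendering of that idea under my own simplifying assumptions; it is not EpiSimdemics' actual cost equations, and all numbers are made up.

```python
def estimate_execution_time(phase_loads, comm_latency):
    """Rough cost estimate for a synchronization-separated simulation iteration.

    phase_loads:  list of phases; each phase is a list of per-partition compute times
    comm_latency: list of per-phase communication/synchronization costs
    The slowest partition dictates each phase's duration (load imbalance).
    """
    total = 0.0
    for loads, latency in zip(phase_loads, comm_latency):
        total += max(loads) + latency
    return total

# Two phases (e.g., "compute interactions" then "compute infections") on 4 partitions.
phase_loads = [[1.2, 0.9, 1.5, 1.0],   # seconds of compute per partition, phase 1
               [0.6, 0.8, 0.7, 0.5]]   # phase 2
comm_latency = [0.3, 0.2]
print(estimate_execution_time(phase_loads, comm_latency))  # 1.5+0.3 + 0.8+0.2 = 2.8
```

Comparing such an estimate across candidate partitionings is one way a cost model like this supports the evaluation of load balancing schemes such as Colocation and MetColoc.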
- Computational Framework for Uncertainty Quantification, Sensitivity Analysis and Experimental Design of Network-based Computer Simulation ModelsWu, Sichao (Virginia Tech, 2017-08-29)When capturing a real-world, networked system using a simulation model, features are usually omitted or represented by probability distributions. Verification and validation (V and V) of such models is an inherent and fundamental challenge. Central to V and V, but also to model analysis and prediction, are uncertainty quantification (UQ), sensitivity analysis (SA) and design of experiments (DOE). In addition, network-based computer simulation models, as compared with models based on ordinary and partial differential equations (ODE and PDE), typically involve a significantly larger volume of more complex data. Efficient use of such models is challenging since it requires a broad set of skills ranging from domain expertise to in-depth knowledge including modeling, programming, algorithmics, high-performance computing, statistical analysis, and optimization. On top of this, the need to support reproducible experiments necessitates complete data tracking and management. Finally, the lack of standardization of simulation model configuration formats presents an extra challenge when developing technology intended to work across models. While there are tools and frameworks that address parts of the challenges above, to the best of our knowledge, none of them accomplishes all this in a model-independent and scientifically reproducible manner. In this dissertation, we present a computational framework called GENEUS that addresses these challenges. Specifically, it incorporates (i) a standardized model configuration format, (ii) a data flow management system with digital library functions helping to ensure scientific reproducibility, and (iii) a model-independent, expandable plugin-type library for efficiently conducting UQ/SA/DOE for network-based simulation models. This framework has been applied to systems ranging from fundamental graph dynamical systems (GDSs) to large-scale socio-technical simulation models with a broad range of analyses such as UQ and parameter studies for various scenarios. Graph dynamical systems provide a theoretical framework for network-based simulation models and have been studied theoretically in this dissertation. This includes a broad range of stability and sensitivity analyses offering insights into how GDSs respond to perturbations of their key components. This stability-focused, structure-to-function theory was a motivator for the design and implementation of GENEUS. GENEUS, rooted in the framework of GDS, provides modelers, experimentalists, and research groups access to a variety of UQ/SA/DOE methods with robust and tested implementations without requiring them to necessarily have the detailed expertise in statistics, data management and computing. Even for research teams having all the skills, GENEUS can significantly increase research productivity.
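One of the DOE capabilities a framework like GENEUS exposes is space-filling sampling of simulation parameters. A minimal Latin hypercube design with SciPy's qmc module (available in SciPy 1.7+) over three hypothetical simulation parameters illustrates the idea; GENEUS's own plugin API is not shown here.

```python
from scipy.stats import qmc

# Three hypothetical simulation parameters: transmissibility, initial infections,
# and intervention compliance, each with its own plausible range.
lower = [0.01, 5, 0.0]
upper = [0.10, 50, 1.0]

sampler = qmc.LatinHypercube(d=3, seed=0)
design = qmc.scale(sampler.random(n=8), lower, upper)   # 8 space-filling design points
for run_id, (beta, seeds, compliance) in enumerate(design):
    print(f"run {run_id}: beta={beta:.3f}, seeds={int(seeds)}, compliance={compliance:.2f}")
```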
- Containing Cascading Failures in Networks: Applications to Epidemics and CybersecuritySaha, Sudip (Virginia Tech, 2016-10-05)Many real world networks exhibit cascading phenomena, e.g., disease outbreaks in social contact networks, malware propagation in computer networks, failures in cyber-physical systems such as power grids. As they grow in size and complexity, their security becomes increasingly important. In this thesis, we address the problems of controlling cascading failures in various network settings. We address the cascading phenomena which are either natural (e.g., disease outbreaks) or malicious (e.g., cyber attacks). We consider the nodes of a network as being individually or collectively controlled by self-interested autonomous agents and study their strategic decisions in the presence of these failure cascades. There are many models of cascading failures which specify how a node would fail when some neighbors have failed, such as: (i) epidemic spread models in which the cascading can be viewed as a natural and stochastic process and (ii) cyber attack models where the cascade is driven by malicious intents. We present our analyses and algorithms for these models in two parts. Part I focuses on problems of controlling epidemic spread. Epidemic outbreaks are generally modeled as stochastic diffusion processes. In particular, we consider the SIS model on networks. There exist heuristic centralized approaches in the literature for containing epidemic spread in SIS/SIR models; however, no rigorous performance bounds are known for these approaches. We develop algorithms with provable approximation guarantees that involve either protective intervention (e.g., vaccination) or link removal (e.g., unfriending). Our approach relies on the characterization of the SIS model in terms of the spectral radius of the network. The centralized approaches, however, are sometimes not feasible in practice. For example, targeted vaccination is often not feasible because of limited compliance to directives. This issue has been addressed in the literature by formulating game theoretic models for the containment of epidemic spread. However, they generally assume simplistic propagation models or homogeneous network structures. We develop novel game formulations which rely on the spectral characterization of the SIS model. In these formulations, the failures start from a random set of nodes and propagate through the network links. Each node acts as a self-interested agent and makes strategic intervention decisions (e.g., taking vaccination). Each agent decides its strategy to optimize its payoff (modeled by some payoff function). We analyze the complexity of finding Nash equilibria (NE) and study the structure of NE for different networks in these game settings. Part II focuses on malware spread in networks. In the cybersecurity literature, malware spreads are often studied in the framework of "attack graph" models. In these models, a node represents either a physical computing unit or a network configuration and an edge represents a physical or logical vulnerability dependency. A node gets compromised if a certain set of its neighbors are compromised. Attack graphs describe explicit scenarios in which a single vulnerability exploitation cascades further into the network exploiting inherent dependencies among the network components.
Attack graphs are used for studying cascading effects in many cybersecurity applications, e.g., component failure in enterprise networks, botnet spreads, advanced persistent attacks. One distinct feature of cyber attack cascades is the stealthy nature of the attack moves. Also, cyber attacks are generally repeated. How to control stealthy and repeated attack cascades is an interesting problem. van Dijk et al. [van2013flipit] first proposed a game framework called "FlipIt" for reasoning about the stealthy interaction between a defender and an attacker over the control of a system resource. However, in cybersecurity applications, systems generally consist of multiple resources connected by a network. Therefore, it is imperative to study stealthy attack and defense in networked systems. We develop a generalized framework called "FlipNet" which extends the work of van Dijk et al. [van2013flipit] to networks. We present analyses and algorithms for different problems in this framework. On the other hand, if the security of a system is limited to the vulnerabilities and exploitations that are known to the security community, often the objective of the system owner is to take cost-effective steps to minimize potential damage in the network. This problem has been formulated in the cybersecurity literature as hardening attack graphs. Several heuristic approaches have been proposed in the literature so far, but no algorithmic analysis has been given. We analyze the inherent vulnerability of the network and present approximation hardening algorithms.
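The spectral characterization used in Part I says, roughly, that an SIS epidemic dies out quickly when the effective infection rate times the spectral radius of the contact graph's adjacency matrix is below 1. The sketch below checks that threshold condition, with vaccination crudely modeled as node removal; the rates and graph are illustrative and this is not one of the thesis's approximation algorithms.

```python
import numpy as np
import networkx as nx

def epidemic_dies_out(G, beta, delta):
    """SIS threshold check: sub-threshold when (beta / delta) * spectral_radius < 1."""
    A = nx.to_numpy_array(G)
    spectral_radius = max(abs(np.linalg.eigvals(A)))
    return (beta / delta) * spectral_radius < 1

G = nx.barabasi_albert_graph(500, 3, seed=1)
beta, delta = 0.05, 0.5          # infection and recovery rates (illustrative values)
print("before intervention:", epidemic_dies_out(G, beta, delta))

# Crude "vaccination": remove the 25 highest-degree nodes and re-check the threshold.
top = sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:25]
G.remove_nodes_from(n for n, _ in top)
print("after intervention: ", epidemic_dies_out(G, beta, delta))
```

Interventions such as vaccination (node removal) or unfriending (edge removal) lower the spectral radius, which is exactly the quantity the provable algorithms in Part I aim to reduce at minimum cost.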
- Cooperation in Wireless NetworksSharma, Sushant (Virginia Tech, 2010-12-16)Spatial diversity, in the form of employing multiple antennas (i.e., MIMO), has proved to be very effective in increasing network capacity and reliability. However, equipping a wireless node with multiple antennas may not be practical, as the footprint of multiple antennas may not fit on a wireless node (particularly on handheld wireless devices). In order to achieve spatial diversity without requiring multiple antennas on the same node, the so-called cooperative communications (CC) has been introduced. Under CC, each node is equipped with only a single antenna and spatial diversity is achieved by exploiting the antennas on other nodes in the network through cooperative relaying. The goal of this dissertation is to maximize throughput at the network level through CC at the physical layer. A number of problems are explored in this investigation. The main contributions of this dissertation can be summarized as follows. 1. Optimal Relay Assignment. We first consider a simple CC model where each source-destination pair may employ only a single relay. For this three-node model, the choice of a relay node (among a set of available relay nodes) for a given session is critical in the overall network performance. We study the relay node assignment problem in a cooperative ad hoc network environment, where multiple source-destination pairs compete for the same pool of relay nodes in the network. Our objective is to assign the available relay nodes to different source-destination pairs so as to maximize the minimum data rate among all pairs. We present an optimal polynomial time algorithm, called ORA, that solves this problem. A novel idea in this algorithm is a "linear marking" mechanism, which maintains linear complexity at each iteration. We offer a formal proof of optimality for ORA and use numerical results to demonstrate its capability. 2. Incorporating Network Coding. It has been shown that network coding (NC) can reduce the time-slot overhead when multiple sessions share the same relay node in CC. Such an approach is called network-coded CC (or NC-CC). Most of the existing works have mainly focused on the benefits of this approach. The potential adverse effect under NC-CC remains unknown. We explore this important problem by introducing the concept of network coding noise (NC noise). We show that due to NC noise, NC may not always be beneficial to CC. We substantiate this important finding in two important scenarios: analog network coding (ANC) in amplify-and-forward (AF) CC, and digital network coding (DNC) in decode-and-forward (DF) CC. We analyze the origin of NC noise via a careful study of signal aggregation at a relay node and signal extraction at a destination node. We derive a closed-form expression for NC noise at each destination node and show that the existence of NC noise could diminish the advantage of NC in CC. Our results shed new light on how to use NC in CC effectively. 3. Session Grouping and Relay Node Selection. When there are multiple sessions in the network, it may be necessary to combine sessions into different groups, and then have each group select the most beneficial relay node for NC-CC. We study this joint grouping and relay node selection problem for NC-CC. By studying matching problems in hypergraphs, we show that this problem is NP-hard. We then propose a distributed and online algorithm to solve this problem.
The key idea in our algorithm is to have each neighboring relay node of a newly joined session determine and offer the best group for this session from the groups that it is currently serving; and then to have the source node of this newly joined session select the best group among all received offers. We show that our distributed algorithm has polynomial complexity. Using extensive numerical results, we show that our distributed algorithm is near-optimal and adapts well to online network dynamics. 4. Grouping and Matching for Multi-Relay Cooperation. Existing models of NC-CC consider only single relay node for each session group. We investigate how NC-CC behaves when multiple relay nodes are employed. For a given session, we develop closed form formulas for the mutual information and achievable rate under multi-relay NC-CC. In multi-relay NC-CC, the achievable rate of a session depends on the other sessions in its group as well as the set of relay nodes used for NC-CC. Therefore, we study NC-CC via joint optimization of grouping and matching of session and relay groups in an ad hoc network. Although we show that the joint problem is NP-hard, we develop an efficient polynomial time algorithm for grouping and matching (called G²M). G²M first builds beneficial relay groups for individual sessions. This is followed by multiple iterations during which sessions are combined with other sessions to form larger and better session groups (while corresponding relay groups are merged and updated accordingly). Using extensive numerical results, we show the efficiency and near optimality of our G²M algorithm.
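The relay assignment objective in contribution 1 above is max-min rate over all source-destination pairs. The brute-force sketch below only illustrates that objective on a toy rate table; ORA itself reaches the optimum in polynomial time via its "linear marking" step, which is not reproduced here, and the rates are hypothetical.

```python
from itertools import permutations

def best_relay_assignment(rate):
    """Exhaustive max-min relay assignment, for illustration only.

    rate[i][j] = achievable rate of session i when it uses relay j (e.g., computed
                 from a cooperative-communication capacity formula).
    Returns (best minimum rate, assignment tuple: session i -> relay assignment[i]).
    """
    n_sessions, n_relays = len(rate), len(rate[0])
    best = (float("-inf"), None)
    for perm in permutations(range(n_relays), n_sessions):
        min_rate = min(rate[i][perm[i]] for i in range(n_sessions))
        best = max(best, (min_rate, perm))
    return best

# 3 sessions, 4 candidate relays; entries are hypothetical rates in Mb/s.
rate = [[4.0, 2.5, 3.0, 1.0],
        [1.5, 3.5, 2.0, 2.5],
        [2.0, 1.0, 4.5, 3.0]]
print(best_relay_assignment(rate))  # (3.5, (0, 1, 2)) for this toy instance
```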
- Coverage, Secrecy, and Stability Analysis of Energy Harvesting Wireless NetworksKishk, Mustafa (Virginia Tech, 2018-08-03)Including energy harvesting capability in a wireless network is attractive for multiple reasons. First and foremost, powering base stations with renewable resources could significantly reduce their reliance on the traditional energy sources, thus helping in curtailing the carbon footprint. Second, including this capability in wireless devices may help in increasing their lifetime, which is especially critical for devices for which it may not be easy to charge or replace batteries. This will often be the case for a large fraction of sensors that will form the "digital skin" of an Internet of Things (IoT) ecosystem. Motivated by these factors, this work studies fundamental performance limitations that appear due to the inherent unreliability of energy harvesting when it is used as a primary or secondary source of energy by different elements of the wireless network, such as mobile users, IoT sensors, and/or base stations. The first step taken towards this objective is studying the joint uplink and downlink coverage of radio-frequency (RF) powered cellular-based IoT. Modeling the locations of the IoT devices and the base stations (BSs) using two independent Poisson point processes (PPPs), the joint uplink/downlink coverage probability is derived. The resulting expressions characterize how different system parameters impact coverage performance. Both mathematical expressions and simulation results show how these system parameters should be tuned in order to achieve the performance of the regularly powered IoT (IoT devices are powered by regular batteries). The placement of RF-powered devices close to the RF sources, to harvest more energy, imposes some concerns on the security of the signals transmitted by these RF sources to their intended receivers. Studying this problem is the second step taken in this dissertation towards better understanding of energy harvesting wireless networks. While these secrecy concerns have been recently addressed for the point-to-point link, it received less attention for the more general networks with randomly located transmitters (RF sources) and RF-powered devices, which is the main contribution in the second part of this dissertation. In the last part of this dissertation, we study the stability of solar-powered cellular networks. We use tools from percolation theory to study percolation probability of energy-drained BSs. We study the effect of two system parameters on that metric, namely, the energy arrival rate and the user density. Our results show the existence of a critical value for the ratio of the energy arrival rate to the user density, above which the percolation probability is zero. The next step to further improve the accuracy of the stability analysis is to study the effect of correlation between the battery levels at neighboring BSs. We provide an initial study that captures this correlation. The main insight drawn from our analysis is the existence of an optimal overlapping coverage area for neighboring BSs to serve each other's users when they are energy-drained.
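The stability analysis above studies percolation of energy-drained base stations and finds a sharp threshold in a system-parameter ratio. A Monte Carlo sketch of site percolation on a square grid conveys the flavor of such a threshold; the dissertation works with stochastic-geometry models rather than a lattice, and the drain probability p below is only a stand-in.

```python
import random
from collections import deque

def drained_cluster_percolates(n, p, rng):
    """One trial: drain each base station on an n x n grid independently with
    probability p and test whether drained sites connect the left and right edges."""
    drained = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    queue = deque((r, 0) for r in range(n) if drained[r][0])
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if c == n - 1:
            return True
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < n and 0 <= nc < n and drained[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False

rng = random.Random(7)
for p in (0.4, 0.55, 0.7):   # site percolation threshold on a square grid is ~0.593
    hits = sum(drained_cluster_percolates(40, p, rng) for _ in range(50))
    print(f"p={p}: estimated percolation probability ~ {hits / 50:.2f}")
```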
- A Database Supported Modeling Environment for Pandemic Planning and Course of Action AnalysisMa, Yifei (Virginia Tech, 2013-06-24)Pandemics can significantly impact public health and society, for instance, the 2009 H1N1 and the 2003 SARS. In addition to analyzing the historic epidemic data, computational simulation of epidemic propagation processes and disease control strategies can help us understand the spatio-temporal dynamics of epidemics in the laboratory. Consequently, the public can be better prepared and the government can control future epidemic outbreaks more effectively. Recently, epidemic propagation simulation systems, which use high performance computing technology, have been proposed and developed to understand disease propagation processes. However, run-time infection situation assessment and intervention adjustment, two important steps in modeling disease propagation, are not well supported in these simulation systems. In addition, these simulation systems are computationally efficient in their simulations, but most of them have limited capabilities in terms of modeling interventions in realistic scenarios. In this dissertation, we focus on building a modeling and simulation environment for epidemic propagation and propagation control strategy. The objective of this work is to design such a modeling environment that both supports the previously missing functions and performs well in terms of expected features such as modeling fidelity, computational efficiency, and modeling capability. Our proposed methodologies to build such a modeling environment are: 1) decoupled and co-evolving models for disease propagation, situation assessment, and propagation control strategy, and 2) assessing situations and simulating control strategies using relational databases. Our motivation for exploring these methodologies is as follows: 1) a decoupled and co-evolving model allows us to design modules for each function separately and makes this complex modeling system design simpler, and 2) simulating propagation control strategies using relational databases improves the modeling capability and human productivity of using this modeling environment. To evaluate our proposed methodologies, we have designed and built a loosely coupled and database supported epidemic modeling and simulation environment. With detailed experimental results and realistic case studies, we demonstrate that our modeling environment provides the missing functions and greatly enhances many expected features, such as modeling capability, without significantly sacrificing computational efficiency and scalability.
- Deep Learning for Taxonomy PredictionRamesh, Shreyas (Virginia Tech, 2019-06-04)The last decade has seen great advances in Next-Generation Sequencing technologies, and, as a result, there has been a rise in the number of genomes sequenced each year. In 2017, there were as many as 10,000 new organisms sequenced and added into the RefSeq Database. Taxonomy prediction is a science involving the hierarchical classification of DNA fragments up to the rank species. In this research, we introduce Predicting Linked Organisms, or Plinko for short. Plinko is a fully-functioning, state-of-the-art predictive system that accurately captures DNA-Taxonomy relationships where other state-of-the-art algorithms falter. Plinko leverages multi-view convolutional neural networks and the pre-defined taxonomy tree structure to improve multi-level taxonomy prediction. In the Plinko strategy, each network takes advantage of different word usage patterns corresponding to different levels of evolutionary divergence. Plinko has the advantages of relatively low storage and GPGPU parallel training and inference, making the solution portable and scalable with anticipated genome database growth. To the best of our knowledge, Plinko is the first to use multi-view convolutional neural networks as the core algorithm in a compositional, alignment-free approach to taxonomy prediction.
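Plinko's networks consume word (k-mer) usage patterns of DNA fragments. A common alignment-free preprocessing step is to turn a fragment into a fixed-length k-mer count vector; the sketch below shows only that step and is not Plinko's actual pipeline.

```python
from itertools import product

def kmer_vector(sequence, k=3):
    """Turn a DNA fragment into a fixed-length k-mer count vector, one of the standard
    alignment-free representations a CNN-based classifier could consume."""
    alphabet = "ACGT"
    index = {"".join(kmer): i for i, kmer in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:            # skips windows with ambiguous bases such as 'N'
            counts[index[kmer]] += 1
    return counts

vec = kmer_vector("ACGTAGCTAGGTACGATCGN", k=3)
print(len(vec), sum(vec))  # 64 possible 3-mers; one count per valid window
```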
- Design and Implementation of An Emulation Testbed for Optimal Spectrum Sharing in Multi-hop Cognitive Radio NetworksLiu, Tong (Virginia Tech, 2007-07-09)Cognitive Radio (CR) capitalizes on advances in signal processing and radio technology and is capable of reconfiguring RF and switching to desired frequency bands. It is a frequency-agile data communication device that is vastly more powerful than existing multi-channel multi-radio (MC-MR) technology. In this thesis, we investigate the important problem of multi-hop networking with CR nodes. In a CR network, each node has a set of frequency bands (not necessarily of equal size) that may not be the same as those at other nodes. The uneven size of frequency bands prompts the need for further division into sub-bands for optimal spectrum sharing. We characterize behaviors and constraints for such a multi-hop CR network from multiple layers, including modeling of spectrum sharing and sub-band division, scheduling and interference constraints, and flow routing. We give a formal mathematical formulation with the objective of maximizing the network throughput for a set of user communication sessions. Since such a problem formulation falls into mixed integer non-linear programming (MINLP), which is NP-hard in general, we develop a lower bound for the objective by relaxing the integer variables and applying linearization. Subsequently, we develop a near-optimal algorithm for this MINLP problem. This algorithm is based on a novel sequential fixing (SF) procedure, where the integer variables are determined iteratively via a sequence of linear programs (LPs). In order to implement and evaluate these algorithms in a controlled laboratory setting, we design and implement an emulation testbed. The highlights of our experimental research include: • Emulation of a multi-hop CR network with arbitrary topology; • An implementation of the proposed SF algorithm at the application layer; • A source routing implementation that can easily support comparative study between the SF algorithm and other schemes; • Experiments comparing the SF algorithm with another algorithm called Layered Greedy Algorithm (LGA); • Experimental results show that the proposed SF significantly outperforms LGA. In summary, the experimental research in this thesis has demonstrated that the SF algorithm is a viable algorithm for optimal spectrum sharing in multi-hop CR networks.
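The sequential fixing (SF) idea above is to solve the LP relaxation of the MINLP, fix the integer variable whose relaxed value is closest to an integer, and re-solve until all integer variables are fixed. The generic sketch below applies that loop to a toy 0-1 program with scipy.optimize.linprog; the thesis's actual formulation involves spectrum-sharing variables and constraints that are not modeled here.

```python
import numpy as np
from scipy.optimize import linprog

def sequential_fixing(c, A_ub, b_ub, int_vars):
    """Generic sequential-fixing heuristic: repeatedly solve the LP relaxation and
    round the integer variable whose relaxed value is nearest to 0 or 1."""
    n = len(c)
    bounds = [(0.0, 1.0)] * n
    unfixed = set(int_vars)
    while True:
        res = linprog(-np.asarray(c), A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        if not res.success:
            return None                       # relaxation became infeasible
        if not unfixed:
            return res.x, -res.fun            # all integer variables fixed
        # Fix the still-fractional variable that is closest to an integer value.
        j = min(unfixed, key=lambda i: min(res.x[i], 1.0 - res.x[i]))
        v = round(res.x[j])
        bounds[j] = (v, v)
        unfixed.remove(j)

# Toy knapsack-like instance: pick items (binary) under one capacity constraint.
c = [5.0, 4.0, 3.0]                 # objective coefficients (maximize)
A_ub = [[2.0, 3.0, 1.0]]            # resource usage per item
b_ub = [4.0]                        # capacity
print(sequential_fixing(c, A_ub, b_ub, int_vars=[0, 1, 2]))  # picks items 0 and 2
```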