Browsing by Author "Ramakrishnan, Naren"
Now showing 1 - 20 of 232
- 1918 Spanish Flu. Ewing, E. Thomas; Hausman, Bernice L.; Ramakrishnan, Naren (2013-10-02)
- Addressing Challenges of Modern News Agencies via Predictive Modeling, Deep Learning, and Transfer Learning. Keneshloo, Yaser (Virginia Tech, 2019-07-22). Today's news agencies are moving from traditional journalism, where publishing just a few news articles per day was sufficient, to modern content generation mechanisms that create thousands of news pieces every day. With the growth of these modern news agencies comes the arduous task of properly handling the massive amount of data generated for each news article. News agencies are therefore constantly seeking solutions to facilitate and automate tasks previously done by humans. In this dissertation, we focus on some of these problems and provide solutions for two broad problems that help a news agency not only gain a wider view of reader behavior around an article but also give editors automated tools for summarizing news articles. These two disjoint problems aim at improving the users' reading experience by helping content generators monitor and focus on poorly performing content while promoting well-performing content. We first focus on the task of popularity prediction of news articles via a combination of regression, classification, and clustering models. We next focus on the problem of generating automated text summaries for long news articles using deep learning models. The first problem helps content developers understand how a news article performs over the long run, while the second provides automated tools for content developers to generate summaries for each news article.
- Advances in aircraft design: multiobjective optimization and a markup language. Deshpande, Shubhangi Govind (Virginia Tech, 2014-01-23). Today's modern aerospace systems exhibit strong interdisciplinary coupling and require a multidisciplinary, collaborative approach. Analysis methods that were once considered feasible only for advanced and detailed design are now available and even practical at the conceptual design stage. This changing philosophy for conducting conceptual design poses additional challenges beyond those encountered in a low fidelity design of aircraft. This thesis takes some steps towards bridging the gaps in existing technologies and advancing the state-of-the-art in aircraft design. The first part of the thesis proposes a new Pareto front approximation method for multiobjective optimization problems. The method employs a hybrid optimization approach using two derivative free direct search techniques, and is intended for solving blackbox simulation based multiobjective optimization problems with possibly nonsmooth functions, where the analytical form of the objectives is not known and/or the evaluation of the objective function(s) is very expensive (very common in multidisciplinary design optimization). A new adaptive weighting scheme is proposed to convert a multiobjective optimization problem to a single objective optimization problem. Results show that the method achieves an arbitrarily close approximation to the Pareto front with a good collection of well-distributed nondominated points. The second part deals with the interdisciplinary data communication issues involved in a collaborative multidisciplinary aircraft design environment. Efficient transfer, sharing, and manipulation of design and analysis data in a collaborative environment demands a formal structured representation of data. XML, a W3C recommendation, is one such standard, concomitant with a number of powerful capabilities that alleviate interoperability issues.
A compact, generic, and comprehensive XML schema for an aircraft design markup language (ADML) is proposed here to provide a common language for data communication, and to improve efficiency and productivity within a multidisciplinary, collaborative environment. An important feature of the proposed schema is its very expressive and efficient low level schemata. As a proof of concept, the schema is used to encode an entire Convair B-58. As the complexity of models and the number of disciplines increases, so does the effort saved by exchanging data models and analysis results in ADML.
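The conversion of many objectives into one, as described in the abstract above, is the classic scalarization step. As a hedged illustration only (a plain fixed-weight sum, not the adaptive scheme the thesis proposes, with all function names invented here), it can be sketched alongside a brute-force nondominated filter:

```python
def scalarize(objective_values, weights):
    """Collapse a vector of objective values into one score via a weighted sum.
    (The thesis proposes an *adaptive* weighting scheme; fixed weights here.)"""
    return sum(w * f for w, f in zip(weights, objective_values))

def nondominated(points):
    """Brute-force Pareto filter for minimization: keep a point unless some
    other point is at least as good in every objective and strictly better
    in at least one."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(qk <= pk for qk, pk in zip(q, p)) and
            any(qk < pk for qk, pk in zip(q, p))
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(p)
    return front
```

In practice the Pareto front is approximated by repeatedly scalarizing with varied weights, optimizing each scalarized problem, and filtering the resulting points with a check like `nondominated`.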
- Affordances and Feedback in Nuance-Oriented Interfaces. Wingrave, Chadwick A.; Bowman, Douglas A.; Ramakrishnan, Naren (Department of Computer Science, Virginia Polytechnic Institute & State University, 2001). Virtual Environments (VEs) and perceptive user interfaces must deal with complex users and their modes of interaction. One way to approach this problem is to recognize users’ nuances (subtle conscious or unconscious actions). In exploring nuance-oriented interfaces, we attempted to let users work as they preferred without being biased by feedback or affordances in the system. The hope was that we would discover the users’ innate models of interaction. The results of two user studies were that users are guided not by any innate model but by affordances and feedback in the interface. So, without this guidance, even the most obvious and useful components of an interface will be ignored.
- Algorithmic Distribution of Applied Learning on Big Data. Shukla, Manu (Virginia Tech, 2020-10-16). Machine learning and graph techniques are complex and challenging to distribute. Generally, they are distributed by modeling the problem in a similar way as single-node sequential techniques, except applied to smaller chunks of data and compute, with the results then combined. These techniques focus on stitching together the results from smaller chunks so that the outcome is as close as possible to the sequential result on the entire data. This approach is not feasible in numerous kernel, matrix, optimization, graph, and other techniques where the algorithm needs access to all the data during execution. In this work, we propose key-value pair based distribution techniques that are widely applicable to statistical machine learning techniques along with matrix, graph, and time series based algorithms. The crucial difference from previously proposed techniques is that all operations are modeled as key-value pair based fine or coarse-grained steps. This allows flexibility in distribution with no compounding error in each step. The distribution is applicable not only in robust disk-based frameworks but also in in-memory based systems without significant changes. Key-value pair based techniques also provide the ability to generate the same result as sequential techniques, with no edge or overlap effects to resolve in structures such as graphs or matrices. This thesis focuses on key-value pair based distribution of applied machine learning techniques on a variety of problems. In the first method, key-value pair distribution is used for storytelling at scale. Storytelling connects entities (people, organizations) using their observed relationships to establish meaningful storylines. When performed sequentially, these computations become a bottleneck because the massive number of entities makes space and time complexity untenable.
We present DISCRN, or DIstributed Spatio-temporal ConceptseaRch based StorytelliNg, a distributed framework for performing spatio-temporal storytelling. The framework extracts entities from microblogs and event data, and links these entities using a novel ConceptSearch to derive storylines in a distributed fashion utilizing the key-value pair paradigm. Performing these operations at scale allows deeper and broader analysis of storylines. The novel parallelization techniques speed up the generation and filtering of storylines on massive datasets. Experiments with microblog posts such as Twitter data and GDELT (Global Database of Events, Language and Tone) events show the efficiency of the techniques in DISCRN. The second work determines brand perception directly from people's comments in social media. Current techniques for determining brand perception, such as surveys of handpicked users by mail, in person, by phone, or online, are time consuming and increasingly inadequate. The proposed DERIV system distills storylines from open data representing direct consumer voice into a brand perception. The framework summarizes the perception of a brand in comparison to peer brands with in-memory key-value pair based distributed algorithms utilizing supervised machine learning techniques. Experiments performed with open data and models built with storylines of known peer brands show the technique to be highly scalable and accurate in capturing brand perception from vast amounts of social data, compared to sentiment analysis. The third work performs event categorization and prospect identification in social media. The problem is challenging due to the endless amount of information generated daily. In our work, we present DISTL, an event processing and prospect identifying platform.
It accepts as input a set of storylines (a sequence of entities and their relationships) and processes them as follows: (1) uses different algorithms (LDA, SVM, information gain, rule sets) to identify themes from storylines; (2) identifies top locations and times in storylines and combines them with themes to generate events that are meaningful in a specific scenario for categorizing storylines; and (3) extracts top prospects as people and organizations from data elements contained in storylines. The output comprises sets of events in different categories and the storylines under them, along with the top prospects identified. DISTL utilizes in-memory key-value pair based distributed processing that scales to high data volumes and categorizes generated storylines in near real-time. The fourth work builds drone flight paths in a distributed manner for surveying a large area, taking images to determine the growth of vegetation over power lines, while adjusting to the terrain and to the number of drones and their capabilities. Drones are increasingly being used to perform risky and labor intensive aerial tasks cheaply and safely. To ensure operating costs are low and flights autonomous, their flight plans must be pre-built. In existing techniques, drone flight paths are not automatically pre-calculated based on drone capabilities and terrain information. We present details of an automated flight plan builder, DIMPL, that pre-builds flight plans for drones tasked with surveying a large area to take photographs of electric poles and identify ones with hazardous vegetation overgrowth. DIMPL employs a distributed in-memory key-value pair based paradigm to process subregions in parallel and build flight paths in a highly efficient manner. The fifth work highlights scaling graph operations, particularly pruning and joins.
Linking topics to specific experts in technical documents and finding connections between experts are crucial for detecting the evolution of emerging topics and the relationships between their influencers in state-of-the-art research. Current techniques that make such connections are limited to similarity measures. Methods based on weights such as TF-IDF and frequency to identify important topics and self joins between topics and experts are generally utilized to identify connections between experts. However, such approaches are inadequate for identifying emerging keywords and experts since the most useful terms in technical documents tend to be infrequent and concentrated in just a few documents. This makes connecting experts through joins on large dense graphs challenging. We present DIGDUG, a framework that identifies emerging topics by applying graph operations to technical terms. The framework identifies connections between authors of patents and journal papers by performing joins on connected topics and topics associated with the authors at scale. The problem of scaling the graph operations for topics and experts is solved through dense graph pruning and graph joins categorized under their own scalable separable dense graph class based on key-value pair distribution. Comparing our graph join and pruning technique against multiple graph and join methods in MapReduce revealed a significant improvement in performance using our approach.
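The key-value pair paradigm underlying DISCRN, DERIV, DISTL, DIMPL, and DIGDUG follows the familiar map/shuffle/reduce shape. A minimal single-machine sketch with a hypothetical entity-counting example (an illustration of the paradigm, not code from any of the frameworks above):

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply a mapper to each record, emitting (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group emitted pairs by key, as a distributed runtime would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply a reducer to each key's values to produce the final result."""
    return {key: reducer(values) for key, values in groups.items()}

# Hypothetical example: counting entity mentions across documents.
def entity_mapper(doc):
    for entity in doc:
        yield entity, 1

docs = [["acme", "globex"], ["acme"], ["globex", "initech"]]
counts = reduce_phase(shuffle(map_phase(docs, entity_mapper)), sum)
```

Because every step is expressed over keyed pairs, the same logic can be moved between disk-based and in-memory frameworks, which is the flexibility the thesis emphasizes.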
- Algorithms and Simulation Framework for Residential Demand Response. Adhikari, Rajendra (Virginia Tech, 2019-02-11). An electric power system is a complex network consisting of a large number of power generators and consumers interconnected by transmission and distribution lines. One remarkable thing about the electric grid is that there has to be a continuous balance between the amount of electricity generated and consumed at all times. Maintaining this balance is critical for the stable operation of the grid, and this task is achieved in the long term, short term, and real time by operating a three-tier wholesale electricity market consisting of the capacity market, the energy market, and the ancillary services market, respectively. For a demand resource to participate in the energy and capacity markets, it needs to be able to reduce its power consumption on demand, whereas to participate in the ancillary services market, the power consumption of the demand resource needs to be varied continuously following the regulation signal sent by the grid operator. This act of changing the demand to help maintain energy balance is called demand response (DR). The dissertation presents novel algorithms and tools to enable residential buildings to participate as demand resources in such markets to provide DR. The residential sector accounts for 37% of total U.S. electricity consumption, and a recent consumer survey showed that 88% of consumers are either eager for or supportive of advanced technologies for energy efficiency, including demand response. This indicates that the residential sector is a very good target for DR. Two broad solutions for residential DR are presented. The first is a set of efficient algorithms that intelligently control the customers' heating, ventilating, and air conditioning (HVAC) devices to provide DR services to the grid.
The second solution is an extensible residential demand response simulation framework that can help evaluate and experiment with different residential demand response algorithms. One of the algorithms presented in this dissertation reduces the aggregated demand of a set of HVACs during a DR event while respecting the customers' comfort requirements. The algorithm is shown to be efficient, simple to implement, and provably optimal. The second algorithm helps provide regulation DR while honoring customer comfort requirements. The algorithm is efficient, simple to implement, and is shown to perform well in a range of real-world situations. A case study is presented estimating the monetary benefit of implementing the algorithm in a cluster of 100 typical homes, with promising results. Finally, the dissertation presents the design of a Python-based, object-oriented residential DR simulation framework that is easy to extend as needed. The framework supports simulation of the thermal dynamics of a residential building and supports household appliances such as HVACs, water heaters, clothes washers/dryers, and dishwashers. A case study showing the application of the simulation framework to various DR implementations is presented, which shows that the framework performs well and can be a useful tool for future research in residential DR.
- Algorithms for Feature Selection in Rank-Order Spaces. Slotta, Douglas J.; Vergara, John Paul C.; Ramakrishnan, Naren; Heath, Lenwood S. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2005). The problem of feature selection in supervised learning situations is considered, where all features are drawn from a common domain and are best interpreted via ordinal comparisons with other features, rather than as numerical values. In particular, each instance is a member of a space of ranked features. This problem is pertinent in electoral, financial, and bioinformatics contexts, where features denote assessments in terms of counts, ratings, or rankings. Four algorithms for feature selection in such rank-order spaces are presented; two are information-theoretic, and two are order-theoretic. These algorithms are empirically evaluated against both synthetic and real world datasets. The main results of this paper are (i) characterization of relationships and equivalences between different feature selection strategies with respect to the spaces in which they operate, and the distributions they seek to approximate; (ii) identification of computationally simple and efficient strategies that perform surprisingly well; and (iii) a feasibility study of order-theoretic feature selection for large scale datasets.
- Algorithms for Modeling Mass Movements and their Adoption in Social Networks. Jin, Fang (Virginia Tech, 2016-08-23). Online social networks have become a staging ground for many modern movements, with the Arab Spring being the most prominent example. In an effort to understand and predict those movements, social media can be regarded as a valuable social sensor for disclosing underlying behaviors and patterns. To fully understand mass movement information propagation patterns in social networks, several problems need to be considered and addressed. Specifically, modeling mass movements that incorporate multiple spaces, a dynamic network structure, and misinformation propagation can be exceptionally useful in understanding information propagation in social media. This dissertation explores four research problems underlying efforts to identify and track the adoption of mass movements in social media. First, how do mass movements become mobilized on Twitter, especially in a specific geographic area? Second, can we detect protest activity in social networks by observing group anomalies in a graph? Third, how can we distinguish real movements from rumors or misinformation campaigns? Fourth, how can we infer the indicators of a specific type of protest, say climate-related protest? A fundamental objective of this research has been to conduct a comprehensive study of how mass movement adoption functions in social networks. For example, it may cross multiple spaces, evolve with dynamic network structures, or consist of swift outbreaks or long-term, slowly evolving transmissions. In many cases, it may also be mixed with misinformation campaigns, either deliberate or in the form of rumors. Each of these issues requires the development of new mathematical models and algorithmic approaches such as those explored here.
This work aims to facilitate advances in information propagation, group anomaly detection and misinformation distinction and, ultimately, help improve our understanding of mass movements and their adoption in social networks.
- Algorithms for Reconstructing and Reasoning about Chemical Reaction Networks. Cho, Yong Ju (Virginia Tech, 2013-01-24). Recent advances in systems biology have uncovered detailed mechanisms of biological processes such as the cell cycle, circadian rhythms, and signaling pathways. These mechanisms are modeled by chemical reaction networks (CRNs) which are typically simulated by converting to ordinary differential equations (ODEs), so that the goal is to closely reproduce the observed quantitative and qualitative behaviors of the modeled process. This thesis proposes two algorithmic problems related to the construction and comprehension of CRN models. The first problem focuses on reconstructing CRNs from given time series. Given multivariate time course data obtained by perturbing a given CRN, how can we systematically deduce the interconnections between the species of the network? We demonstrate how this problem can be modeled as, first, one of uncovering conditional independence relationships using buffering experiments and, second, of determining the properties of the individual chemical reactions. Experimental results demonstrate the effectiveness of our approach on both synthetic and real CRNs. The second problem this work focuses on is to aid in network comprehension, i.e., to understand the motifs underlying complex dynamical behaviors of CRNs. Specifically, we focus on bistability---an important dynamical property of a CRN---and propose algorithms to identify the core structures responsible for conferring bistability. The approach we take is to systematically infer the instability causing structures (ICSs) of a CRN and use machine learning techniques to relate properties of the CRN to the presence of such ICSs. This work has the potential to aid in not just network comprehension but also model simplification, by helping reduce the complexity of known bistable systems.
- Algorithms for Storytelling. Kumar, Deept; Ramakrishnan, Naren; Helm, Richard F.; Potts, Malcolm (Department of Computer Science, Virginia Polytechnic Institute & State University, 2006). We formulate a new data mining problem called "storytelling" as a generalization of redescription mining. In traditional redescription mining, we are given a set of objects and a collection of subsets defined over these objects. The goal is to view the set system as a vocabulary and identify two expressions in this vocabulary that induce the same set of objects. Storytelling, on the other hand, aims to explicitly relate object sets that are disjoint (and hence, maximally dissimilar) by finding a chain of (approximate) redescriptions between the sets. This problem finds applications in bioinformatics, for instance, where the biologist is trying to relate a set of genes expressed in one experiment to another set, implicated in a different pathway. We outline an efficient storytelling implementation that embeds the CARTwheels redescription mining algorithm in an A* search procedure, using the former to supply next move operators on search branches to the latter. This approach is practical and effective for mining large datasets and, at the same time, exploits the structure of partitions imposed by the given vocabulary. Three application case studies are presented: a study of word overlaps in large English dictionaries, exploring connections between genesets in a bioinformatics dataset, and relating publications in the PubMed index of abstracts.
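The chaining idea, finding a path of overlapping sets between two disjoint sets, can be illustrated with a generic A*-style search that uses Jaccard distance to the goal as its heuristic. This is a simplified stand-in: it uses neither CARTwheels nor the paper's actual move operators, and all names and thresholds below are invented for illustration:

```python
import heapq
import itertools

def jaccard_distance(a, b):
    """1 - Jaccard similarity of two sets (0 when identical, 1 when disjoint)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def story_chain(start, goal, candidate_sets, theta=0.2):
    """A*-style search for a chain of object sets linking two disjoint sets.
    Consecutive sets must overlap (Jaccard similarity >= theta); each hop
    costs 1, and the heuristic is Jaccard distance to the goal set."""
    start, goal = frozenset(start), frozenset(goal)
    pool = [frozenset(s) for s in candidate_sets] + [goal]
    tie = itertools.count()  # tiebreaker so the heap never compares sets
    frontier = [(jaccard_distance(start, goal), next(tie), 0, start, [start])]
    seen = set()
    while frontier:
        _, _, cost, current, path = heapq.heappop(frontier)
        if current == goal:
            return [sorted(s) for s in path]
        if current in seen:
            continue
        seen.add(current)
        for nxt in pool:
            if nxt not in seen and 1.0 - jaccard_distance(current, nxt) >= theta:
                heapq.heappush(frontier,
                               (cost + 1 + jaccard_distance(nxt, goal),
                                next(tie), cost + 1, nxt, path + [nxt]))
    return None  # no chain of sufficiently overlapping sets exists
```

For disjoint sets {a, b} and {e, f} with intermediates {b, c}, {c, d}, {d, e}, the search recovers the five-step chain through all three intermediates.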
- Application of Deep Learning in Intelligent Transportation Systems. Dabiri, Sina (Virginia Tech, 2019-02-01). The rapid growth of population and the permanent increase in the number of vehicles engender several issues in transportation systems, which in turn call for an intelligent and cost-effective approach to resolve the problems in an efficient manner. A cost-effective approach for improving and optimizing transportation-related problems is to unlock hidden knowledge in the ever-increasing spatiotemporal and crowdsourced information collected from various sources such as mobile phone sensors (e.g., GPS sensors) and social media networks (e.g., Twitter). Data mining and machine learning techniques are the major tools for analyzing the collected data and extracting useful knowledge on traffic conditions and mobility behaviors. Deep learning is an advanced branch of machine learning that has enjoyed a lot of success in computer vision and natural language processing in recent years. However, deep learning techniques have been applied to only a small number of transportation applications, such as traffic flow and speed prediction. Accordingly, my main objective in this dissertation is to develop state-of-the-art deep learning architectures for transport-related applications that have not yet been treated by deep learning in much detail, including (1) travel mode detection, (2) vehicle classification, and (3) traffic information systems. To this end, an efficient representation for spatiotemporal and crowdsourced data (e.g., GPS trajectories) must also be designed so that it is not only compatible with deep learning architectures but also carries sufficient information for solving the task at hand.
Furthermore, since the good performance of a deep learning algorithm is primarily contingent on access to a large volume of training samples, efficient data collection and labeling strategies are developed for different data types and applications. Finally, the performance of the proposed representations and models is evaluated by comparison with several state-of-the-art techniques in the literature. The experimental results clearly and consistently demonstrate the superiority of the proposed deep-learning-based framework for each application.
- An Approach to Using Cognition in Wireless Networks. Morales-Tirado, Lizdabel (Virginia Tech, 2009-12-18). Third Generation (3G) wireless networks have been well studied and optimized with traditional radio resource management techniques, but there is still room for improvement. Cognitive radio technology can bring significant network improvements by providing awareness of the surrounding radio environment, exploiting previous network knowledge, and optimizing the use of resources using machine learning and artificial intelligence techniques. Cognitive radio can also co-exist with legacy equipment, thus acting as a bridge among heterogeneous communication systems. In this work, an approach for applying cognition in wireless networks is presented. Also, two machine learning techniques are used to create a hybrid cognitive engine. Furthermore, the concept of cognitive radio resource management along with some of its network applications is discussed. To evaluate the proposed approach, cognition is applied to three typical wireless network problems: improving coverage, handover management, and determining recurring policy events. A cognitive engine that uses case-based reasoning and a decision tree algorithm is developed. The engine learns the coverage of a cell solely from observations, predicts when a handover is necessary, and determines policy patterns, solely from environment observations.
- Augmenting Dynamic Query Expansion in Microblog Texts. Khandpur, Rupinder P. (Virginia Tech, 2018-08-17). Dynamic query expansion is a method of automatically identifying terms relevant to a target domain based on an incomplete query input. With the explosive growth of online media, such tools are essential for efficiently refining search results to track emerging themes in noisy, unstructured text streams. They are crucial for large-scale predictive analytics and decision-making systems, which use open source indicators to find meaningful information rapidly and accurately. The problems of information overload and semantic mismatch are systemic in the Information Retrieval (IR) tasks undertaken by such systems. In this dissertation, we develop dynamic query expansion algorithms that can help improve the efficacy of such systems using only a small set of seed queries, requiring no training or labeled samples. We primarily investigate four significant problems related to the retrieval and assessment of event-related information, viz.: (1) How can we adapt the query expansion process to support rank-based analysis when tracking a fixed set of entities? A scalable framework is essential to allow relative assessment of emerging themes such as airport threats. (2) What visual knowledge discovery framework can incorporate users' feedback into the search result refinement process? This is a crucial step in efficiently integrating real-time 'situational awareness' when monitoring specific themes using open source indicators. (3) How can we contextualize query expansions? We focus on capturing semantic relatedness between a query and reference text so that the process can quickly adapt to different target domains. (4) How can we synchronously perform knowledge discovery and characterization (unstructured to structured) during the retrieval process? We mainly aim to model high-order, relational aspects of event-related information from microblog texts.
- Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics. Mahendiran, Aravindan (Virginia Tech, 2014-02-12). Twitter has become a popular data source over the past decade and has garnered a significant amount of attention as a surrogate data source for many important forecasting problems. Strong correlations have been observed between Twitter indicators and real-world trends spanning elections, stock markets, book sales, and flu outbreaks. A key ingredient in all methods that use Twitter for forecasting is agreeing on a domain-specific vocabulary to track the pertinent tweets, which is typically provided by subject matter experts (SMEs). The language used on Twitter drastically differs from other forms of online discourse, such as news articles and blogs, and it constantly evolves over time as users adopt popular hashtags to express their opinions. Thus, the vocabulary used by forecasting algorithms needs to be dynamic in nature and should capture emerging trends over time. This thesis proposes a novel unsupervised learning algorithm that builds a dynamic vocabulary using Probabilistic Soft Logic (PSL), a framework for probabilistic reasoning over relational domains. Using eight presidential elections from Latin America, we show how our query expansion methodology improves the performance of traditional election forecasting algorithms. Through this approach we demonstrate close to a two-fold increase in the number of tweets retrieved for predictions and a 36.90% reduction in prediction error.
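The dynamic-vocabulary idea can be illustrated with a much simpler frequency-based expansion loop: retrieve tweets matching the current vocabulary, then promote the most frequent co-occurring terms. This is a hedged stand-in for the PSL-based reasoning the thesis actually uses, and all names and parameters below are illustrative:

```python
from collections import Counter

def expand_vocabulary(seed_terms, tweets, rounds=2, top_k=3):
    """Iteratively grow a tracking vocabulary from seed terms.
    Each round: retrieve tweets (token lists) sharing a term with the
    vocabulary, then add the top_k most frequent new terms they contain.
    A frequency heuristic, not the thesis's PSL-based algorithm."""
    vocabulary = set(seed_terms)
    for _ in range(rounds):
        matched = [t for t in tweets if vocabulary & set(t)]
        counts = Counter(term for t in matched for term in t
                         if term not in vocabulary)
        vocabulary.update(term for term, _ in counts.most_common(top_k))
    return vocabulary
```

Starting from {"election"}, a term like "ballot" that never co-occurs with the seed directly can still be reached transitively through "vote" after two rounds, which is the core benefit of iterative expansion.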
- Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach. Yang, Seungwon (Virginia Tech, 2014-01-22). Identifying topics of a textual document is useful for many purposes. We can organize the documents by topics in digital libraries. Then, we could browse and search for the documents with specific topics. By examining the topics of a document, we can quickly understand what the document is about. To augment the traditional manual way of topic tagging tasks, which is labor-intensive, solutions using computers have been developed. This dissertation describes the design and development of a topic identification approach, in this case applied to disaster events. In a sense, this study represents the marriage of research analysis with an engineering effort in that it combines inspiration from Cognitive Informatics with a practical model from Information Retrieval. One of the design constraints, however, is that the Web was used as a universal knowledge source, which was essential in accessing the required information for inferring topics from texts. Retrieving specific information of interest from such a vast information source was achieved by querying a search engine's application programming interface. Specifically, the information gathered was processed mainly by incorporating the Vector Space Model from the Information Retrieval field. As a proof of concept, we subsequently developed and evaluated a prototype tool, Xpantrac, which is able to run in a batch mode to automatically process text documents. A user interface of Xpantrac also was constructed to support an interactive semi-automatic topic tagging application, which was subsequently assessed via a usability study. Throughout the design, development, and evaluation of these various study components, we detail how the hypotheses and research questions of this dissertation have been supported and answered.
We also show that our overarching goal, identifying topics in a human-comparable way without depending on a large training set or corpus, has been achieved.
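The Vector Space Model processing mentioned above centers on comparing term-frequency vectors. A minimal cosine-similarity sketch (generic IR textbook material, not Xpantrac's actual code):

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two token lists under the Vector Space Model,
    using raw term frequencies as vector components."""
    va, vb = Counter(doc_a), Counter(doc_b)
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```

In a topic-tagging pipeline, vectors like these let retrieved Web snippets be scored against the input text so that the highest-scoring terms can be extracted as candidate topic tags.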
- Automatic Reconstruction of the Building Blocks of Molecular Interaction Networks. Rivera, Corban G. (Virginia Tech, 2008-08-11). High-throughput whole-genome biological assays are highly intricate and difficult to interpret. The molecular interaction networks generated from evaluation of those experiments suggest that cellular functions are carried out by modules of interacting molecules. Reverse-engineering the modular structure of cellular interaction networks promises to significantly ease their analysis. We hypothesize that (i) cellular wiring diagrams can be decomposed into overlapping modules, where each module is a set of coherently interacting molecules, and (ii) a cell responds to a stress or a stimulus by appropriately modulating the activities of a subset of these modules. Motivated by these hypotheses, we develop models and algorithms that can reverse-engineer molecular modules from large-scale functional genomic data. We address two major problems: (1) Given a wiring diagram and genome-wide gene expression data measured after the application of a stress or in a disease state, compute the active network of molecular interactions perturbed by the stress or the disease. (2) Given the active networks for multiple stresses, stimuli, or diseases, compute a set of network legos, which are molecular modules with the property that each active network can be expressed as an appropriate combination of a subset of modules. To address the first problem, we propose an approach that computes the most-perturbed subgraph of a curated pathway of molecular interactions in a disease state. Our method is based on a novel score for pathway perturbation that incorporates both differential gene expression and the interaction structure of the pathway. We apply our method to a compendium of cancer types. We show that the significance of the most perturbed sub-pathway is frequently larger than that of the entire pathway.
We identify an association that suggests that IL-2 infusion may have a similar therapeutic effect in bladder cancer as it does in melanoma. We propose two models to address the second problem. First, we formulate a Boolean model for constructing network legos from a set of active networks. We reduce the problem of computing network legos to that of constructing closed biclusters in a binary matrix. Applying this method to a compendium of 13 stresses on human cells, we automatically detect that about four to six hours after treatment with chemicals that cause endoplasmic reticulum stress, fibroblasts shut down the cell cycle far more aggressively than fibroblasts or HeLa cells do in response to other treatments. Our second model represents each active network as an additive combination of network legos. We formulate the problem as one of computing network legos that can be used to recover active networks in an optimal manner. We use existing methods for non-negative matrix approximation to solve this problem. We apply our method to a human cancer dataset including 190 samples from 18 cancers. We identify a network lego that associates integrins and matrix metalloproteinases in ovarian adenoma and other cancers and a network lego including the retinoblastoma pathway associated with multiple leukemias.
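The additive-combination model described above lends itself to a compact illustration. The sketch below is not Rivera's implementation; it is a minimal sketch assuming a non-negative networks-by-interactions matrix, using plain Lee–Seung multiplicative updates for non-negative matrix approximation on toy data, to show how module rows (the "network legos" role, rows of H) and per-network mixing weights (rows of W) could be recovered:

```python
import numpy as np

def nmf(V, k, iters=500, seed=0):
    """Factor a non-negative matrix V (networks x interactions) as W @ H.
    Rows of H play the role of 'network legos'; row i of W gives the
    additive combination of legos that reconstructs active network i."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(iters):
        # Lee-Seung multiplicative updates minimize ||V - W @ H||_F;
        # factors stay non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy data: 4 "active networks" over 6 interactions, built from
# 2 hidden modules (entirely illustrative, not from the dissertation).
legos = np.array([[1, 1, 1, 0, 0, 0],
                  [0, 0, 0, 1, 1, 1]], dtype=float)
mix = np.array([[1, 0], [0, 1], [1, 1], [2, 0]], dtype=float)
V = mix @ legos

W, H = nmf(V, k=2)
print(np.linalg.norm(V - W @ H))  # reconstruction error; should be small
```

The factorization is only identifiable up to scaling and permutation of the modules, which is why methods in this family typically interpret the factors qualitatively rather than as unique module assignments.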
- ‘Beating the news’ with EMBERS: Forecasting Civil Unrest using Open Source IndicatorsRamakrishnan, Naren; Butler, Patrick; Self, Nathan; Khandpur, Rupinder P.; Saraf, Parang; Wang, Wei; Cadena, Jose; Vullikanti, Anil Kumar S.; Korkmaz, Gizem; Kuhlman, Christopher J.; Marathe, Achla; Zhao, Liang; Ting, Hua; Huang, Bert; Srinivasan, Aravind; Trinh, Khoa; Getoor, Lise; Katz, Graham; Doyle, Andy; Ackermann, Chris; Zavorin, Ilya; Ford, Jim; Summers, Kristen; Fayed, Youssef; Arredondo, Jaime; Gupta, Dipak; Mares, David; Muthia, Sathappan; Chen, Feng; Lu, Chang-Tien (2014)We describe the design, implementation, and evaluation of EMBERS, an automated, 24x7 continuous system for forecasting civil unrest across 10 countries of Latin America using open source indicators such as tweets, news sources, blogs, economic indicators, and other data sources. Unlike retrospective studies, EMBERS has been making forecasts into the future since Nov 2012 which have been (and continue to be) evaluated by an independent T&E team (MITRE). Of note, EMBERS has successfully forecast the uptick and downtick of incidents during the June 2013 protests in Brazil. We outline the system architecture of EMBERS, individual models that leverage specific data sources, and a fusion and suppression engine that supports trading off specific evaluation criteria. EMBERS also provides an audit trail interface that enables the investigation of why specific predictions were made along with the data utilized for forecasting. Through numerous evaluations, we demonstrate the superiority of EMBERS over base-rate methods and its capability to forecast significant societal happenings.
- Behavior Modeling and Analytics for Urban Computing: A Synthetic Information-based ApproachParikh, Nidhi Kiranbhai (Virginia Tech, 2017-03-15)The rapid increase in urbanization poses challenges in diverse areas such as energy, transportation, pandemic planning, and disaster response. Planning for urbanization is a big challenge because cities are complex systems consisting of human populations, infrastructures, and interactions and interdependence among them. This dissertation focuses on a synthetic information-based approach for modeling human activities and behaviors for two urban science applications, epidemiology and disaster planning, and with associated analytics. Synthetic information is a data-driven approach to create a detailed, high fidelity representation of human populations, infrastructural systems and their behavioral and interaction aspects. It is used in developing large-scale simulations to model what-if scenarios and for policy making. Big cities draw large numbers of visitors every day. Visitors often frequent crowded areas in the city and come into contact with each other and with area residents. However, most epidemiological studies have ignored their role in spreading epidemics. We extend the synthetic population model of the Washington DC metro area to include transient populations, consisting of tourists and business travelers, along with their demographics and activities, by combining data from multiple sources. We evaluate the effect of including this population in epidemic forecasts, and the potential benefits of multiple interventions that target transients. In the next study, we model human behavior in the aftermath of the detonation of an improvised nuclear device in Washington DC. Previous studies of this scenario have mostly focused on modeling physical impact and simple behaviors like sheltering and evacuation. However, these models have focused on optimal behavior, not naturalistic behavior. 
In other words, prior work has focused on whether it is better to shelter-in-place or evacuate, but has not been informed by the literature on what people actually do in the aftermath of disasters. Natural human behaviors in disasters, such as looking for family members or seeking healthcare, are supported by infrastructures such as cell-phone communication and transportation systems. We model a range of behaviors, such as looking for family members, evacuation, sheltering, healthcare-seeking, worry, and search and rescue, and their interactions with infrastructural systems. Large-scale and complex agent-based simulations generate a large amount of data in each run, making it hard to make sense of the results. This leads us to formulate two new problems in simulation analytics. First, we develop algorithms to summarize simulation results by extracting causally-relevant state sequences - state sequences that have a measurable effect on the outcome of interest. Second, in order to develop effective interventions, it is important to understand which behaviors lead to positive and negative outcomes. The same behavior may lead to different outcomes, depending on the context. Hence, we develop an algorithm for contextual behavior ranking. In addition to the context mentioned in the query, our algorithm also identifies any additional context that may affect the behavioral ranking.
- Bridging Methodological Gaps in Network-Based Systems BiologyPoirel, Christopher L. (Virginia Tech, 2013-10-16)Functioning of the living cell is controlled by a complex network of interactions among genes, proteins, and other molecules. A major goal of systems biology is to understand and explain the mechanisms by which these interactions govern the cell's response to various conditions. Molecular interaction networks have proven to be a powerful representation for studying cellular behavior. Numerous algorithms have been developed to unravel the complexity of these networks. Our work addresses the drawbacks of existing techniques. This thesis includes three related research efforts that introduce network-based approaches to bridge current methodological gaps in systems biology. i. Functional enrichment methods provide a summary of biological functions that are overrepresented in an interesting collection of genes (e.g., highly differentially expressed genes between a diseased cell and a healthy cell). Standard functional enrichment algorithms ignore the known interactions among proteins. We propose a novel network-based approach to functional enrichment that explicitly accounts for these underlying molecular interactions. Through this work, we close the gap between set-based functional enrichment and topological analysis of molecular interaction networks. ii. Many techniques have been developed to compute the response network of a cell. A recent trend in this area is to compute response networks of small size, with the rationale that only part of a pathway is often changed by disease and that interpreting small subnetworks is easier than interpreting larger ones. However, these methods may not uncover the spectrum of pathways perturbed in a particular experiment or disease. 
To avoid these difficulties, we propose to use algorithms that reconcile case-control DNA microarray data with a molecular interaction network by modifying per-gene differential expression p-values such that two genes connected by an interaction show similar changes in their gene expression values. iii. Top-down analyses in systems biology can automatically find correlations among genes and proteins in large-scale datasets. However, it is often difficult to design experiments from these results. In contrast, bottom-up approaches painstakingly craft detailed models of cellular processes. However, developing the models is a manual process that can take many years. These approaches have largely been developed independently. We present Linker, an efficient and automated data-driven method that analyzes molecular interactomes. Linker combines teleporting random walks and k-shortest path computations to discover connections from a set of source proteins to a set of target proteins. We demonstrate the efficacy of Linker through two applications: proposing extensions to an existing model of cell cycle regulation in budding yeast and automated reconstruction of human signaling pathways. Linker achieves superior precision and recall compared to state-of-the-art algorithms from the literature.
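Linker's combination of teleporting random walks and k-shortest-path computations is not spelled out in the abstract above. As a hedged illustration of the random-walk half only, the sketch below runs a generic random walk with restart (teleporting back to a source set with fixed probability) on a toy graph; the graph, restart probability, and function name are illustrative assumptions, not Linker's actual code:

```python
import numpy as np

def random_walk_with_restart(A, sources, restart=0.3, tol=1e-10):
    """Stationary visit probabilities of a walk on adjacency matrix A
    that, at each step, teleports back to the source set with
    probability `restart`. Higher scores = better connected to sources."""
    n = A.shape[0]
    P = A / A.sum(axis=0, keepdims=True)  # column-stochastic transitions
    r = np.zeros(n)
    r[sources] = 1.0 / len(sources)       # restart distribution
    p = r.copy()
    while True:                           # power iteration to a fixed point
        p_next = (1 - restart) * P @ p + restart * r
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next

# Toy graph: path 0-1-2-3-4, walk restarted at node 0.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
scores = random_walk_with_restart(A, sources=[0])
print(scores)  # scores decrease with distance from node 0
```

In a Linker-style pipeline, such scores would be used to restrict the interactome to high-probability nodes before enumerating short paths from sources to targets; the path-enumeration step is omitted here.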
- BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving EnvironmentsVerstak, Alex; Ramakrishnan, Naren; Watson, Layne T.; He, Jian; Shaffer, Clifford A.; Bae, Kyung Kyoon; Jiang, Jing; Tranter, William H.; Rappaport, Theodore S. (Hindawi, 2003-01-01)We describe a binding schema markup language (BSML) for describing data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data. The data interchange problem is addressed by three techniques for processing semistructured data: validation, binding, and conversion. We present BSML and describe its application to a PSE for wireless communications system design.
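The BSML schema language itself is not reproduced in the abstract above. The following sketch illustrates only the general binding step it names - mapping a semistructured XML fragment to native values under declared types - using Python's standard library. The element names and type attributes are hypothetical, not taken from BSML:

```python
import xml.etree.ElementTree as ET

# Hypothetical interchange fragment; tag and attribute names are
# illustrative, not drawn from the BSML specification.
doc = """
<model name="wireless-link">
  <parameter name="frequency" type="float">2.4e9</parameter>
  <parameter name="antennas" type="int">4</parameter>
</model>
"""

CASTS = {"float": float, "int": int, "string": str}

def bind(xml_text):
    """Parse the fragment and convert each <parameter> element to a
    native Python value using its declared type (a crude stand-in for
    the validate/bind steps of a data-interchange pipeline)."""
    root = ET.fromstring(xml_text)
    params = {}
    for p in root.iter("parameter"):
        cast = CASTS[p.attrib["type"]]   # unknown types raise KeyError
        params[p.attrib["name"]] = cast(p.text.strip())
    return root.attrib["name"], params

name, params = bind(doc)
print(name, params)  # wireless-link {'frequency': 2400000000.0, 'antennas': 4}
```

A real PSE would additionally validate the fragment against a schema and support conversion between alternative representations; this sketch handles only the binding of leaf values.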