Browsing by Author "Cadena, Jose"
- ‘Beating the news’ with EMBERS: Forecasting Civil Unrest using Open Source Indicators
Ramakrishnan, Naren; Butler, Patrick; Self, Nathan; Khandpur, Rupinder P.; Saraf, Parang; Wang, Wei; Cadena, Jose; Vullikanti, Anil Kumar S.; Korkmaz, Gizem; Kuhlman, Christopher J.; Marathe, Achla; Zhao, Liang; Ting, Hua; Huang, Bert; Srinivasan, Aravind; Trinh, Khoa; Getoor, Lise; Katz, Graham; Doyle, Andy; Ackermann, Chris; Zavorin, Ilya; Ford, Jim; Summers, Kristen; Fayed, Youssef; Arredondo, Jaime; Gupta, Dipak; Mares, David; Muthia, Sathappan; Chen, Feng; Lu, Chang-Tien (2014)
We describe the design, implementation, and evaluation of EMBERS, an automated, continuous (24x7) system for forecasting civil unrest across 10 countries of Latin America using open source indicators such as tweets, news sources, blogs, economic indicators, and other data sources. Unlike retrospective studies, EMBERS has been forecasting future events since November 2012, and these forecasts have been (and continue to be) evaluated by an independent test and evaluation (T&E) team (MITRE). Notably, EMBERS successfully forecast the rise and fall in the number of incidents during the June 2013 protests in Brazil. We outline the system architecture of EMBERS, the individual models that leverage specific data sources, and a fusion and suppression engine that supports trading off specific evaluation criteria. EMBERS also provides an audit trail interface for investigating why specific predictions were made and which data were used to make them. Through numerous evaluations, we demonstrate that EMBERS outperforms base-rate methods and can forecast significant societal events.
- Discovery of under immunized spatial clusters using network scan statistics
Cadena, Jose; Falcone, David; Marathe, Achla; Vullikanti, Anil (2019-02-04)
Background: Clusters of under-vaccinated children are emerging in a number of states in the United States due to rising rates of vaccine hesitancy and refusal. As the measles outbreaks in California and other states in 2015 and in Minnesota in 2017 showed, such clusters can pose a significant public health risk. Most prior methods have used publicly available school immunization data (a few use private healthcare patient records). School immunization data have limited demographic information; as a result, such analyses cannot provide the demographic characteristics of significant clusters. Further, the resolution of the clusters identified by prior methods is limited because they are typically restricted to disks or other well-rounded shapes.
Methods: We use realistic population models for Minnesota (MN) and Washington (WA) states, which provide a model of activities for all individuals in the population. We combine these models with school-level immunization data for the two states to estimate vaccine coverage at the level of census block groups. A scan statistic method defined on networks is then used to find significant clusters of under-immunized block groups, without any restriction on shape. We also provide the demographic characteristics of these clusters.
Results: We find 2 significant under-vaccinated clusters in MN and 3 in WA. These clusters are highly irregular in shape, in contrast to the circular disks reported in prior work that relies on the SaTScan approach. Some of the clusters found by our method are not contained in those computed using SaTScan, a state-of-the-art software tool used in similar studies in other states.
Conclusions: The emergence of under-immunized clusters is a growing concern for public health agencies because such clusters can act as reservoirs of infection and increase the risk of infection spreading to the wider population. The higher-resolution clusters computed using our network-based approach and population models provide new insights into the structure and characteristics of such clusters and enable targeted interventions.
- Finding Interesting Subgraphs with Guarantees
Cadena, Jose (Virginia Tech, 2018-01-29)
Networks are a mathematical abstraction of the interactions between a set of entities, with extensive applications in social science, epidemiology, bioinformatics, and cybersecurity, among others. Many fundamental problems in network data analysis, such as anomaly detection, dense subgraph mining, motif finding, information diffusion, and epidemic spread, share a common underlying task: finding an "interesting subgraph", that is, a part of the graph (usually small relative to the whole) that optimizes a score function and has some property of interest, such as connectivity or a minimum density. Finding subgraphs that satisfy such constraints is computationally hard in general, and state-of-the-art algorithms for many network analysis problems are heuristic. These methods are fast and usually easy to implement. However, they come with no theoretical guarantees on solution quality, which makes it difficult to assess how the discovered subgraphs compare to an optimal solution; this, in turn, affects the data mining task at hand. For instance, in anomaly detection, solutions with low anomaly scores lead to suboptimal detection power. At the other end of the spectrum, the theoretical computer science community has made significant advances in approximation algorithms for these challenging graph problems. However, these algorithms tend to be slow and difficult to implement, and they do not scale to the large datasets that are common today. The goal of this dissertation is to develop scalable algorithms with theoretical guarantees for various network analysis problems whose underlying task is to find subgraphs with constraints.
We find interesting subgraphs with guarantees by adapting techniques from parameterized complexity, convex optimization, and submodular optimization. These techniques are well known in the algorithm design literature, but applied directly they tend to yield slow, impractical algorithms. A unifying theme across the problems we study is that our methods scale without sacrificing the theoretical guarantees of these techniques. We achieve this combination of scalability and rigorous bounds by exploiting properties of the objectives being optimized, decomposing or compressing the input graph to a manageable size, and parallelizing the computation. We consider network analysis problems on both static and dynamic network models, and we illustrate the power of our methods in applications such as public health, sensor data analysis, and event detection using social media data.
- Forecasting Social Unrest Using Activity Cascades
Cadena, Jose; Korkmaz, Gizem; Kuhlman, Christopher J.; Marathe, Achla; Ramakrishnan, Naren; Vullikanti, Anil (PLOS, 2015-06-19)
Social unrest is endemic in many societies, and recent news has drawn attention to events in Latin America, the Middle East, and Eastern Europe. Civilian populations mobilize, sometimes spontaneously and sometimes in an organized manner, to raise awareness of key issues or to demand changes in governing or other organizational structures. Forecasting civil unrest from indicators observed on media such as Twitter, news, and blogs is of key interest to social scientists and policymakers. We present an event forecasting model that uses a notion of activity cascades in Twitter (proposed by Gonzalez-Bailon et al., 2011) to predict the occurrence of protests in three countries of Latin America: Brazil, Mexico, and Venezuela. The basic assumption is that the emergence of a suitably detected activity cascade is a precursor of, or a surrogate for, a real protest event that will happen “on the ground.” Our model supports the theoretical characterization of large cascades using spectral properties and uses properties of detected cascades to forecast events. Experimental results on many datasets, including the June 2013 protests in Brazil, demonstrate the effectiveness of our approach.
- Hadoop Project for IDEAL in CS5604
Cadena, Jose; Chen, Mengsu; Wen, Chengyuan (Virginia Tech, 2015-05-11)
The Integrated Digital Event Archive and Library (IDEAL) system addresses the need to combine the best of digital library and archive technologies in support of stakeholders who are remembering and/or studying important events. It leverages and extends the capabilities of the Internet Archive to develop spontaneous event collections that can be permanently archived as well as searched and accessed. IDEAL connects the processing of tweets and web pages, combining informal and formal media to support building collections on chosen general or specific events. Integrated services include topic identification, categorization (building upon special ontologies being devised), clustering, and visualization of data, information, and context. The objective of the course is to build a state-of-the-art information retrieval system in support of the IDEAL project. Students were assigned to eight teams, each focused on a different part of the system: Solr, Classification, Hadoop, Noise Reduction, LDA, Clustering, Social Networks, and NER. As the Hadoop team, we focused on making the information retrieval system scalable to large datasets by taking advantage of the distributed computing capabilities of the Apache Hadoop framework. We designed and put in place a general schema for storing and updating data in our Hadoop cluster. Throughout the project, we coordinated with other teams to help them make use of readily available machine learning software for Hadoop, and we also provided support for using MapReduce. We found that the teams were able to easily integrate their results into the design we developed, and that uploading these results into a data store for communication with Solr can be done, in the best cases, in a few minutes.
We conclude that Hadoop is an appropriate framework for the IDEAL project; however, we also recommend exploring the use of the Spark framework.
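The EMBERS entry above mentions a fusion and suppression engine that trades off evaluation criteria, but the abstract does not say how it works. As a generic illustration only (not the EMBERS engine), the Python sketch below suppresses forecasts whose confidence falls below a hypothetical threshold and reports the resulting precision and recall. The `Forecast` tuple, `suppress`, and `precision_recall` are names invented for this example, and each matched forecast is assumed to correspond to a distinct real event.

```python
from typing import List, Tuple

# A forecast is (confidence, matched): `matched` says whether it matched a
# real event. Both fields are hypothetical stand-ins for illustration.
Forecast = Tuple[float, bool]


def suppress(forecasts: List[Forecast], threshold: float) -> List[Forecast]:
    """Keep only forecasts whose confidence reaches the threshold."""
    return [f for f in forecasts if f[0] >= threshold]


def precision_recall(kept: List[Forecast], total_events: int) -> Tuple[float, float]:
    """Precision over issued forecasts; recall over all real events.

    Assumes each matched forecast corresponds to a distinct real event.
    """
    hits = sum(1 for _, matched in kept if matched)
    precision = hits / len(kept) if kept else 0.0
    recall = hits / total_events if total_events else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical forecasts; 4 real events occurred in total.
    forecasts = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
    for threshold in (0.2, 0.5):
        kept = suppress(forecasts, threshold)
        print(threshold, precision_recall(kept, total_events=4))
```

On this sample data, raising the threshold from 0.2 to 0.5 raises precision from 0.6 to about 0.67 while recall drops from 0.75 to 0.5, which is the kind of trade-off a suppression step can be tuned to make.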
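The abstract of "Discovery of under immunized spatial clusters using network scan statistics" does not specify the score function or the search procedure. The Python sketch below assumes the Poisson (Kulldorff) log-likelihood ratio commonly used by scan statistics, with a count of unvaccinated children and a baseline of all children per block group, and grows a connected cluster greedily from a seed node over a block-group adjacency graph. The greedy search is a simple heuristic swapped in for illustration, not the paper's method, and `kulldorff_score` and `greedy_connected_cluster` are hypothetical names.

```python
import math
from typing import Dict, List, Set, Tuple


def kulldorff_score(c: float, b: float, C: float, B: float) -> float:
    """Poisson log-likelihood ratio for an elevated rate inside a cluster.

    c, b: count (e.g. unvaccinated children) and baseline (e.g. all children)
    inside the candidate cluster; C, B: the same totals over the whole graph.
    """
    if c <= 0 or b <= 0 or b >= B:
        return 0.0
    overall = C / B
    inside = c / b
    outside = (C - c) / (B - b)
    if inside <= outside:
        return 0.0  # only clusters with a higher rate than the rest score
    score = c * math.log(inside / overall)
    if C > c:  # the outside term vanishes when every case is inside
        score += (C - c) * math.log(outside / overall)
    return score


def greedy_connected_cluster(adj: Dict[int, List[int]],
                             count: Dict[int, float],
                             base: Dict[int, float],
                             seed: int) -> Tuple[Set[int], float]:
    """Grow a connected cluster from `seed` by repeatedly adding the neighbor
    that most increases the score; stop when no neighbor improves it."""
    C, B = sum(count.values()), sum(base.values())
    cluster = {seed}
    c, b = count[seed], base[seed]
    best = kulldorff_score(c, b, C, B)
    while True:
        frontier = {v for u in cluster for v in adj[u]} - cluster
        choice = None
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            s = kulldorff_score(c + count[v], b + base[v], C, B)
            if s > best:
                best, choice = s, v
        if choice is None:
            return cluster, best
        cluster.add(choice)
        c += count[choice]
        b += base[choice]


if __name__ == "__main__":
    # Hypothetical path graph 0-1-2-3-4; nodes 1 and 2 have high counts.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    count = {0: 1, 1: 8, 2: 9, 3: 1, 4: 1}
    base = {v: 10 for v in adj}
    print(greedy_connected_cluster(adj, count, base, seed=1))
```

Seeded at node 1, the search returns the cluster {1, 2} with a score of about 8.66. Because it only ever adds neighbors, every result is connected but may take any shape, which is what distinguishes a network scan from disk-shaped windows; a greedy search can, however, stop at a seed-dependent local optimum.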
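As a small, classical example of finding a dense subgraph with a provable guarantee, the Python sketch below implements Charikar's greedy peeling algorithm for the densest subgraph problem (density |E|/|V|), which always returns a subgraph whose density is at least half the optimum. It is included only to illustrate what an approximation guarantee looks like in code; it is not one of the algorithms from "Finding Interesting Subgraphs with Guarantees", and the function name `greedy_peeling` is hypothetical.

```python
from typing import Dict, Set, Tuple


def greedy_peeling(adj: Dict[int, Set[int]]) -> Tuple[Set[int], float]:
    """Charikar-style peeling for the densest subgraph (density = |E| / |V|).

    Repeatedly deletes a minimum-degree vertex and remembers the densest
    intermediate subgraph; the result is within a factor 2 of optimal.
    `adj` must be symmetric (undirected graph, no self-loops).
    """
    g = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    if not g:
        return set(), 0.0
    edges = sum(len(vs) for vs in g.values()) // 2
    best_set, best_density = set(g), edges / len(g)
    while len(g) > 1:
        u = min(g, key=lambda x: len(g[x]))  # minimum-degree vertex
        edges -= len(g[u])
        for v in g[u]:
            g[v].discard(u)
        del g[u]
        density = edges / len(g)
        if density > best_density:
            best_set, best_density = set(g), density
    return best_set, best_density


if __name__ == "__main__":
    # A 4-clique {0, 1, 2, 3} with a tail 3-4-5 (hypothetical data).
    edge_list = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
    adj: Dict[int, Set[int]] = {v: set() for v in range(6)}
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    print(greedy_peeling(adj))
```

Here peeling removes the tail first and returns the clique {0, 1, 2, 3} with density 1.5. This version scans all vertices for the minimum degree at every step, so it runs in O(|V|^2 + |E|) time; bucket queues reduce it to linear time.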
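The activity cascade entry above does not spell out how cascades are detected. The Python sketch below uses a simplified linking rule that is an assumption, loosely in the spirit of the recruitment cascades of Gonzalez-Bailon et al.: a topic-relevant post joins the cascade of any post by an account its author follows, if that post was made within a time `window` before it. Cascades are merged with a union-find structure, and the size of the largest one is returned as one hypothetical signal a forecaster might threshold. The spectral characterization mentioned in the abstract is not modeled, and `largest_cascade` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def largest_cascade(posts: List[Tuple[str, float]],
                    follows: Dict[str, Set[str]],
                    window: float) -> int:
    """Size of the largest activity cascade under a simplified linking rule.

    posts:   (user, timestamp) pairs for topic-relevant tweets.
    follows: follows[u] is the set of accounts that u follows.
    A post is linked to every earlier post by a followee of its author made
    at most `window` time units before it; cascades are the linked groups.
    """
    parent = list(range(len(posts)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    order = sorted(range(len(posts)), key=lambda i: posts[i][1])
    for pos, i in enumerate(order):
        user, t = posts[i]
        followees = follows.get(user, set())
        for j in order[:pos]:
            other, s = posts[j]
            if other in followees and t - window <= s < t:
                parent[find(i)] = find(j)  # merge the two cascades

    sizes: Dict[int, int] = {}
    for i in range(len(posts)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


if __name__ == "__main__":
    # Hypothetical data: b follows a, c follows b, d follows no one.
    follows = {"b": {"a"}, "c": {"b"}, "d": set()}
    posts = [("a", 0.0), ("b", 1.0), ("c", 2.0), ("d", 2.5)]
    print(largest_cascade(posts, follows, window=1.5))
    print(largest_cascade(posts, follows, window=0.5))
```

With `window=1.5`, the posts by a, b, and c form one cascade of size 3; with `window=0.5`, no posts are linked and the largest cascade has size 1. The pairwise scan is quadratic in the number of posts, which is acceptable for a sketch but not for a live stream.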
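To make the MapReduce programming model referred to in the Hadoop entry above concrete, the Python sketch below runs the map, shuffle, and reduce phases in a single process on a hypothetical task: counting hashtags in tweets. It does not use Hadoop's API; on a real cluster the mapper and reducer would run as distributed tasks (for example, through Hadoop's Java API or Hadoop Streaming), and the framework would perform the shuffle. `map_reduce`, `hashtag_mapper`, and `sum_reducer` are names invented for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(records: Iterable[str],
               mapper: Callable[[str], Iterator[Tuple[str, int]]],
               reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Run the map, shuffle (group by key), and reduce phases in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase
            groups[key].append(value)      # shuffle: collect values per key
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def hashtag_mapper(tweet: str) -> Iterator[Tuple[str, int]]:
    """Emit (hashtag, 1) for every hashtag in a tweet, case-folded."""
    for token in tweet.split():
        if token.startswith("#") and len(token) > 1:
            yield token.lower(), 1


def sum_reducer(key: str, values: List[int]) -> int:
    """Add up all partial counts emitted for one key."""
    return sum(values)


if __name__ == "__main__":
    tweets = ["new #Solr index", "#solr and #Hadoop", "no tags here"]
    print(map_reduce(tweets, hashtag_mapper, sum_reducer))
```

Because the reducer only ever sees the values for one key, the same mapper and reducer logic can be spread across many machines unchanged, which is what lets Hadoop scale such jobs to large datasets.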