Show simple item record

dc.contributor.author: Khan, Mohammed Saquib Akmal [en_US]
dc.date.accessioned: 2015-01-27T09:00:14Z
dc.date.available: 2015-01-27T09:00:14Z
dc.date.issued: 2015-01-26 [en_US]
dc.identifier.other: vt_gsexam:4255 [en_US]
dc.identifier.uri: http://hdl.handle.net/10919/51223
dc.description.abstract: Real-time spatio-temporal analytics has become an integral part of epidemiological studies. The size of spatio-temporal data has been increasing tremendously over the years, gradually evolving into Big Data. Processing in such domains is highly data- and compute-intensive. High-performance computing resources are actively being used to handle such workloads over massive datasets. This confluence of high-performance computing and datasets with Big Data characteristics poses great challenges pertaining to data handling and processing. The resource management of supercomputers is in conflict with the data-intensive nature of spatio-temporal analytics. This is further exacerbated by the fact that data management is decoupled from the computing resources. Problems of this nature have provided great opportunities for the growth and development of tools and concepts centered around MapReduce-based solutions. However, we believe that advanced relational concepts can still be employed to provide an effective solution to these issues and challenges. In this study, we explore distributed databases to efficiently handle spatio-temporal Big Data for epidemiological studies. We propose DiceX (Data Intensive Computational Epidemiology using supercomputers), which couples high-performance, Big Data and relational computing by embedding distributed data storage and processing engines within the supercomputer. It is characterized by scalable strategies for data ingestion, a unified framework to set up and configure various processing engines, and the ability to pause, materialize and restore images of a data session. In addition, we have successfully configured DiceX to support approximation algorithms from the MADlib Analytics Library [54], primarily Count-Min Sketch (CM Sketch) [33][34][35].
DiceX enables a new style of Big Data processing, which is centered around the use of clustered databases and exploits supercomputing resources. It can effectively exploit the cores, memory and compute nodes of supercomputers to scale processing of spatio-temporal queries on datasets of large volume. Thus, it provides a scalable and efficient tool for data management and processing of spatio-temporal data. Although DiceX has been designed for computational epidemiology, it can be easily extended to different data-intensive domains facing similar issues and challenges. We thank our external collaborators and members of the Network Dynamics and Simulation Science Laboratory (NDSSL) for their suggestions and comments. This work has been partially supported by DTRA CNIMS Contract HDTRA1-11-D-0016-0001, DTRA Validation Grant HDTRA1-11-1-0016, NSF - Network Science and Engineering Grant CNS-1011769, NIH and NIGMS - Models of Infectious Disease Agent Study Grant 5U01GM070694-11. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government. [en_US]
dc.format.medium: ETD [en_US]
dc.publisher: Virginia Tech [en_US]
dc.rights: This Item is protected by copyright and/or related rights. Some uses of this Item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s). [en_US]
dc.subject: Data Analytics [en_US]
dc.subject: Data Mining [en_US]
dc.subject: Distributed Systems [en_US]
dc.subject: Database Systems [en_US]
dc.title: Efficient Spatio-Temporal Network Analytics in Epidemiological Studies using Distributed Databases [en_US]
dc.type: Thesis [en_US]
dc.contributor.department: Computer Science [en_US]
dc.description.degree: MS [en_US]
thesis.degree.name: MS [en_US]
thesis.degree.level: masters [en_US]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en_US]
thesis.degree.discipline: Computer Science and Applications [en_US]
dc.contributor.committeechair: Marathe, Madhav Vishnu [en_US]
dc.contributor.committeemember: Vullikanti, Anil Kumar S [en_US]
dc.contributor.committeemember: Prakash, Bodicherla Aditya [en_US]
dc.contributor.committeemember: Gupta, Sandeep [en_US]

