Modeling and Computation of Complex Interventions in Large-scale Epidemiological Simulations using SQL and Distributed Database

dc.contributor.author [en]: Kaw, Rushi
dc.contributor.committeechair [en]: Marathe, Madhav Vishnu
dc.contributor.committeemember [en]: Gupta, Sandeep
dc.contributor.committeemember [en]: Prakash, B. Aditya
dc.contributor.department [en]: Computer Science
dc.date.accessioned [en]: 2014-08-31T08:00:23Z
dc.date.available [en]: 2014-08-31T08:00:23Z
dc.date.issued [en]: 2014-08-30
dc.description.abstract [en]: Scalability is an important problem in epidemiological applications that simulate complex intervention scenarios over large datasets. Indemics is one such interactive, data-intensive framework for high-performance computing (HPC)-based large-scale epidemic simulations. In the Indemics framework, interventions are supplied from an external, standalone database, which has proved to be an effective way of implementing interventions. Although this setup performs well for simple interventions and small datasets, performance and scalability remain an issue for complex interventions and large datasets. In this thesis, we present IndemicsXC, a scalable and massively parallel high-performance data engine for Indemics in a supercomputing environment. IndemicsXC can implement complex interventions over large datasets. Our distributed database solution retains the simplicity of Indemics by using the same SQL query interface for expressing interventions. We show that our solution implements the most complex interventions by intelligently offloading them to the supercomputer nodes and processing them in parallel. We present an extensive performance evaluation of our database engine using various intervention case studies over synthetic population datasets. The evaluation of our parallel and distributed database framework illustrates its scalability relative to a standalone database. Our results show that the distributed data engine is a parallel, scalable, and cost-efficient means of implementing interventions. The cost model proposed in this thesis can be used to approximate intervention query execution time with reasonable accuracy. Public health officials could leverage our distributed database framework to make fast, accurate, and sensible decisions during an outbreak. Finally, we discuss the considerations for using distributed databases to drive large-scale simulations.
dc.description.degree [en]: Master of Science
dc.format.medium [en]: ETD
dc.identifier.other [en]: vt_gsexam:3630
dc.identifier.uri [en]: http://hdl.handle.net/10919/50434
dc.publisher [en]: Virginia Tech
dc.rights [en]: In Copyright
dc.rights.uri [en]: http://rightsstatements.org/vocab/InC/1.0/
dc.subject [en]: epidemic simulation
dc.subject [en]: distributed system
dc.subject [en]: database system
dc.title [en]: Modeling and Computation of Complex Interventions in Large-scale Epidemiological Simulations using SQL and Distributed Database
dc.type [en]: Thesis
thesis.degree.discipline [en]: Computer Science and Applications
thesis.degree.grantor [en]: Virginia Polytechnic Institute and State University
thesis.degree.level [en]: masters
thesis.degree.name [en]: Master of Science

Files

Original bundle
Name: Kaw_R_T_2014.pdf
Size: 547.58 KB
Format: Adobe Portable Document Format

Name: Kaw_R_T_2014_support_1.docx
Size: 9.76 KB
Format: Microsoft Word XML
Description: Supporting documents
