Virginia Tech
    • Log in
    View Item 
    •   VTechWorks Home
    • ETDs: Virginia Tech Electronic Theses and Dissertations
    • Masters Theses
    • View Item
    •   VTechWorks Home
    • ETDs: Virginia Tech Electronic Theses and Dissertations
    • Masters Theses
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Modeling and Computation of Complex Interventions in Large-scale Epidemiological Simulations using SQL and Distributed Database

    Thumbnail
    View/Open
    Kaw_R_T_2014.pdf (547.5Kb)
    Downloads: 704
    Supporting documents (9.762Kb)
    Downloads: 310
    Date
    2014-08-30
    Author
    Kaw, Rushi
    Metadata
    Show full item record
    Abstract
    Scalability is an important problem in epidemiological applications that simulate complex intervention scenarios over large datasets. Indemics is one such interactive data intensive framework for High-performance computing (HPC) based large-scale epidemic simulations. In the Indemics framework, interventions are supplied from an external, standalone database which proved to be an effective way of implementing interventions. Although this setup performs well for simple interventions and small datasets, performance and scalability of complex interventions and large datasets remain an issue. In this thesis, we present IndemicsXC, a scalable and massively parallel high-performance data engine for Indemics in a supercomputing environment. IndemicsXC has the ability to implement complex interventions over large datasets. Our distributed database solution retains the simplicity of Indemics by using the same SQL query interface for expressing interventions. We show that our solution implements the most complex interventions by intelligently offloading them to the supercomputer nodes and processing them in parallel. We present an extensive performance evaluation of our database engine with the help of various intervention case studies over synthetic population datasets. The evaluation of our parallel and distributed database framework illustrates its scalability over standalone database. Our results show that the distributed data engine is efficient as it is parallel, scalable and cost-efficient means of implementing interventions. The proposed cost-model in this thesis could be used to approximate intervention query execution time with decent accuracy. The usefulness of our distributed database framework could be leveraged for fast, accurate and sensible decisions by the public health officials during an outbreak. Finally, we discuss the considerations for using distributed databases for driving large-scale simulations.
    URI
    http://hdl.handle.net/10919/50434
    Collections
    • Masters Theses [21540]

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     

    VTechWorks

    AboutPoliciesHelp

    Browse

    All of VTechWorksCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    Log inRegister

    Statistics

    View Usage Statistics

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us