A Distributed Approach to EpiFast using Apache Spark

dc.contributor.author: Kannan, Vijayasarathy (en)
dc.contributor.committeechair: Marathe, Madhav Vishnu (en)
dc.contributor.committeemember: Marathe, Achla (en)
dc.contributor.committeemember: Vullikanti, Anil Kumar S. (en)
dc.contributor.committeemember: Chen, Jiangzhuo (en)
dc.contributor.department: Computer Science (en)
dc.date.accessioned: 2015-08-05T08:00:55Z (en)
dc.date.available: 2015-08-05T08:00:55Z (en)
dc.date.issued: 2015-08-04 (en)
dc.description.abstract: EpiFast is a parallel algorithm for large-scale epidemic simulations, based on an interpretation of stochastic disease propagation in a contact network. The original EpiFast implementation is based on a master-slave computation model with a focus on distributed memory using the Message Passing Interface (MPI). However, it suffers from a few shortcomings with respect to the scale of the networks being studied. This thesis addresses these shortcomings and provides two different implementations: Spark-EpiFast, based on the Apache Spark big data processing engine, and Charm-EpiFast, based on the Charm++ parallel programming framework. The study focuses on exploiting features of both systems that we believe could improve performance and scalability. We present models of EpiFast specific to each system and relate algorithm specifics to several optimization techniques. We also provide a detailed analysis of these optimizations through a range of experiments that consider the scale of the networks and the environment settings we used. Our analysis shows that the Spark-based version is more efficient than the Charm++ and MPI-based counterparts. To the best of our knowledge, ours is one of the early efforts to use Apache Spark for epidemic simulations. We believe that our proposed model could act as a reference for similar large-scale epidemiological simulations exploring non-MPI or MapReduce-like approaches. (en)
dc.description.degree: Master of Science (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:5928 (en)
dc.identifier.uri: http://hdl.handle.net/10919/55272 (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: computational epidemiology (en)
dc.subject: parallel programming (en)
dc.subject: distributed computing (en)
dc.title: A Distributed Approach to EpiFast using Apache Spark (en)
dc.type: Thesis (en)
thesis.degree.discipline: Computer Science and Applications (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: Master of Science (en)

Files

Original bundle
Name: Kannan_V_T_2015.pdf
Size: 2.28 MB
Format: Adobe Portable Document Format
