A Distributed Approach to EpiFast using Apache Spark
dc.contributor.author | Kannan, Vijayasarathy | en |
dc.contributor.committeechair | Marathe, Madhav Vishnu | en |
dc.contributor.committeemember | Marathe, Achla | en |
dc.contributor.committeemember | Vullikanti, Anil Kumar S. | en |
dc.contributor.committeemember | Chen, Jiangzhuo | en |
dc.contributor.department | Computer Science | en |
dc.date.accessioned | 2015-08-05T08:00:55Z | en |
dc.date.available | 2015-08-05T08:00:55Z | en |
dc.date.issued | 2015-08-04 | en |
dc.description.abstract | EpiFast is a parallel algorithm for large-scale epidemic simulations, based on an interpretation of stochastic disease propagation in a contact network. The original EpiFast implementation follows a master-slave computation model with a focus on distributed memory, using the Message Passing Interface (MPI). However, it suffers from a few shortcomings with respect to the scale of the networks being studied. This thesis addresses these shortcomings and provides two different implementations: Spark-EpiFast, based on the Apache Spark big data processing engine, and Charm-EpiFast, based on the Charm++ parallel programming framework. The study focuses on exploiting features of both systems that we believe could improve performance and scalability. We present models of EpiFast specific to each system and relate algorithm specifics to several optimization techniques. We also provide a detailed analysis of these optimizations through a range of experiments that consider the scale of the networks and the environment settings used. Our analysis shows that the Spark-based version is more efficient than its Charm++ and MPI-based counterparts. To the best of our knowledge, ours is one of the first efforts to use Apache Spark for epidemic simulations. We believe that our proposed model could serve as a reference for similar large-scale epidemiological simulations exploring non-MPI or MapReduce-like approaches. | en |
dc.description.degree | Master of Science | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:5928 | en |
dc.identifier.uri | http://hdl.handle.net/10919/55272 | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | computational epidemiology | en |
dc.subject | parallel programming | en |
dc.subject | distributed computing | en |
dc.title | A Distributed Approach to EpiFast using Apache Spark | en |
dc.type | Thesis | en |
thesis.degree.discipline | Computer Science and Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | Master of Science | en |