MPI-ACC: Accelerator-Aware MPI for Scientific Applications

dc.contributor.author: Aji, Ashwin M.
dc.contributor.author: Panwar, Lokendra S.
dc.contributor.author: Ji, Feng
dc.contributor.author: Murthy, Karthik
dc.contributor.author: Chabbi, Milind
dc.contributor.author: Balaji, Pavan
dc.contributor.author: Bisset, Keith R.
dc.contributor.author: Dinan, James
dc.contributor.author: Feng, Wu-chun
dc.contributor.author: Mellor-Crummey, John
dc.contributor.author: Ma, Xiaosong
dc.contributor.author: Thakur, Rajeev
dc.contributor.department: Computer Science
dc.contributor.department: Fralin Life Sciences Institute
dc.date.accessioned: 2017-03-17T09:01:39Z
dc.date.available: 2017-03-17T09:01:39Z
dc.date.issued: 2016-05-01
dc.description.abstract: Data movement in high-performance computing systems accelerated by graphics processing units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are not integrated into such data movement standards, leaving applications with no direct mechanism to perform end-to-end data movement. We introduce MPI-ACC, an integrated and extensible framework that allows end-to-end data movement in accelerator-based systems. MPI-ACC provides productivity and performance benefits by integrating support for auxiliary memory spaces into MPI. It supports data transfer among CUDA, OpenCL, and CPU memory spaces and is extensible to other offload models as well. MPI-ACC's runtime system enables several key optimizations, including pipelining of data transfers, scalable memory management techniques, and balancing of communication based on accelerator and node architecture. MPI-ACC is designed to work concurrently with other GPU workloads with minimal contention. We describe how MPI-ACC can be used to design new communication-computation patterns in scientific applications from domains such as epidemiology simulation and seismology modeling, and we discuss the lessons learned. We present experimental results on a state-of-the-art cluster with hundreds of GPUs, and we compare the performance and productivity of MPI-ACC with MVAPICH, a popular CUDA-aware MPI solution. MPI-ACC encourages programmers to explore novel application-specific optimizations for improved overall cluster utilization.
dc.description.version: Published version
dc.format.extent: 1401 - 1414 page(s)
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1109/TPDS.2015.2446479
dc.identifier.issn: 1045-9219
dc.identifier.issue: 5
dc.identifier.uri: http://hdl.handle.net/10919/76661
dc.identifier.volume: 27
dc.language.iso: en
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.title: MPI-ACC: Accelerator-Aware MPI for Scientific Applications
dc.title.serial: IEEE Transactions on Parallel and Distributed Systems
dc.type: Article - Refereed
dc.type.dcmitype: Text
pubs.organisational-group: /Virginia Tech
pubs.organisational-group: /Virginia Tech/All T&R Faculty
pubs.organisational-group: /Virginia Tech/Engineering
pubs.organisational-group: /Virginia Tech/Engineering/COE T&R Faculty
pubs.organisational-group: /Virginia Tech/Engineering/Computer Science
pubs.organisational-group: /Virginia Tech/Faculty of Health Sciences
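
Illustrative example (not from the paper): the abstract describes passing accelerator-resident buffers directly to MPI communication calls instead of staging them through host memory. The sketch below shows that usage pattern in CUDA + MPI, assuming a CUDA-aware MPI build such as MPI-ACC or MVAPICH; the exact MPI-ACC interface for identifying CUDA or OpenCL buffers may differ from what is shown here.

    /* Hypothetical sketch: sending a GPU-resident buffer directly through MPI.
     * Assumes an accelerator-aware MPI library; without one, the buffer would
     * first have to be copied to host memory with cudaMemcpy on both ranks. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1 << 20;              /* 1M doubles */
        double *d_buf;                      /* device (GPU) pointer */
        cudaMalloc((void **)&d_buf, n * sizeof(double));

        if (rank == 0) {
            cudaMemset(d_buf, 0, n * sizeof(double));
            /* Device pointer passed directly to MPI; the accelerator-aware
             * runtime pipelines the GPU-to-network transfer internally. */
            MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }

Without accelerator awareness, each rank would need explicit host staging buffers and extra cudaMemcpy calls around the MPI_Send/MPI_Recv pair, which is the productivity and performance cost the abstract says MPI-ACC removes.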

Files

Original bundle
Name: aji-mpi-acc-tpds15.pdf
Size: 2.44 MB
Format: Adobe Portable Document Format
Description: Accepted Version
License bundle
Name: VTUL_Distribution_License_2016_05_09.pdf
Size: 18.09 KB
Format: Adobe Portable Document Format
Description: