Programming High-Performance Clusters with Heterogeneous Computing Devices

dc.contributor.author: Aji, Ashwin M.
dc.contributor.committeechair: Feng, Wu-chun
dc.contributor.committeemember: Ribbens, Calvin J.
dc.contributor.committeemember: Bisset, Keith R.
dc.contributor.committeemember: Marathe, Madhav Vishnu
dc.contributor.committeemember: Balaji, Pavan
dc.contributor.department: Computer Science
dc.date.accessioned: 2015-05-20T08:00:08Z
dc.date.available: 2015-05-20T08:00:08Z
dc.date.issued: 2015-05-19
dc.description.abstract: Today's high-performance computing (HPC) clusters are seeing an increase in the adoption of accelerators such as GPUs, FPGAs, and co-processors, leading to heterogeneity in the computation and memory subsystems. To program such systems, application developers typically employ a hybrid programming model of MPI across the compute nodes in the cluster and an accelerator-specific library (e.g., CUDA, OpenCL, OpenMP, OpenACC) across the accelerator devices within each compute node. Such explicit management of disjoint computation and memory resources leads to reduced productivity and performance. This dissertation focuses on designing, implementing, and evaluating a runtime system for HPC clusters with heterogeneous computing devices. This work also explores extending existing programming models to make use of our runtime system for easier code modernization of existing applications. Specifically, we present MPI-ACC, an extension to the popular MPI programming model and runtime system for efficient data movement and automatic task mapping across the CPUs and accelerators within a cluster, and discuss the lessons learned. MPI-ACC's task-mapping runtime subsystem performs fast and automatic device selection for a given task. MPI-ACC's data-movement subsystem includes careful optimizations for end-to-end communication among CPUs and accelerators, which are seamlessly leveraged by the application developers. MPI-ACC provides a familiar, flexible, and natural interface for programmers to choose the right computation or communication targets, while its runtime system achieves efficient cluster utilization.
dc.description.degree: Ph. D.
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:4292
dc.identifier.uri: http://hdl.handle.net/10919/52366
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Runtime Systems
dc.subject: Programming Models
dc.subject: General Purpose Graphics Processing Units (GPGPUs)
dc.subject: Message Passing Interface (MPI)
dc.subject: CUDA
dc.subject: OpenCL
dc.title: Programming High-Performance Clusters with Heterogeneous Computing Devices
dc.type: Dissertation
thesis.degree.discipline: Computer Science and Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Ph. D.

Files

Original bundle
Name: Aji_AM_D_2015.pdf
Size: 9.92 MB
Format: Adobe Portable Document Format