Show simple item record

dc.contributor.author	Aji, Ashwin M.	en_US
dc.date.accessioned	2015-05-20T08:00:08Z
dc.date.available	2015-05-20T08:00:08Z
dc.date.issued	2015-05-19	en_US
dc.identifier.other	vt_gsexam:4292	en_US
dc.identifier.uri	http://hdl.handle.net/10919/52366
dc.description.abstract	Today's high-performance computing (HPC) clusters are seeing an increase in the adoption of accelerators like GPUs, FPGAs, and co-processors, leading to heterogeneity in the computation and memory subsystems. To program such systems, application developers typically employ a hybrid programming model of MPI across the compute nodes in the cluster and an accelerator-specific library (e.g., CUDA, OpenCL, OpenMP, OpenACC) across the accelerator devices within each compute node. Such explicit management of disjoint computation and memory resources leads to reduced productivity and performance. This dissertation focuses on designing, implementing, and evaluating a runtime system for HPC clusters with heterogeneous computing devices. This work also explores extending existing programming models to make use of our runtime system for easier code modernization of existing applications. Specifically, we present MPI-ACC, an extension to the popular MPI programming model and runtime system for efficient data movement and automatic task mapping across the CPUs and accelerators within a cluster, and discuss the lessons learned. MPI-ACC's task-mapping runtime subsystem performs fast and automatic device selection for a given task. MPI-ACC's data-movement subsystem includes careful optimizations for end-to-end communication among CPUs and accelerators, which are seamlessly leveraged by the application developers. MPI-ACC provides a familiar, flexible, and natural interface for programmers to choose the right computation or communication targets, while its runtime system achieves efficient cluster utilization.	en_US
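The hybrid-programming workflow the abstract describes — explicitly staging accelerator data through host memory before every inter-node transfer, versus letting a unified runtime move device-resident data directly — can be sketched in illustrative pseudocode. The `cudaMemcpy` and `MPI_Send` calls below are the conventional CUDA/MPI routines; the `MPIACC_Send` call is a simplified assumption for illustration, not the dissertation's actual API:

```
// Conventional hybrid MPI + CUDA: the programmer manages disjoint
// memories by hand, copying GPU data to a host buffer before the
// MPI transfer (and symmetrically on the receiving side).
cudaMemcpy(host_buf, dev_buf, nbytes, cudaMemcpyDeviceToHost); // GPU -> CPU
MPI_Send(host_buf, count, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);

// MPI-ACC-style unified interface (hypothetical sketch): the send
// operates on the device buffer directly, and the runtime pipelines
// the GPU-to-network transfer internally.
MPIACC_Send(dev_buf /* device pointer */, count, MPI_DOUBLE, dest, tag,
            MPI_COMM_WORLD);
```

The point of the second form, per the abstract, is that the runtime's end-to-end communication optimizations are applied transparently, without the programmer writing the staging copies.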
dc.format.medium	ETD	en_US
dc.publisher	Virginia Tech	en_US
dc.rights	This Item is protected by copyright and/or related rights. Some uses of this Item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s).	en_US
dc.subject	Runtime Systems	en_US
dc.subject	Programming Models	en_US
dc.subject	General Purpose Graphics Processing Units (GPGPUs)	en_US
dc.subject	Message Passing Interface (MPI)	en_US
dc.subject	CUDA	en_US
dc.subject	OpenCL	en_US
dc.title	Programming High-Performance Clusters with Heterogeneous Computing Devices	en_US
dc.type	Dissertation	en_US
dc.contributor.department	Computer Science	en_US
dc.description.degree	Ph. D.	en_US
thesis.degree.name	Ph. D.	en_US
thesis.degree.level	doctoral	en_US
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en_US
thesis.degree.discipline	Computer Science and Applications	en_US
dc.contributor.committeechair	Feng, Wu-Chun	en_US
dc.contributor.committeemember	Ribbens, Calvin J.	en_US
dc.contributor.committeemember	Bisset, Keith R.	en_US
dc.contributor.committeemember	Marathe, Madhav Vishnu	en_US
dc.contributor.committeemember	Balaji, Pavan	en_US
