Exploring Performance Portability for Accelerators via High-level Parallel Patterns

dc.contributor.author: Hou, Kaixi
dc.contributor.committeechair: Feng, Wu-chun
dc.contributor.committeemember: Cao, Yong
dc.contributor.committeemember: Agrawal, Gagan
dc.contributor.committeemember: Ribbens, Calvin J.
dc.contributor.committeemember: Wang, Hao
dc.contributor.department: Computer Science
dc.date.accessioned: 2018-08-28T08:00:40Z
dc.date.available: 2018-08-28T08:00:40Z
dc.date.issued: 2018-08-27
dc.description.abstract: Nowadays, parallel accelerators have become prominent and ubiquitous, e.g., multi-core CPUs, many-core GPUs (Graphics Processing Units), and the Intel Xeon Phi. The performance gains they offer can reach several orders of magnitude, attracting extensive interest from many scientific domains. However, these gains come with two main problems: (1) adopting a new parallel platform might require a complete redesign of existing codes, a nightmare for developers; and (2) parallel codes that execute efficiently on one platform might be inefficient, or even fail to run, on another, causing portability issues. To address these problems, this dissertation proposes a general approach based on parallel patterns, an effective abstraction layer that eases the generation of efficient parallel codes for given algorithms and across architectures. From algorithms to parallel patterns, we exploit domain expertise to analyze the computational and communication patterns in the core computations and represent them in a DSL (Domain-Specific Language) or as algorithmic skeletons. This preserves the essential information, such as data dependencies and types, for subsequent parallelization and optimization. From parallel patterns to actual codes, we use a series of automation frameworks and transformations to determine which levels of parallelism can be used, what the optimal instruction sequences are, how the implementation should change to match different architectures, etc. We demonstrate our approach on several important computational kernels, including sort (and segmented sort), sequence alignment, and stencils, across various parallel platforms (CPUs, GPUs, and the Intel Xeon Phi).
dc.description.degree: Ph. D.
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:15187
dc.identifier.uri: http://hdl.handle.net/10919/84923
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: GPU
dc.subject: AVX
dc.subject: sort
dc.subject: stencil
dc.subject: wavefront
dc.subject: pattern
dc.subject: parallelism
dc.title: Exploring Performance Portability for Accelerators via High-level Parallel Patterns
dc.type: Dissertation
thesis.degree.discipline: Computer Science and Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Ph. D.

Files

Original bundle
Name: Hou_K_D_2018.pdf
Size: 3.72 MB
Format: Adobe Portable Document Format