Exploring Performance Portability for Accelerators via High-level Parallel Patterns

Hou, Kaixi

Exploring Performance Portability for Accelerators via High-level Parallel Patterns

Files

Hou_K_D_2018.pdf (3.72 MB)

Downloads: 2448

Date

2018-08-27

Authors

Hou, Kaixi

Publisher

Virginia Tech

Abstract

Nowadays, parallel accelerators have become prominent and ubiquitous, e.g., multi-core CPUs, many-core GPUs (Graphics Processing Units) and Intel Xeon Phi. The performance gains from them can be as high as many orders of magnitude, attracting extensive interest from many scientific domains. However, the gains are closely followed by two main problems: (1) A complete redesign of existing codes might be required if a new parallel platform is used, leading to a nightmare for developers. (2) Parallel codes that execute efficiently on one platform might be either inefficient or even non-executable for another platform, causing portability issues.

To handle these problems, in this dissertation, we propose a general approach using parallel patterns, an effective and abstracted layer to ease the generating efficient parallel codes for given algorithms and across architectures. From algorithms to parallel patterns, we exploit the domain expertise to analyze the computational and communication patterns in the core computations and represent them in DSL (Domain Specific Language) or algorithmic skeletons. This preserves the essential information, such as data dependencies, types, etc., for subsequent parallelization and optimization. From parallel patterns to actual codes, we use a series of automation frameworks and transformations to determine which levels of parallelism can be used, what optimal instruction sequences are, how the implementation change to match different architectures, etc. Experiments show that our approaches by investigating a couple of important computational kernels, including sort (and segmented sort), sequence alignment, stencils, etc., across various parallel platforms (CPUs, GPUs, Intel Xeon Phi).

Keywords

GPU, AVX, sort, stencil, wavefront, pattern, parallelism

Persistent link

http://hdl.handle.net/10919/84923

Collections

Doctoral Dissertations

Full item page

Exploring Performance Portability for Accelerators via High-level Parallel Patterns

Files

TR Number

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

Citation

Persistent link

Collections