Show simple item record

dc.contributor.author    Hou, Kaixi    en_US
dc.date.accessioned    2018-08-28T08:00:40Z
dc.date.available    2018-08-28T08:00:40Z
dc.date.issued    2018-08-27
dc.identifier.other    vt_gsexam:15187    en_US
dc.identifier.uri    http://hdl.handle.net/10919/84923
dc.description.abstract    Parallel accelerators have become prominent and ubiquitous, e.g., multi-core CPUs, many-core GPUs (Graphics Processing Units), and Intel Xeon Phi. The performance gains they offer can reach several orders of magnitude, attracting extensive interest from many scientific domains. However, these gains come with two main problems: (1) existing codes may require a complete redesign whenever a new parallel platform is targeted, a nightmare for developers; and (2) parallel codes that execute efficiently on one platform may be inefficient or even non-executable on another, causing portability issues. To address these problems, this dissertation proposes a general approach based on parallel patterns, an effective abstraction layer that eases the generation of efficient parallel codes for given algorithms across architectures. From algorithms to parallel patterns, we exploit domain expertise to analyze the computational and communication patterns in the core computations and represent them in a DSL (Domain-Specific Language) or as algorithmic skeletons. This preserves the essential information, such as data dependencies and types, for subsequent parallelization and optimization. From parallel patterns to actual codes, we use a series of automation frameworks and transformations to determine which levels of parallelism can be used, what the optimal instruction sequences are, how the implementation should change to match different architectures, and so on. We demonstrate our approaches on several important computational kernels, including sort (and segmented sort), sequence alignment, and stencils, across various parallel platforms (CPUs, GPUs, Intel Xeon Phi).    en_US
dc.format.medium    ETD    en_US
dc.publisher    Virginia Tech    en_US
dc.rights    This item is protected by copyright and/or related rights. Some uses of this item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s).    en_US
dc.subject    GPU    en_US
dc.subject    AVX    en_US
dc.subject    sort    en_US
dc.subject    stencil    en_US
dc.subject    wavefront    en_US
dc.subject    pattern    en_US
dc.subject    parallelism    en_US
dc.title    Exploring Performance Portability for Accelerators via High-level Parallel Patterns    en_US
dc.type    Dissertation    en_US
dc.contributor.department    Computer Science    en_US
dc.description.degree    Ph. D.    en_US
thesis.degree.name    Ph. D.    en_US
thesis.degree.level    doctoral    en_US
thesis.degree.grantor    Virginia Polytechnic Institute and State University    en_US
thesis.degree.discipline    Computer Science and Applications    en_US
dc.contributor.committeechair    Feng, Wu-Chun    en_US
dc.contributor.committeemember    Cao, Yong    en_US
dc.contributor.committeemember    Agrawal, Gagan    en_US
dc.contributor.committeemember    Ribbens, Calvin J.    en_US
dc.contributor.committeemember    Wang, Hao    en_US
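
Illustrative note: the abstract above describes parallel patterns (expressed as a DSL or algorithmic skeletons) that separate an algorithm's structure from its platform-specific implementation. The C++ sketch below is only a rough, hypothetical illustration of that general idea, not the framework developed in the dissertation; the names map_pattern and Backend are assumptions made for this sketch. It defines a generic "map" skeleton whose backend (sequential or OpenMP on a multi-core CPU) is selected by a template tag, while the user-supplied element-wise computation stays unchanged.

// Minimal sketch of an algorithmic skeleton: a data-parallel "map" pattern
// whose backend is chosen at compile time while the computation f is fixed.
// Hypothetical example; compile with, e.g., g++ -std=c++17 -fopenmp.
#include <cstddef>
#include <iostream>
#include <vector>

enum class Backend { Sequential, OpenMP };

// The pattern captures only the parallel structure; f carries the domain logic.
template <Backend B, typename T, typename F>
void map_pattern(const std::vector<T>& in, std::vector<T>& out, F f) {
    if constexpr (B == Backend::OpenMP) {
        // Multi-core backend: distribute iterations across threads.
        #pragma omp parallel for
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(in.size()); ++i)
            out[i] = f(in[i]);
    } else {
        // Sequential reference backend.
        for (std::size_t i = 0; i < in.size(); ++i)
            out[i] = f(in[i]);
    }
}

int main() {
    std::vector<float> x(1 << 20, 1.0f), y(x.size());
    // Same algorithmic description, two instantiations: only the backend tag differs.
    map_pattern<Backend::Sequential>(x, y, [](float v) { return 2.0f * v; });
    map_pattern<Backend::OpenMP>(x, y, [](float v) { return 2.0f * v; });
    std::cout << y[0] << "\n";  // prints 2
    return 0;
}

In the dissertation's setting, the same kind of separation would presumably let a backend target, for example, AVX intrinsics on CPUs or CUDA kernels on GPUs without touching the algorithmic description.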

