Accelerating Data-Serial Applications on Data-Parallel GPGPUs: A Systems Approach

Aji, Ashwin M.; Feng, Wu-chun

Files

ipdps08.pdf (1.28 MB)

TR Number

TR-08-24

Date

2008

Authors

Aji, Ashwin M.

Feng, Wu-chun

Publisher

Department of Computer Science, Virginia Polytechnic Institute & State University

Abstract

The general-purpose graphics processing unit (GPGPU) continues to make significant strides in high-end computing by delivering unprecedented performance at a commodity price. However, the many-core architecture of the GPGPU currently allows only data-parallel applications to extract the full potential of the hardware. Applications that require frequent synchronization during their execution gain little performance from the GPGPU, mainly because there is no explicit hardware or software support for inter-thread communication across the entire GPGPU chip. In this paper, we design, implement, and evaluate a highly efficient software barrier that synchronizes all the thread blocks of a kernel offloaded to the GPGPU without transferring execution control back to the host processor. We show that our custom software barrier achieves a three-fold performance improvement over the existing approach, i.e., synchronization via the host processor. To illustrate this performance benefit, we parallelize a data-serial application that requires frequent barrier synchronization across the many cores of the nVIDIA GeForce GTX 280 GPGPU: Smith-Waterman (SWat), an optimal sequence-search algorithm. Our parallelization consists of a suite of optimization techniques: optimal data layout, coalesced memory accesses, and blocked data decomposition. When coupled with our custom software-barrier implementation, these techniques achieve nearly a nine-fold speed-up over the serial implementation of SWat. We also show that our solution delivers 25-fold faster on-chip execution than the naïve implementation.
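A minimal scoring sketch of SWat shows why it counts as "data-serial" and why it needs frequent chip-wide synchronization. This is a hypothetical Python illustration, not the authors' GPU implementation. It makes the following assumptions:

- It uses a simplified linear gap penalty.
- The scores match=2, mismatch=-1, and gap=-1 are hypothetical.
- CPU threads and `threading.Barrier` stand in for GPU thread blocks and the global barrier.

Each cell H[i][j] depends on its left, upper, and upper-left neighbours, which lie on anti-diagonals d - 1 and d - 2 (where d = i + j). All cells on one anti-diagonal are therefore independent of each other, but no cell of diagonal d can start until the earlier diagonals are complete. That gives one barrier per anti-diagonal. On the GPU, the existing approach realizes that barrier by returning control to the host between kernel launches.

```python
import threading

def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1, workers=4):
    """Best local-alignment score of a and b, computed one anti-diagonal at a time.

    Hypothetical sketch: linear gap penalty, illustrative scores, and CPU
    threads standing in for GPU thread blocks.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    barrier = threading.Barrier(workers)        # stand-in for the global barrier
    best = [0] * workers                        # per-worker maxima, reduced at the end

    def worker(wid):
        # Every worker walks every anti-diagonal, so all of them reach the
        # barrier the same number of times, even when a diagonal gives them no cells.
        for d in range(2, n + m + 1):
            i_lo = max(1, d - m)
            i_hi = min(n, d - 1)
            # Cells on diagonal d are independent; deal them out round-robin.
            for i in range(i_lo + wid, i_hi + 1, workers):
                j = d - i
                s = match if a[i - 1] == b[j - 1] else mismatch
                h = max(0,
                        H[i - 1][j - 1] + s,   # diagonal d - 2
                        H[i - 1][j] + gap,     # diagonal d - 1
                        H[i][j - 1] + gap)     # diagonal d - 1
                H[i][j] = h
                if h > best[wid]:
                    best[wid] = h
            # No worker may start diagonal d + 1 until all of diagonal d is written.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(best)

print(smith_waterman_wavefront("ACACACTA", "AGCACACA"))
```

Because of Python's global interpreter lock, this sketch shows the synchronization structure rather than any speed-up. The worker count and the round-robin assignment of cells are illustrative choices only.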

Keywords

Algorithms, Data structures

Persistent link

http://hdl.handle.net/10919/19795

Collections

Computer Science Technical Reports
