Accelerating Workloads on FPGAs via OpenCL: A Case Study with OpenDwarfs

Verma, Anshuman; Helal, Ahmed E.; Krommydas, Konstantinos; Feng, Wu-chun

dc.contributor.author	Verma, Anshuman	en
dc.contributor.author	Helal, Ahmed E.	en
dc.contributor.author	Krommydas, Konstantinos	en
dc.contributor.author	Feng, Wu-chun	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2016-05-13T20:41:57Z	en
dc.date.available	2016-05-13T20:41:57Z	en
dc.date.issued	2016-05-13	en
dc.description.abstract	For decades, the streaming architecture of FPGAs has delivered accelerated performance across many application domains, such as option pricing solvers in finance, computational fluid dynamics in oil and gas, and packet processing in network routers and firewalls. However, this performance comes at the expense of programmability. FPGA developers use hardware design languages (HDLs) to implement the application data and control path and to design hardware modules for computational pipelines, memory management, synchronization, and communication. This process requires extensive knowledge of logic design, design automation tools, and low-level details of FPGA architecture, and it consumes significant development time and effort. To address this lack of programmability of FPGAs, OpenCL provides an easy-to-use and portable programming model for CPUs, GPUs, APUs, and now, FPGAs. Although this significantly improves programmability, an optimized GPU implementation of a kernel may lack performance portability to FPGAs. To improve the performance of OpenCL kernels on FPGAs, we identify general techniques to optimize OpenCL kernels for FPGAs under device-specific hardware constraints. We then apply these optimization techniques to the OpenDwarfs benchmark suite, which has diverse parallelism profiles and memory access patterns, in order to evaluate the effectiveness of the optimizations in terms of performance and resource utilization. Finally, we present the performance of the structured grids and N-body dwarf-based benchmarks in the context of various optimizations, along with their potential refactoring. We find that careful design of kernels for FPGAs can result in a highly efficient pipeline achieving 91% of theoretical throughput for the structured grids dwarf. Index Terms: OpenDwarfs; FPGA; OpenCL; GPU; MIC; Accelerators; Performance Portability	en
dc.format.mimetype	application/pdf	en
dc.identifier.trnumber	TR-16-04	en
dc.identifier.uri	http://hdl.handle.net/10919/70969	en
dc.language.iso	en	en
dc.publisher	Department of Computer Science, Virginia Polytechnic Institute & State University	en
dc.relation.ispartof	Computer Science Technical Reports	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Architecture	en
dc.subject	Computer systems	en
dc.subject	High performance computing	en
dc.subject	Parallel and distributed computing	en
dc.title	Accelerating Workloads on FPGAs via OpenCL: A Case Study with OpenDwarfs	en
dc.type	Technical report	en
dc.type.dcmitype	Text	en

Files

Original bundle

Name: fpl16-opendwarfs.pdf
Size: 688.23 KB
Format: Adobe Portable Document Format

Collections

Computer Science Technical Reports