dc.contributor.author: Verma, Anshuman [en_US]
dc.contributor.author: Helal, Ahmed E. [en_US]
dc.contributor.author: Krommydas, Konstantinos [en_US]
dc.contributor.author: Feng, Wu-chun [en_US]
dc.date.accessioned: 2016-05-13T20:41:57Z
dc.date.available: 2016-05-13T20:41:57Z
dc.date.issued: 2016-05-13
dc.identifier.uri: http://hdl.handle.net/10919/70969
dc.description.abstract: For decades, the streaming architecture of FPGAs has delivered accelerated performance across many application domains, such as option pricing solvers in finance, computational fluid dynamics in oil and gas, and packet processing in network routers and firewalls. However, this performance comes at the expense of programmability. FPGA developers use hardware description languages (HDLs) to implement the application data and control paths and to design hardware modules for computational pipelines, memory management, synchronization, and communication. This process requires extensive knowledge of logic design, design automation tools, and low-level details of the FPGA architecture, and it consumes significant development time and effort. To address this lack of programmability, OpenCL provides an easy-to-use and portable programming model for CPUs, GPUs, APUs, and now FPGAs. Although OpenCL significantly improves programmability, a kernel optimized for a GPU may lack performance portability on an FPGA. To improve the performance of OpenCL kernels on FPGAs, we identify general techniques for optimizing OpenCL kernels under device-specific hardware constraints. We then apply these optimization techniques to the OpenDwarfs benchmark suite, whose applications have diverse parallelism profiles and memory access patterns, to evaluate the effectiveness of the optimizations in terms of performance and resource utilization. Finally, we present the performance of the structured grids and N-body dwarf-based benchmarks under various optimizations, along with their potential refactoring. We find that careful kernel design for the FPGA can yield a highly efficient pipeline that achieves 91% of theoretical throughput for the structured grids dwarf. Index Terms: OpenDwarfs; FPGA; OpenCL; GPU; MIC; Accelerators; Performance Portability [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: Department of Computer Science, Virginia Polytechnic Institute & State University [en_US]
dc.relation.ispartof: Computer Science Technical Reports [en_US]
dc.subject: Architecture [en_US]
dc.subject: Computer systems [en_US]
dc.subject: High performance computing [en_US]
dc.subject: Parallel and distributed computing [en_US]
dc.title: Accelerating Workloads on FPGAs via OpenCL: A Case Study with OpenDwarfs [en_US]
dc.type: Technical report [en_US]
dc.identifier.trnumber: TR-16-04 [en_US]
dc.type.dcmitype: Text [en_US]