Virginia Tech
    • Log in
    View Item 
    •   VTechWorks Home
    • College of Engineering (COE)
    • Department of Computer Science
    • Computer Science Technical Reports
    • View Item
    •   VTechWorks Home
    • College of Engineering (COE)
    • Department of Computer Science
    • Computer Science Technical Reports
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Bridging the Performance-Programmability Gap for FPGAs via OpenCL: A Case Study with OpenDwarfs

    Thumbnail
    View/Open
    fccm16-opendwarfs.pdf (1.022Mb)
    Downloads: 1161
    TR number
    TR-16-03
    Date
    2016-05-13
    Author
    Krommydas, Konstantinos
    Helal, Ahmed E.
    Verma, Anshuman
    Feng, Wu-chun
    Metadata
    Show full item record
    Abstract
    For decades, the streaming architecture of FPGAs has delivered accelerated performance across many application domains, such as option pricing solvers in finance, computational fluid dynamics in oil and gas, and packet processing in network routers and firewalls. However, this performance has come at the significant expense of programmability, i.e., the performance-programmability gap. In particular, FPGA developers use hardware design languages (HDLs) to implement the application data path and to design hardware modules for computation pipelines, memory management, synchronization, and communication. This process requires extensive low-level knowledge of the target FPGA architecture and consumes significant development time and effort. To address this lack of programmability of FPGAs, OpenCL provides an easy-to-use and portable programming model for CPUs, GPUs, APUs, and now, FPGAs. However, this significantly improved programmability can come at the expense of performance; that is, there still remains a performance-programmability gap. To improve the performance of OpenCL kernels on FPGAs, and thus, bridge the performance-programmability gap, we identify general techniques to optimize OpenCL kernels for FPGAs under device-specific hardware constraints. We then apply these optimization techniques to the OpenDwarfs benchmark suite, with its diverse parallelism profiles and memory access patterns, in order to evaluate the effectiveness of the optimizations in terms of performance and resource utilization. Finally, we present the performance of the optimized OpenDwarfs, along with their potential re-factoring, to bridge the performance gap from programming in OpenCL versus programming in a HDL. Index Terms—OpenDwarfs; FPGA; OpenCL; GPU; GPGPU; MIC; Accelerators; Performance Portability
    URI
    http://hdl.handle.net/10919/70968
    Collections
    • Computer Science Technical Reports [1036]

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     

    VTechWorks

    AboutPoliciesHelp

    Browse

    All of VTechWorksCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    Log inRegister

    Statistics

    View Usage Statistics

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us