Browsing by Author "Sathre, Paul"
Now showing 1 - 4 of 4
Results Per Page
Sort Options
- A Composable Workflow for Productive FPGA Computing via Whole-Program Analysis and Transformation (with Code Excerpts)Sathre, Paul; Helal, Ahmed E.; Feng, Wu-chun (Department of Computer Science, Virginia Polytechnic Institute & State University, 2018-07-24)We present a composable workflow to enable highly-productive heterogeneous computing on FPGAs. The workflow consists of a trio of static analysis and transformation tools: (1) a whole-program, source-to-source translator to transform existing parallel code to OpenCL, (2) a set of OpenCL kernel linters, which target FPGAs to detect possible semantic errors and performance traps, and (3) a whole-program OpenCL linter to validate the host-to-device interface of OpenCL programs. The workflow promotes rapid realization of heterogeneous parallel code across a multitude of heterogeneous computing environments, particularly FPGAs, by providing complementary tools for automatic CUDA-to-OpenCL translation and compile-time OpenCL validation in advance of very expensive compilation, placement, and routing on FPGAs. The proposed tools perform whole-program analysis and transformation to tackle realworld, large-scale parallel applications. The efficacy of the workflow tools is demonstrated via a representative translation and analysis of a sizable CUDA finite automata processing engine as well as the analysis and validation of an additional 96 OpenCL benchmarks.
- Edge-Connected Jaccard Similarity for Graph Link Prediction on FPGASathre, Paul; Gondhalekar, Atharva; Feng, Wu-chun (IEEE, 2022-01-01)Graph analysis is a critical task in many fields, such as social networking, epidemiology, bioinformatics, and fraud de-tection. In particular, understanding and inferring relationships between graph elements lies at the core of many graph-based workloads. Real-world graph workloads and their associated data structures create irregular computational patterns that compli-cate the realization of high-performance kernels. Given these complications, there does not exist a de facto 'best' architecture, language, or algorithmic approach that simultaneously balances performance, energy efficiency, portability, and productivity. In this paper, we realize different algorithms of edge-connected Jaccard similarity for graph link prediction and characterize their performance across a broad spectrum of graphs on an Intel Stratix 10 FPGA. By utilizing a high-level synthesis (HLS)-driven, high-productivity approach (via the C++-based SYCL language) we rapidly prototype two implementations - a from-scratch edge-centric version and a faithfully-ported commodity GPU implementation - which would have been intractable via a hardware description language. With these implementations, we further consider the benefit and necessity of four HLS-enabled optimizations, both in isolation and in concert - totaling seven distinct synthesized hardware pipelines. Leveraging real-world graphs of up to 516 million edges, we show empirically-measured speedups of up to 9.5 x over the initial HLS implementations when all optimizations work in concert.
- MetaMorph: A Library Framework for Interoperable Kernels on Multi- and Many-Core ClustersHelal, A.; Sathre, Paul; Feng, Wu-chun (2016-11-15)To attain scalable performance efficiently, the HPC community expects future exascale systems to consist of multiple nodes, each with different types of hardware accelerators. In addition to GPUs and Intel MICs, additional candidate accelerators include embedded multiprocessors and FPGAs. End users need appropriate tools to efficiently use the available compute resources in such systems, both within a compute node and across compute nodes. As such, we present MetaMorph, a library framework designed to (automatically) extract as much computational capability as possible from HPC systems. Its design centers around three core principles: abstraction, interoperability, and adaptivity. To demonstrate its efficacy, we present a case study that uses the structured grids design pattern, which is heavily used in computational fluid dynamics. We show how MetaMorph significantly reduces the development time, while delivering performance and interoperability across an array of heterogeneous devices, including multicore CPUs, Intel MICs, AMD GPUs, and NVIDIA GPUs.
- Virginia Tech Student PresentationsEndert, Alex; Sathre, Paul (2013-05-06)Sathre: In today's society we don't realize that parallel computing is everywhere. Parallel computing is no longer something that is just in supercomputers. Multiple devices such as laptops, tablets, and even cell phones take advantage of parallel computing nowadays. In this presentation Sathre gives an overview of how an accelerator(GPU) is used in comparison to CPU. He also talks about translation from CUDA to OpenCL. Endert's Website: http://people.cs.vt.edu/aendert/Alex_Endert/Home.html