Characterization of FPGA-based High Performance Computers
dc.contributor.author | Pimenta Pereira, Karl Savio | en |
dc.contributor.committeechair | Athanas, Peter M. | en |
dc.contributor.committeemember | Schaumont, Patrick R. | en |
dc.contributor.committeemember | Feng, Wu-chun | en |
dc.contributor.department | Electrical and Computer Engineering | en |
dc.date.accessioned | 2014-03-14T20:43:18Z | en |
dc.date.adate | 2011-09-02 | en |
dc.date.available | 2014-03-14T20:43:18Z | en |
dc.date.issued | 2011-08-09 | en |
dc.date.rdate | 2011-09-02 | en |
dc.date.sdate | 2011-08-11 | en |
dc.description.abstract | As CPU clock frequencies plateau and the doubling of CPU cores per processor exacerbates the memory wall, hybrid-core computing, which augments CPUs with FPGAs and/or GPUs, holds the promise of addressing high-performance computing demands, particularly with respect to performance, power and productivity. Traditional approaches to benchmarking high-performance computers, such as SPEC, take an architecture-based approach and do not fully express the parallelism that exists in FPGA and GPU accelerators. This thesis instead follows an application-centric approach, comparing the sustained performance of two key computational idioms with respect to performance, power and productivity. Specifically, a complex, single-precision, floating-point, 1D Fast Fourier Transform (FFT) and a Molecular Dynamics modeling application are implemented on state-of-the-art FPGA and GPU accelerators. The results show that FPGA floating-point FFT performance is highly sensitive to the mix of dedicated FPGA resources, in particular DSP48E slices, block RAMs, and FPGA I/O banks. Estimated results show that, for the floating-point FFT benchmark on FPGAs, these resources are the performance-limiting factor. Fixed-point FFTs are important in many high-performance embedded applications. For an integer-point FFT, FPGAs exploit a flexible data-path width to trade off circuit cost against speed of computation, improving performance and resource utilization. GPUs, having a fixed data-width architecture, cannot fully take advantage of this. For the molecular dynamics application, FPGAs benefit from the flexibility to create a custom, tightly pipelined datapath and from the accelerator's highly optimized memory subsystem. This can provide a 250-fold improvement over an optimized CPU implementation and a 2-fold improvement over an optimized GPU implementation, along with massive power savings. 
Finally, to extract the maximum performance out of the FPGA, each implementation requires a balance between the formulation of the algorithm on the platform, the optimal use of available external memory bandwidth, and the availability of computational resources, at the cost of greater programming effort. | en |
dc.description.degree | Master of Science | en |
dc.identifier.other | etd-08112011-192508 | en |
dc.identifier.sourceurl | http://scholar.lib.vt.edu/theses/available/etd-08112011-192508/ | en |
dc.identifier.uri | http://hdl.handle.net/10919/34483 | en |
dc.publisher | Virginia Tech | en |
dc.relation.haspart | PimentaPereira_KS_T_2011.pdf | en |
dc.relation.haspart | PimentaPereira_KS_T_2011_fairuse.pdf | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | FFT | en |
dc.subject | molecular dynamics | en |
dc.subject | integer-point | en |
dc.subject | floating-point | en |
dc.subject | GPU | en |
dc.subject | HPC | en |
dc.subject | Field programmable gate arrays | en |
dc.title | Characterization of FPGA-based High Performance Computers | en |
dc.type | Thesis | en |
thesis.degree.discipline | Electrical and Computer Engineering | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | Master of Science | en |