Implementing Scientific Simulation Codes Tailored for Vector Architectures Using Custom Configurable Computing Machines

TR Number
Date
2010-12-16
Journal Title
Journal ISSN
Volume Title
Publisher
Virginia Tech
Abstract

Prior to the availability of massively parallel supercomputers, the implementation of choice for scientific computing problems such as large numerical physical simulations was typically a vector supercomputer. Legacy code still exists optimized for vector supercomputers. Rehosting legacy code often requires a complete re-write of the original code, which is a long and expensive effort. This work provides a framework and approach to utilize reconfigurable computing resources in place of a vector supercomputer towards the implementation of a legacy source code without a large re-hosting effort. The choice of a vector processing model constrains the solution space such that practical solutions to the underlying resource constrained scheduling problem are achieved. Reconfigurable computing resources that implement capabilities characteristic of the application's original target platform are examined. The framework includes the following components: (1) a template for a parameterized, configurable vector processing core, (2) a scheduling and allocation algorithm that employs lessons learned from the mature knowledge base of vector supercomputing, and (3) the design of the VectCore co-processor to provide a low-overhead interface and control method for instances of the architectural template. The implementation approach applies the framework to produce VectCore instances tailored for specific input problems that meet resource constraints. Experimental data shows the VectCore approach results in efficient implementations with favorable performance compared to both general purpose processing and fixed vector architecture alternatives for the majority of the benchmark cases. Half the benchmark cases scale nearly linearly under a fixed time scaling model. The fixed workload scaling is also linear for the same cases until becoming constant for a subset of the benchmarks due to resource contention in the VectCore implementation limiting the maximum achievable parallelism. The architectural template contributed by this work supports established vector performance enhancing techniques such as parallel and chained operations. As the hardware resources are scaled, the VectCore approach scales the amount of parallelism applied in a problem implementation. In end-to-end hardware experiments, the VectCore co-processor overhead is shown to be small (less than 4%) compared to the schedule length.

Description
Keywords
Scientific Computing, Vector Computing, Reconfigurable Computing, Field-Programmable Gate Arrays
Citation