Wormhole Run-Time Reconfiguration: Conceptualization and VLSI Design of a High Performance Computing System
In the past, various approaches to the high performance numerical computing problem have been explored. Recently, researchers have begun to explore the possibilities of using Field Programmable Gate Arrays (FPGAs) to solve numerically intensive problems. FPGAs offer the possibility of customization to any given application, while not sacrificing applicability to a wide problem domain. Further, the implementation of data flow graphs directly in silicon makes FPGAs very attractive for these types of problems. Unfortunately, current FPGAs suffer from a number of inadequacies with respect to the task. They have lower transistor densities than ASIC solutions, and hence less potential computational power per unit area. Routing overhead generally makes an FPGA solution slower than an ASIC design.
Bit-oriented computational units make them unnecessarily inefficient for implementing tasks that are generally word-oriented. And finally, in large volumes, FPGAs tend to be more expensive per unit due to their lower transistor density. To combat these problems, researchers are now exploiting the unique advantage that FPGAs exhibit over ASICs: reconfigurability. By customizing the FPGA to the task at hand, as the application executes, it is hoped that the cost-performance product of an FPGA system can be shown to be a better solution than a system implemented by a collection of custom ASICs. Such a system is called a Configurable Computing Machine (CCM). Many aspects of the design of the FPGAs available today hinder the exploration of this field.
This thesis addresses many of these problems and presents the embodiment of those solutions in the Colt CCM. By offering word grain reconfiguration and the ability to partially reconfigure at computational element resolution, the Colt can offer higher effective utilization over traditional FPGAs. Further, the majority of the pins of the Colt can be used for both normal I/O and for chip reconfiguration. This provides higher reconfiguration bandwidth contrasted with the low percentage of pins used for reconfiguration of FPGAs. Finally, Colt uses a distributed reconfiguration mechanism called Wormhole Run-Time Reconfiguration (RTR) that allows multiple data ports to simultaneously program different sections of the chip independently. Used as the primary example of Wormhole RTR in the patent application, Colt is the first system to employ this computing paradigm.