Implementation of a Hardware-Optimized MPI Library for the SCMP Multiprocessor
As time progresses, computer architects continue to create faster and more complex microprocessors using techniques such as out-of-order execution, branch prediction, dynamic scheduling, and predication. While these techniques enable greater performance, they also increase the complexity and silicon area of the design. This creates larger development and testing times. The shrinking feature sizes associated with newer technology increase wire resistance and signal propagation delays, further complicating large designs. One potential solution is the Single-Chip Message-Passing (SCMP) Parallel Computer, developed at Virginia Tech. SCMP makes use of an architecture where a number of simple processors are tiled across a single chip and connected by a fast interconnection network. The system is designed to take advantage of thread-level parallelism and to keep wire traces short in preparation for even smaller integrated circuit feature sizes.
This thesis presents the implementation of the MPI (Message-Passing Interface) communications library on top of SCMP's hardware communication support. Emphasis is placed on the specific needs of this system with regards to MPI. For example, MPI is designed to operate between heterogeneous systems; however, in the SCMP environment such support is unnecessary and wastes resources. The SCMP network is also designed such that messages can be sent with very low latency, but with cooperative multitasking it is difficult to assure a timely response to messages. Finally, the low-level network primitives have no support for send operations that occur before the receiver is prepared and that functionality is necessary for MPI support.