Transparent spilling and refilling of partitioned overlapping register window register organizations with a remote instruction pointer
Mayhew, David Evan
Register allocation is critical to processor performance. Registers are the fastest storage available to a processor. The more capable a register set's organization is at maintaining process context, the fewer memory accesses the processor needs to make. Overlapping register windows maintain context better than single register set organizations, but they also show significant performance degradation if program behavior causes the register window store to overflow. Program behavior makes window overflow of simple overlapping register window organizations unavoidable. Attempts to minimize the impact of overflow by increasing the size of the register store negatively affect register access time, increase device count, and increase context switch latency. Combining a transparent spill and refill mechanism with a small register store allows the store to perform like a much larger one without negatively affecting register cycle time, and it decreases context switch latency. Transparent register spilling and refilling can be accomplished by including a set of simple state machines and dedicated register and memory ports. The transparent spill/refill mechanism's external port interfaces very well with established peripheral processing capabilities on many multi-processor architectures. An instruction repetition capability can facilitate global register storage and retrieval, and can decrease context switch latency. Register performance can be further enhanced by partitioning the register set into data-typed register groups. Register partitioning allows a high degree of parallelism without requiring register sets with high port counts and register access conflicts. Partitioned register sets can be spatially proximate to processing units whose functionality is optimized for operations on specific data types.
A remote instruction pointer with a partitioned code address register set and processing capability can decrease branch latency, improve call/return performance, and simplify general case return address maintenance. A partitioned, transparently spilled/refilled register organization minimizes explicit register storing and retrieving, supports the creation of large register-based working sets, and facilitates a simple parallel processing paradigm that allows a high degree of sub-processing-unit independence.
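The window overflow behavior described in the abstract can be illustrated with a toy model. The sketch below is not taken from the dissertation: the class `WindowStore`, the helper `run`, and the one-window-per-call policy are assumptions made purely for illustration. It only counts how many windows must move between the register store and memory for a given call depth. It does not model timing, the dedicated ports, or the state machines that would make those transfers transparent.

```python
class WindowStore:
    """Toy model (hypothetical) of a register window store.

    Holds at most `capacity` windows. Each procedure call takes one window,
    each return releases one. When the store is full, the oldest window is
    spilled to memory; when a return needs a window that was spilled, it is
    refilled.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = 1   # windows currently in registers (the running procedure's)
        self.spilled = 0    # windows currently saved in memory
        self.spills = 0     # total spill transfers so far
        self.refills = 0    # total refill transfers so far

    def call(self):
        if self.resident == self.capacity:
            # Overflow: move the oldest resident window to memory.
            self.resident -= 1
            self.spilled += 1
            self.spills += 1
        self.resident += 1

    def ret(self):
        self.resident -= 1
        if self.resident == 0:
            if self.spilled == 0:
                raise RuntimeError("return from the outermost procedure")
            # Underflow: bring the caller's window back from memory.
            self.spilled -= 1
            self.resident = 1
            self.refills += 1


def run(depth, capacity):
    """Make `depth` nested calls, then return from all of them.

    Returns the (spills, refills) counts observed.
    """
    store = WindowStore(capacity)
    for _ in range(depth):
        store.call()
    for _ in range(depth):
        store.ret()
    return store.spills, store.refills


if __name__ == "__main__":
    for depth in (3, 6, 12):
        print(depth, run(depth, capacity=4))
```

In this model a call chain that stays within the store's capacity causes no memory traffic, while each call beyond it costs one spill and later one refill. In a conventional design each such transfer typically stalls the program; the abstract's argument is that performing them transparently lets a small store behave like a much larger one.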
- Doctoral Dissertations