Transparent spilling and refilling of partitioned overlapping register window register organizations with a remote instruction pointer
Abstract
Register allocation is critical to processor performance. Registers are the fastest storage system
available to a processor. The better a register set's organization is at maintaining process
context, the fewer memory accesses the processor needs to make. Overlapping
register windows have better context maintenance capabilities than single register set organizations,
but overlapping register windows also show significant performance degradation if program
behavior causes the register window store to overflow. Program behavior makes window overflow
of simple overlapping register window organizations unavoidable. Attempts to minimize the impact
of overflow by increasing the size of the register store increase register access time,
increase device count, and increase context switch latency. Combining a transparent spill
and refill mechanism with a small register store allows the store to perform like a much larger
store without degrading register cycle time, while also decreasing context switch latency.
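Why overflow cannot be designed away by size alone can be seen in a toy model that tracks only how many procedure windows are on chip. It ignores register contents and the in/out overlap itself, and it assumes a conventional trap-style scheme in which the oldest resident window is spilled on overflow and the caller's window is refilled on underflow. `WindowStore` and `recurse` are hypothetical names for illustration, not the dissertation's design.

```python
class WindowStore:
    """Toy model of a circular overlapping register window store.

    Only window counts are tracked. When every on-chip window is in use,
    a call must spill the oldest resident window to memory (overflow);
    when a return reaches a caller whose window was spilled, that window
    must be refilled from memory (underflow).
    """

    def __init__(self, num_windows: int) -> None:
        if num_windows < 1:
            raise ValueError("need at least one window")
        self.num_windows = num_windows
        self.depth = 1      # active procedure frames, starting with the outermost
        self.resident = 1   # frames whose windows are currently on chip
        self.spills = 0     # windows written to memory
        self.refills = 0    # windows read back from memory

    def call(self) -> None:
        if self.resident == self.num_windows:
            self.resident -= 1   # overflow: spill the oldest resident window
            self.spills += 1
        self.resident += 1
        self.depth += 1

    def ret(self) -> None:
        if self.depth == 1:
            raise RuntimeError("return from the outermost frame")
        self.depth -= 1
        self.resident -= 1
        if self.resident == 0:
            self.resident = 1    # underflow: refill the caller's window
            self.refills += 1


def recurse(num_windows: int, depth: int) -> tuple[int, int]:
    """Descend `depth` calls, return back out, and report (spills, refills)."""
    store = WindowStore(num_windows)
    for _ in range(depth):
        store.call()
    for _ in range(depth):
        store.ret()
    return store.spills, store.refills
```

In this model a monotonic descent of k calls into a store of N windows spills max(0, k - (N - 1)) windows and refills the same number on the way back, so `recurse(16, 19)` still overflows even though `recurse(16, 9)` does not: a larger store only moves the threshold.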
Transparent register spilling and refilling can be implemented with a set of simple
state machines and dedicated register and memory ports. The transparent spill/refill mechanism's
external port interfaces well with the peripheral processing capabilities established on many
multiprocessor architectures. Adding an instruction repetition capability can facilitate
global register storage and retrieval, and can decrease context switch latency. Register performance
can be further enhanced by partitioning the register set into data-typed register groups. Register
partitioning allows a high degree of parallelism without requiring register sets
with high port counts or incurring register access conflicts. Partitioned register sets can be located
close to processing units whose functionality is optimized for operations on specific data
types. A remote instruction pointer with a partitioned code address register set and processing
capability can decrease branch latency, improve call/return performance, and simplify general case
return address maintenance. A partitioned, transparently spilled/refilled register organization
minimizes explicit register storing and retrieving, supports the creation of large register-based
working sets, and facilitates a simple parallel processing paradigm that allows a high degree of
independence among sub-processing units.
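The idea behind transparent spilling and refilling can be sketched as a cycle-level toy model in which a background engine (standing in for the state machines and dedicated memory port) keeps the number of resident windows between two watermarks, so that calls and returns rarely find the store full or empty. Everything here is an assumption for illustration: the names `simulate` and `Stats`, the `high`/`low` watermarks, the one-operation-per-cycle trace, and an engine that moves a whole window per cycle (a real window holds many registers). It is not the dissertation's actual mechanism.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    stalls: int = 0   # cycles the program waited on a spill or refill
    spills: int = 0   # windows written to memory
    refills: int = 0  # windows read back from memory


def simulate(trace: str, capacity: int, transparent: bool,
             high: int = 1, low: int = 1) -> Stats:
    """Toy cycle model of a register window store.

    trace: one op per cycle, 'c' = call, 'r' = return, 'n' = neither.
    capacity: number of windows the on-chip store holds.
    transparent: if True, a hypothetical background engine transfers at
    most one window per cycle, spilling while more than `high` windows are
    resident and refilling while fewer than `low` are resident.
    """
    if transparent and not 1 <= low <= high < capacity:
        raise ValueError("need 1 <= low <= high < capacity")
    depth, resident = 1, 1  # start inside the outermost procedure
    s = Stats()
    for op in trace:
        if op == "c":
            if resident == capacity:
                # No free window: the program waits for a spill.
                resident -= 1
                s.spills += 1
                s.stalls += 1
            resident += 1
            depth += 1
        elif op == "r":
            if depth == 1:
                raise ValueError("return from outermost procedure")
            depth -= 1
            resident -= 1
            if resident == 0:
                # Caller's window is in memory: the program waits for a refill.
                resident += 1
                s.refills += 1
                s.stalls += 1
        elif op != "n":
            raise ValueError(f"unknown op {op!r}")
        if transparent:
            # Background transfer through the dedicated port, off the critical path.
            if resident > high:
                resident -= 1
                s.spills += 1
            elif resident < low and depth > resident:
                resident += 1
                s.refills += 1
    return s
```

For a descent of nine calls followed by nine returns into a four-window store, the trap-style run (`transparent=False`) stalls on every one of its 6 spills and 6 refills, while the transparent run with `high=3, low=2` stalls on none. It moves 7 windows in each direction because it keeps one window of headroom, so the memory traffic is not eliminated; it is moved off the program's critical path.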
Collections
- Doctoral Dissertations