Design and prototyping of Hardware-Accelerated Locality-aware Memory Compression
Abstract
Hardware acceleration is one of the most sought-after techniques in chip design for achieving better performance and power efficiency for critical functions that are handled inefficiently by traditional OS or software implementations. As process technology has advanced, with 7nm products already on the market offering better power and performance in a smaller area, latency-critical functions traditionally handled by software have begun moving into on-chip acceleration units. This thesis describes the accelerator architecture, implementation, and prototype for one such function, namely "Locality-Aware Memory Compression", which is part of the "OS-controlled memory compression" scheme that is actively deployed in today's OSes. In brief, OS-controlled memory compression is a new memory management feature that transparently, dramatically, and adaptively increases effective main memory capacity on demand as software-level memory usage grows beyond the physical memory capacity of the system. OS-controlled memory compression has been adopted across almost all OSes (e.g., Linux, Windows, macOS, AIX) and almost all classes of computing systems (e.g., smartphones, PCs, data centers, and cloud). The OS-controlled memory compression scheme is locality-aware. Even so, under OS-controlled memory compression today, applications experience long-latency page faults when accessing compressed memory. To remove this performance bottleneck, an acceleration technique has been proposed that manages locality-aware memory compression in hardware, thereby enabling applications to access their OS-compressed memory directly. This accelerator is referred to as HALK throughout this work, which stands for "Hardware-accelerated Locality-aware Memory Compression". The literal meaning of the word HALK in English is 'a hidden place'. As such, this accelerator is exposed neither to the OS nor to running applications. It is hidden entirely in the memory controller hardware and incurs minimal hardware cost. This thesis develops an FPGA design prototype and gives a proof of concept for the functionality of HALK by running non-trivial micro-benchmarks. It also reports and analyzes the power, performance, and area of HALK for ASIC designs (at the 7nm technology node) and for the selected FPGA prototype design.