Cross-stack Improvement on Memory Efficiency

Date

2026-05-26

Authors

Publisher

Virginia Tech

Abstract

Modern data centers increasingly host memory-intensive applications that demand vast memory capacity. However, memory performance has not scaled at the same pace as compute performance, and indiscriminately expanding memory resources incurs substantial infrastructure and maintenance costs, ultimately limiting scalability and efficiency. By co-designing solutions across the programming-language runtime, compiler, operating system, and hardware, this dissertation systematically improves both the space efficiency and performance of memory systems in modern data centers. The first contribution, MTP, integrates fine-grained object hotness tracking into the CPython virtual machine to enable per-application memory tiering for Python workloads, allowing applications to run on a small fast tier backed by larger, cheaper CXL memory and lowering the effective per-gigabyte cost of memory. MTP infers object access patterns from reference-count deltas and employs an eagerness-aware migration policy that adapts to runtime memory behavior, outperforming TPP and AutoNUMA on the majority of 33 evaluated configurations and matching MEMTIS on a substantial fraction. The second contribution, CPM, restores secure memory sharing in cloud environments by decoupling cache sharing from memory sharing. CPM places all hardware defense logic within the memory controller, using a table-free merging mechanism that maps groups of unique physical page numbers to shared physical pages via a mathematical formula. The first hardware prototype of a reuse-based cache side-channel defense, built on an Intel server with an FPGA-based CXL memory controller, eliminates Flush+Reload attacks across containers and VMs while improving performance by 2.2% on average and increasing VM consolidation density by 2.6x on 16 GB of CXL memory, relative to the current practice of disabling sharing. The third contribution, SABER, addresses the memory-level parallelism (MLP) destroyed by branch mispredictions. Hardware predictors fail on data-dependent branches, and existing software if-conversion operates too late in the compilation pipeline: it runs in the target-dependent backend, where the program semantics needed to reason about memory safety have already been lost. SABER lifts this analysis to the target-independent middle-end IR, synthesizing software memory predication that reaches branches no prior pass could touch, and governs its application through a three-tier cost framework: a static model that reasons from IR structure, an offline profile-guided optimization (PGO) tier empirically calibrated against measured memory behavior, and an online tier that adapts to input-dependent branch predictability at runtime. SABER delivers a 13.6% geometric-mean speedup on microbenchmarks and 4.0% on SPEC CPU 2017 with zero regressions, while transforming fewer than 10% of structurally eligible branches.
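
As a rough illustration of what inferring hotness from reference-count deltas could look like, the sketch below compares two hypothetical snapshots of per-object reference counts and flags objects whose count changed by at least a threshold. The function name, the snapshot dictionaries, and the threshold are assumptions made for this example only; the abstract does not specify how MTP samples or interprets the counts inside the CPython virtual machine, and this is not its implementation.

```python
from typing import Dict, List


def classify_hot_objects(
    before: Dict[str, int],
    after: Dict[str, int],
    threshold: int = 2,
) -> List[str]:
    """Return the names of objects whose reference count changed by at
    least `threshold` between two snapshots (hypothetical policy)."""
    hot = []
    for name, count_after in after.items():
        # Objects missing from the earlier snapshot get a baseline of 0.
        delta = abs(count_after - before.get(name, 0))
        if delta >= threshold:
            hot.append(name)
    return sorted(hot)


# Hypothetical snapshots taken at two sampling points.
before = {"weights": 3, "config": 1, "cache": 5}
after = {"weights": 9, "config": 1, "cache": 4}
print(classify_hot_objects(before, after))  # ['weights']
```

In a tiering policy of this general shape, objects reported as hot would be candidates for the fast tier and the rest for the slower CXL-backed tier; the actual migration decisions in MTP are governed by its eagerness-aware policy, which this sketch does not model.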

Keywords

Memory System, Tiered Memory, Programming Language Virtual Machine, Compiler, Branch Elimination, CPU Pipeline, Operating System, Side-channel Attack
