NUMA Data-Access Bandwidth Characterization and Modeling

TR Number
Journal Title
Journal ISSN
Volume Title
Virginia Tech

Clusters of seemingly homogeneous compute nodes are increasingly heterogeneous within each node due to replication and distribution of node-level subsystems. This intra-node heterogeneity can adversely affect program execution performance by inflicting additional data-access performance penalties when accessing non-local data. In many modern NUMA architectures, both memory and I/O controllers are distributed within a node and CPU cores are logically divided into “local” and “remote” data-accesses within the system. In this thesis a method for analyzing main memory and PCIe data-access characteristics of modern AMD and Intel NUMA architectures is presented. Also presented here is the synthesis of data-access performance models designed to quantify the effects of these architectural characteristics on data-access bandwidth. Such performance models provide an analytical tool for determining the performance impact of remote data-accesses for a program or access pattern running in a given system. Data-access performance models also provide a means for comparing the data-access bandwidth and attributes of NUMA architectures, for improving application performance when running on these architectures, and for improving process/thread mapping onto CPU cores in these architectures. Preliminary examples of how programs respond to these data-access bandwidth characteristics are also presented as motivation for future work.

Performance Modeling, NUMA, Benchmarking