
Analysis of Memory Access Patterns for Large Language Model Inference

Date

2025-07-09

Publisher

Virginia Tech

Abstract

The use of tiered heterogeneous memory systems in HPC workloads is growing in popularity as the increasing memory requirements of these workloads outpace the decline in the cost per gigabyte of fast DRAM; however, the Linux kernel has no intelligent strategy for managing these tiered memory systems. Because of this limitation, a great deal of research has been conducted to identify policies that make efficient use of such systems. Much of this prior research focuses on deep learning tasks, while only a few studies focus on inference for large models. The training and inference workloads for the same type of model are quite different: in training, the task is to continuously update the weight matrices with knowledge gained from each training datum, while in inference, the workload only reads from the weights. Training for neural networks also accesses memory in the reverse order of inference, through a technique called backpropagation. This thesis presents a memory access pattern heatmap tool that can track evolving access patterns over the lifetime of a workload. The tool is applied to llama.cpp, an LLM inference engine, to identify memory access patterns across local and remote NUMA nodes. The thesis then explores two basic NUMA page placement strategies, in which all memory is bound to either the local or the remote NUMA node, to quantify the impact of poor NUMA policies on performance, and compares them against the default Linux strategy.
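
The two placement strategies described above can be sketched with libnuma, the userspace interface to the kernel's NUMA policy API. The sketch below is illustrative only: the node numbers (0 as local, 1 as remote) and the buffer size are assumptions for illustration, not values taken from the thesis.

    /* Minimal sketch: bind an allocation to one NUMA node with libnuma.
     * Build with: cc -o bind_demo bind_demo.c -lnuma
     * Node numbers 0 (local) and 1 (remote) are assumed for illustration. */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return 1;
        }

        size_t size = 64UL * 1024 * 1024;  /* 64 MiB stand-in for model weights */

        /* Strategy 1: all pages bound to the local node (assumed node 0). */
        void *local = numa_alloc_onnode(size, 0);
        /* Strategy 2: all pages bound to the remote node (assumed node 1). */
        void *remote = numa_alloc_onnode(size, 1);
        if (local == NULL || remote == NULL) {
            fprintf(stderr, "NUMA allocation failed\n");
            return 1;
        }

        /* Touch every page so physical placement actually happens. */
        memset(local, 0, size);
        memset(remote, 0, size);

        numa_free(local, size);
        numa_free(remote, size);
        return 0;
    }

At the process level, the same two bindings can be approximated without code changes by launching llama.cpp under numactl, e.g. numactl --membind=0 versus numactl --membind=1 (node numbers again assumed), with the kernel's default local-allocation policy serving as the baseline.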

Keywords

NUMA, Page Placement, Memory Access Patterns, High Performance Computing, LLM Inference
