Analysis of Memory Access Patterns for Large Language Model Inference

dc.contributor.author: Fisher, Max Henry
dc.contributor.committeechair: Nikolopoulos, Dimitrios S.
dc.contributor.committeemember: Back, Godmar Volker
dc.contributor.committeemember: Li, Huaicheng
dc.contributor.department: Computer Science & Applications
dc.date.accessioned: 2025-07-10T08:00:19Z
dc.date.available: 2025-07-10T08:00:19Z
dc.date.issued: 2025-07-09
dc.description.abstract: The use of tiered heterogeneous memory systems in HPC workloads is growing in popularity as the increasing memory requirements of these workloads outpace the decline in the cost per gigabyte of fast DRAM; however, the Linux kernel has no intelligent strategy for managing these tiered memory systems. Because of this limitation, a great deal of research has been conducted to identify policies that make efficient use of such systems. Much of this prior research focuses on deep learning training tasks, while only a few studies focus on inference for large models. The training and inference workloads for the same type of model are quite different: in training, the task is to continuously update the weight matrices with knowledge gained from each training datum, while in inference, the workload only reads from the weights. Training a neural network also involves accesses in the reverse order of those used in inference, through a technique called backpropagation. This thesis presents a memory access pattern heatmap tool that can track evolving access patterns over the lifetime of a workload. The tool is applied to llama.cpp, an LLM inference engine, to identify memory access patterns between local and remote NUMA nodes. The thesis then explores two basic NUMA page placement strategies, in which all memory is bound to either the local or the remote NUMA node, to identify the impact of poor NUMA policies on performance, and compares them to the default Linux strategy.
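
The page placement experiments described in this abstract hinge on binding memory to a chosen NUMA node. As a minimal sketch of that mechanism, assuming the Linux libnuma API and illustrative node numbers (the thesis's own binding method is not detailed in this record), a per-allocation binding might look like the following:

/* Minimal sketch: bind one allocation to a specific NUMA node via libnuma.
 * Illustrative only: the buffer, its size, and the node numbers are
 * assumptions, not values from the thesis. Build with: gcc demo.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    /* Hypothetical stand-in for a model weight buffer (64 MiB). */
    size_t size = 64UL * 1024 * 1024;

    /* Assume node 0 is local to the running CPU and node 1 is remote;
     * real topologies vary. Fall back to node 0 on single-node hosts. */
    int node = (numa_max_node() >= 1) ? 1 : 0;

    void *weights = numa_alloc_onnode(size, node);
    if (weights == NULL) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return EXIT_FAILURE;
    }

    /* Touch the pages so they are actually faulted in on the chosen node. */
    memset(weights, 0, size);
    printf("Placed %zu bytes on NUMA node %d\n", size, node);

    numa_free(weights, size);
    return EXIT_SUCCESS;
}

At the command line, numactl --membind=<node> imposes the same kind of placement on an unmodified binary, which is the more typical way to pin an entire workload such as llama.cpp to a single node.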
dc.description.abstractgeneral: Scientific computing often involves running programs with memory footprints so large that they may not fit in the memory available on the system where they run. Because expanding a system's memory can be expensive, tiered memory systems have been growing in popularity: they provide the illusion of a single fast, large memory by storing some data in small, fast, expensive memory (DRAM) and the rest in large, slow, cheap memory (NVM). Effective use of such systems, however, requires an intelligent strategy for moving data between the tiers. Past research has explored strategies that leverage the data access patterns of programs such as those that train machine learning models, but few studies have examined the access patterns of programs that run (rather than train) those models. This thesis explores how llama.cpp, a program that runs large language models, accesses its data and how those patterns change as the program runs. It also explores how different data placement policies affect performance, comparing how quickly llama.cpp runs when all of its data is allocated on a slow memory node versus a fast memory node, showing how important these strategies are to good performance.
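
The comparison sketched in the general abstract, all data on a slow node versus all data on a fast node, corresponds to a process-wide binding rather than a per-allocation one. A minimal sketch, again assuming libnuma and treating node 0 as the fast tier purely for illustration:

/* Sketch: process-wide NUMA memory binding via libnuma. All allocations
 * made after numa_set_membind() are satisfied from the chosen node.
 * Treating node 0 as the fast tier is an assumption for illustration. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    struct bitmask *mask = numa_allocate_nodemask();
    numa_bitmask_setbit(mask, 0);  /* bind all future allocations to node 0 */
    numa_set_membind(mask);
    numa_bitmask_free(mask);

    /* From here on, heap memory (e.g., model weights loaded by an
     * inference engine) is placed on node 0. */
    void *buf = malloc(1 << 20);
    if (buf == NULL)
        return EXIT_FAILURE;
    printf("Allocated 1 MiB under a node-0 membind policy\n");
    free(buf);
    return EXIT_SUCCESS;
}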
dc.description.degree: Master of Science
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:43726
dc.identifier.uri: https://hdl.handle.net/10919/135946
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: Creative Commons Attribution-ShareAlike 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-sa/4.0/
dc.subject: NUMA
dc.subject: Page Placement
dc.subject: Memory Access Patterns
dc.subject: High Performance Computing
dc.subject: LLM Inference
dc.title: Analysis of Memory Access Patterns for Large Language Model Inference
dc.type: Thesis
thesis.degree.discipline: Computer Science & Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: masters
thesis.degree.name: Master of Science

Files

Original bundle
Name: Fisher_MH_T_2025.pdf
Size: 1.78 MB
Format: Adobe Portable Document Format
