Analysis of Memory Access Patterns for Large Language Model Inference

dc.contributor.author: Fisher, Max Henry
dc.contributor.committeechair: Nikolopoulos, Dimitrios S.
dc.contributor.committeemember: Back, Godmar Volker
dc.contributor.committeemember: Li, Huaicheng
dc.contributor.department: Computer Science & Applications
dc.date.accessioned: 2025-07-10T08:00:19Z
dc.date.available: 2025-07-10T08:00:19Z
dc.date.issued: 2025-07-09
dc.description.abstract: The use of tiered heterogeneous memory systems in HPC workloads is growing in popularity as the increasing memory requirements of these workloads outpace the decline in the cost per gigabyte of fast DRAM; however, the Linux kernel has no intelligent strategy for managing these tiered memory systems. Because of this limitation, a great deal of research has been conducted to identify policies that make efficient use of such systems. Much of this prior research focuses on deep learning training tasks, while only a few studies focus on inference for large models. The training and inference workloads for the same type of model are quite different: in training, the task is to continuously update the weight matrices with knowledge gained from each training datum, while in inference, the workload only reads from the weights. Training a neural network also involves accesses in the reverse order of those used in inference, through a technique called backpropagation. This thesis presents a memory access pattern heatmap tool that can track evolving access patterns over the lifetime of a workload. The tool is applied to llama.cpp, an LLM inference engine, to identify memory access patterns between local and remote NUMA nodes. The thesis then explores two basic NUMA page placement strategies, in which all memory is bound to either the local or the remote NUMA node, to identify the impact of poor NUMA policies on performance, and compares them to the default Linux strategy.
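
The page placement experiments described in this abstract hinge on binding memory to a chosen NUMA node. As a minimal sketch of that mechanism, assuming the Linux libnuma API and illustrative node numbers (the thesis's own binding method is not detailed in this record), a per-allocation binding might look like the following:

/* Minimal sketch: bind one allocation to a specific NUMA node via libnuma.
 * Illustrative only: the buffer, its size, and the node numbers are
 * assumptions, not values from the thesis. Build with: gcc demo.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    /* Hypothetical stand-in for a model weight buffer (64 MiB). */
    size_t size = 64UL * 1024 * 1024;

    /* Assume node 0 is local to the running CPU and node 1 is remote;
     * real topologies vary. Fall back to node 0 on single-node hosts. */
    int node = (numa_max_node() >= 1) ? 1 : 0;

    void *weights = numa_alloc_onnode(size, node);
    if (weights == NULL) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return EXIT_FAILURE;
    }

    /* Touch the pages so they are actually faulted in on the chosen node. */
    memset(weights, 0, size);
    printf("Placed %zu bytes on NUMA node %d\n", size, node);

    numa_free(weights, size);
    return EXIT_SUCCESS;
}

At the command line, numactl --membind=<node> imposes the same kind of placement on an unmodified binary, which is the more typical way to pin an entire workload such as llama.cpp to a single node.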
dc.description.abstractgeneral: Scientific computing often involves running programs with memory footprints so large that they may not fit in the memory available on the system where they run. Because expanding a system's memory can be expensive, tiered memory systems have been growing in popularity: they provide the illusion of a single fast, large memory by storing some data in small, fast, expensive memory (DRAM) and the rest in large, slow, cheap memory (NVM). Effective use of such systems, however, requires an intelligent strategy for moving data between the tiers. Past research has explored strategies that leverage the data access patterns of programs such as those that train machine learning models, but few studies have examined the access patterns of programs that run (rather than train) those models. This thesis explores how llama.cpp, a program that runs large language models, accesses its data and how those patterns change as the program runs. It also explores how different data placement policies affect performance, comparing how quickly llama.cpp runs when all of its data is allocated on a slow memory node versus a fast memory node, showing how important these strategies are to good performance.
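
The comparison sketched in the general abstract, all data on a slow node versus all data on a fast node, corresponds to a process-wide binding rather than a per-allocation one. A minimal sketch, again assuming libnuma and treating node 0 as the fast tier purely for illustration:

/* Sketch: process-wide NUMA memory binding via libnuma. All allocations
 * made after numa_set_membind() are satisfied from the chosen node.
 * Treating node 0 as the fast tier is an assumption for illustration. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    struct bitmask *mask = numa_allocate_nodemask();
    numa_bitmask_setbit(mask, 0);  /* bind all future allocations to node 0 */
    numa_set_membind(mask);
    numa_bitmask_free(mask);

    /* From here on, heap memory (e.g., model weights loaded by an
     * inference engine) is placed on node 0. */
    void *buf = malloc(1 << 20);
    if (buf == NULL)
        return EXIT_FAILURE;
    printf("Allocated 1 MiB under a node-0 membind policy\n");
    free(buf);
    return EXIT_SUCCESS;
}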
dc.description.degree: Master of Science
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:43726
dc.identifier.uri: https://hdl.handle.net/10919/135946
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: Creative Commons Attribution-ShareAlike 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-sa/4.0/
dc.subject: NUMA
dc.subject: Page Placement
dc.subject: Memory Access Patterns
dc.subject: High Performance Computing
dc.subject: LLM Inference
dc.title: Analysis of Memory Access Patterns for Large Language Model Inference
dc.type: Thesis
thesis.degree.discipline: Computer Science & Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: masters
thesis.degree.name: Master of Science

Files

Original bundle
Name: Fisher_MH_T_2025.pdf
Size: 1.78 MB
Format: Adobe Portable Document Format
