Analysis of Memory Access Patterns for Large Language Model Inference
| dc.contributor.author | Fisher, Max Henry | en |
| dc.contributor.committeechair | Nikolopoulos, Dimitrios S. | en |
| dc.contributor.committeemember | Back, Godmar Volker | en |
| dc.contributor.committeemember | Li, Huaicheng | en |
| dc.contributor.department | Computer Science & Applications | en |
| dc.date.accessioned | 2025-07-10T08:00:19Z | en |
| dc.date.available | 2025-07-10T08:00:19Z | en |
| dc.date.issued | 2025-07-09 | en |
| dc.description.abstract | The use of tiered heterogeneous memory systems in HPC workloads is growing in popularity as the increasing memory requirements of these workloads outpace the decline in the cost per gigabyte of fast DRAM; however, the Linux kernel has no intelligent strategy for managing these tiered memory systems. Because of this limitation, a great deal of research has been conducted to identify policies that make efficient use of such systems. Much of this prior research focuses on deep learning training workloads, while only a few studies focus on inference for large models. The training and inference workloads for the same type of model are quite different: in training, the task is to continuously update the weight matrices with knowledge gained from each training datum, while in inference, the workload only reads from the weights. Training for neural networks also accesses the layers in the reverse of inference order, in a technique called backpropagation. This thesis presents a memory access pattern heatmap tool that can track evolving access patterns throughout the lifetime of a workload. This tool is applied to llama.cpp, an LLM inference tool, to identify memory access patterns between remote and local NUMA nodes. The thesis then explores two basic NUMA page placement strategies, in which all memory is bound to either the local or the remote NUMA node, to identify the impact of poor NUMA policies on performance, and compares them to the default Linux strategy. | en |
| dc.description.abstractgeneral | Scientific computing often involves running programs with very large memory footprints that might not fit in the memory available on the system where they run. Because expanding a system's memory can be expensive, tiered memory systems have been growing in popularity; they provide the illusion of a large, fast memory by storing some data in small, fast, and expensive memory (DRAM) and the rest in large, cheap, and slow memory (NVM). However, effective use of such systems requires an intelligent strategy for moving data between memory tiers. Past research has explored strategies that leverage the data access patterns of programs such as those that train machine learning models, but little of it examines the access patterns of programs that run those models. This thesis explores how llama.cpp, a program that runs large language models, accesses its data and how those patterns change as the program runs. It also explores how differing data placement policies affect performance: it compares how quickly llama.cpp runs when all data is allocated to a slow memory node against when all data is allocated to a fast memory node, demonstrating how important placement strategies are to performance. | en |
| dc.description.degree | Master of Science | en |
| dc.format.medium | ETD | en |
| dc.identifier.other | vt_gsexam:43726 | en |
| dc.identifier.uri | https://hdl.handle.net/10919/135946 | en |
| dc.language.iso | en | en |
| dc.publisher | Virginia Tech | en |
| dc.rights | Creative Commons Attribution-ShareAlike 4.0 International | en |
| dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ | en |
| dc.subject | NUMA | en |
| dc.subject | Page Placement | en |
| dc.subject | Memory Access Patterns | en |
| dc.subject | High Performance Computing | en |
| dc.subject | LLM Inference | en |
| dc.title | Analysis of Memory Access Patterns for Large Language Model Inference | en |
| dc.type | Thesis | en |
| thesis.degree.discipline | Computer Science & Applications | en |
| thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
| thesis.degree.level | masters | en |
| thesis.degree.name | Master of Science | en |
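
The abstract above describes a heatmap tool that tracks how a workload's memory access patterns evolve, but this record does not describe the tool's implementation. One mechanism such a tool could plausibly build on is Linux idle page tracking (`/sys/kernel/mm/page_idle/bitmap` together with `/proc/<pid>/pagemap`). The sketch below is a hypothetical illustration of that mechanism, not the thesis's tool: it marks a buffer's pages idle, touches some of them, and reports which pages show up as accessed. It assumes root privileges (PFNs in pagemap and the idle bitmap require CAP_SYS_ADMIN) and a kernel built with CONFIG_IDLE_PAGE_TRACKING; the buffer size and page count are illustrative.

```c
/* Hypothetical sketch of page-access tracking via Linux idle page
 * tracking. Not the thesis's tool; run as root. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Translate a virtual address to a physical frame number (PFN) via
 * /proc/self/pagemap: bit 63 = page present, bits 0-54 = PFN. */
static uint64_t vaddr_to_pfn(int pagemap_fd, void *addr) {
    long psz = sysconf(_SC_PAGESIZE);
    uint64_t entry;
    off_t off = (off_t)((uintptr_t)addr / psz) * sizeof(entry);
    if (pread(pagemap_fd, &entry, sizeof(entry), off) != sizeof(entry))
        return 0;
    if (!(entry & (1ULL << 63)))            /* page not present */
        return 0;
    return entry & ((1ULL << 55) - 1);
}

/* Each bit of /sys/kernel/mm/page_idle/bitmap covers one PFN.
 * set=1 marks the page idle; set=0 reads whether it is still idle
 * (a cleared bit means the page was accessed since being marked). */
static int page_idle(int idle_fd, uint64_t pfn, int set) {
    uint64_t word;
    off_t off = (off_t)(pfn / 64) * sizeof(word);
    if (set) {
        word = 1ULL << (pfn % 64);
        return pwrite(idle_fd, &word, sizeof(word), off) == sizeof(word);
    }
    if (pread(idle_fd, &word, sizeof(word), off) != sizeof(word))
        return -1;
    return (int)((word >> (pfn % 64)) & 1);
}

int main(void) {
    long psz = sysconf(_SC_PAGESIZE);
    int pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
    int idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR);
    if (pagemap_fd < 0 || idle_fd < 0) {
        perror("open (root required)");
        return 1;
    }

    enum { NPAGES = 8 };                     /* illustrative */
    char *buf = aligned_alloc(psz, NPAGES * psz);
    for (long i = 0; i < NPAGES; i++)
        buf[i * psz] = 1;                    /* fault every page in */

    uint64_t pfn[NPAGES];
    for (long i = 0; i < NPAGES; i++) {
        pfn[i] = vaddr_to_pfn(pagemap_fd, buf + i * psz);
        if (pfn[i])
            page_idle(idle_fd, pfn[i], 1);   /* mark idle */
    }

    for (long i = 0; i < NPAGES; i += 2)
        buf[i * psz]++;                      /* simulate workload accesses */

    /* Pages whose idle bit was cleared were touched; sampled
     * periodically, this yields one column of an access heatmap. */
    for (long i = 0; i < NPAGES; i++)
        printf("page %ld: %s\n", i,
               (pfn[i] && page_idle(idle_fd, pfn[i], 0) == 1)
                   ? "idle" : "accessed");

    free(buf);
    return 0;
}
```

Sampling the idle bits at a fixed interval over the target's whole address space, rather than over one test buffer, would produce the time-versus-address heatmap the abstract describes; other designs (for example, hardware event sampling with perf) are equally plausible.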
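The two page placement strategies the abstract compares (all memory bound to the local node versus all bound to the remote node) correspond to running a workload under `numactl --membind=<node>`. The same binding can be set up programmatically with libnuma, as in the minimal sketch below; this illustrates the policy and is not the thesis's experiment harness. It assumes a two-node system where node 0 is local and node 1 is remote, and a placeholder buffer stands in for the model weights (link with -lnuma).

```c
/* Minimal sketch: bind all of the process's allocations to one NUMA
 * node with libnuma, mirroring `numactl --membind=<node>`. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int node = (argc > 1) ? atoi(argv[1]) : 0;   /* 0 = local, 1 = remote (assumed) */
    if (node > numa_max_node()) {
        fprintf(stderr, "node %d does not exist\n", node);
        return 1;
    }

    /* Restrict every subsequent allocation to the chosen node. */
    struct bitmask *bm = numa_allocate_nodemask();
    numa_bitmask_setbit(bm, (unsigned)node);
    numa_set_membind(bm);
    numa_bitmask_free(bm);

    /* Allocations made from here on (in the thesis, llama.cpp's model
     * weights) come only from the bound node; a placeholder buffer
     * stands in for them here. */
    size_t sz = 256UL << 20;                     /* 256 MiB, illustrative */
    char *weights = malloc(sz);
    if (!weights) { perror("malloc"); return 1; }
    memset(weights, 0, sz);                      /* first touch faults pages onto the node */

    printf("all allocations bound to NUMA node %d\n", node);
    free(weights);
    return 0;
}
```

The default Linux policy these bindings are compared against is local (first-touch) allocation, which places each page on the node of the CPU that first faults it; an unmodified binary such as llama.cpp can be measured under each placement without code changes via `numactl --membind=0` or `--membind=1`.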