10Cache: Heterogeneous Resource-Aware Tensor Caching and Migration for LLM Training

Afroz, Sabiha; Khan, Redwan Ibne Seraj; Albahar, Hadeel; Han, Jingoo; Butt, Ali R.

dc.contributor.author	Afroz, Sabiha	en
dc.contributor.author	Khan, Redwan Ibne Seraj	en
dc.contributor.author	Albahar, Hadeel	en
dc.contributor.author	Han, Jingoo	en
dc.contributor.author	Butt, Ali R.	en
dc.date.accessioned	2026-02-03T13:54:48Z	en
dc.date.available	2026-02-03T13:54:48Z	en
dc.date.issued	2025-11-19	en
dc.date.updated	2026-02-01T08:45:41Z	en
dc.description.abstract	Training large language models (LLMs) in the cloud faces growing memory bottlenecks due to the limited capacity and high cost of GPUs. While GPU memory offloading to CPU and NVMe has made large-scale training more feasible, existing approaches suffer from high tensor migration latency and suboptimal device memory utilization, ultimately increasing training time and cloud costs. To address these challenges, we present 10Cache, a resource-aware tensor caching and migration system that accelerates LLM training by intelligently coordinating memory usage across GPU, CPU, and NVMe tiers. 10Cache profiles tensor execution order to construct prefetch policies, allocates memory buffers in pinned memory based on tensor size distributions, and reuses memory buffers to minimize allocation overhead. Designed for cloud-scale deployments, 10Cache improves memory efficiency and reduces reliance on high-end GPUs. Across diverse LLM workloads, it achieves up to 2× speedup in training time, improves GPU cache hit rate by up to 86.6×, and increases CPU/GPU memory utilization by up to 2.15× and 1.33×, respectively, compared to state-of-the-art offloading methods. These results demonstrate that 10Cache is a practical and scalable solution for optimizing LLM training throughput and resource efficiency in cloud environments.	en
dc.description.version	Published version	en
dc.format.mimetype	application/pdf	en
dc.identifier.doi	https://doi.org/10.1145/3772052.3772236	en
dc.identifier.uri	https://hdl.handle.net/10919/141118	en
dc.language.iso	en	en
dc.publisher	ACM	en
dc.rights	Creative Commons Attribution-ShareAlike 4.0 International	en
dc.rights.holder	The author(s)	en
dc.rights.uri	http://creativecommons.org/licenses/by-sa/4.0/	en
dc.title	10Cache: Heterogeneous Resource-Aware Tensor Caching and Migration for LLM Training	en
dc.type	Article - Refereed	en
dc.type.dcmitype	Text	en
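
The abstract names three mechanisms: prefetch policies built from a profiled tensor execution order, buffers sized from the tensor size distribution, and buffer reuse. The following is a minimal Python sketch of those ideas only, not the 10Cache implementation. All names (`build_prefetch_plan`, `BufferPool`, the `lookahead` parameter) are hypothetical, and plain `bytearray` objects stand in for pinned host memory.

```python
from collections import defaultdict


def build_prefetch_plan(access_trace, lookahead=2):
    """For each step, list the tensors used in the next `lookahead` steps.

    access_trace: tensor names in the order one profiled iteration used them.
    The fixed-window policy is an assumption made for illustration.
    """
    plan = {}
    for i, current in enumerate(access_trace):
        upcoming = []
        for name in access_trace[i + 1 : i + 1 + lookahead]:
            # Skip the tensor already in use and any duplicates.
            if name != current and name not in upcoming:
                upcoming.append(name)
        plan[i] = upcoming
    return plan


class BufferPool:
    """Reusable buffers grouped by size class (stand-in for pinned memory)."""

    def __init__(self, size_counts):
        # size_counts: {size_in_bytes: number_of_buffers}, for example taken
        # from a histogram of profiled tensor sizes.
        self.free = defaultdict(list)
        for size, count in size_counts.items():
            for _ in range(count):
                self.free[size].append(bytearray(size))
        self.allocations = sum(size_counts.values())

    def acquire(self, nbytes):
        # Reuse the smallest free buffer that is large enough.
        for size in sorted(self.free):
            if size >= nbytes and self.free[size]:
                return self.free[size].pop()
        # No suitable free buffer: fall back to a fresh allocation.
        self.allocations += 1
        return bytearray(nbytes)

    def release(self, buf):
        # Return the buffer to its size class so later requests can reuse it.
        self.free[len(buf)].append(buf)


if __name__ == "__main__":
    trace = ["w1", "w2", "w3", "w1"]
    print(build_prefetch_plan(trace))
    pool = BufferPool({4: 1, 8: 1})
    buf = pool.acquire(3)
    pool.release(buf)
    print(pool.allocations)
```

In this sketch the pool is sized once up front, so steady-state `acquire` calls are served from free lists rather than new allocations, which is the kind of allocation overhead the abstract says buffer reuse is meant to avoid.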

Files

Original bundle

Name: 3772052.3772236.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format
Description: Published version

Collections

Journal Articles, Association for Computing Machinery (ACM)
Scholarly Works, Computer Science