Revitalizing the Forgotten On-Chip DMA to Expedite Data Movement in NVM-based Storage Systems

Su, Jingbo; Li, Jiahao; Chen, Luofan; Li, Cheng; Zhang, Kai; Yang, Liang; Noh, Sam H.; Xu, Yinlong

dc.contributor.author	Su, Jingbo	en
dc.contributor.author	Li, Jiahao	en
dc.contributor.author	Chen, Luofan	en
dc.contributor.author	Li, Cheng	en
dc.contributor.author	Zhang, Kai	en
dc.contributor.author	Yang, Liang	en
dc.contributor.author	Noh, Sam H.	en
dc.contributor.author	Xu, Yinlong	en
dc.date.accessioned	2024-02-19T20:08:36Z	en
dc.date.available	2024-02-19T20:08:36Z	en
dc.date.issued	2023	en
dc.description.abstract	Data-intensive applications executing on NVM-based storage systems experience serious bottlenecks when moving data between DRAM and NVM. We advocate for the use of the long-existing but recently neglected on-chip DMA to expedite data movement, with three contributions. First, driven by a comprehensive DMA study, we explore new latency-oriented optimization directions to design a high-performance DMA module, which significantly lowers the I/O size threshold at which benefits are observed. Second, we propose a new data movement engine, Fastmove, that coordinates the use of the DMA along with the CPU through judicious scheduling and load splitting, such that the DMA’s limitations are compensated for and the overall gains are maximized. Finally, with a general kernel-based design, simple APIs, and DAX file system integration, Fastmove allows applications to transparently exploit the DMA and its new features without code changes. We run three data-intensive applications (MySQL, GraphWalker, and Filebench) atop NOVA, ext4-DAX, and XFS-DAX, with standard benchmarks such as TPC-C and popular graph algorithms such as PageRank. Across single- and multi-socket settings, compared to conventional CPU-only NVM accesses, Fastmove speeds up the peak throughput of TPC-C with MySQL by 1.13-2.16×, reduces the average latency by 17.7-60.8%, and saves 37.1-68.9% of the CPU usage spent in data movement. It also shortens the execution time of graph algorithms with GraphWalker by 39.7-53.4% and yields 1.12-1.27× throughput speedups for Filebench.	en
dc.description.version	Published version	en
dc.format.extent	Pages 363-378	en
dc.format.extent	16 page(s)	en
dc.format.mimetype	application/pdf	en
dc.identifier.isbn	9781939133328	en
dc.identifier.orcid	Noh, Sam Hyuk [0000-0002-9152-0321]	en
dc.identifier.uri	https://hdl.handle.net/10919/118053	en
dc.language.iso	en	en
dc.publisher	USENIX Association	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.title	Revitalizing the Forgotten On-Chip DMA to Expedite Data Movement in NVM-based Storage Systems	en
dc.title.serial	Proceedings of the 21st USENIX Conference on File and Storage Technologies, FAST 2023	en
dc.type	Conference proceeding	en
dc.type.dcmitype	Text	en
dc.type.other	Proceedings Paper	en
dc.type.other	Book	en
pubs.finish-date	2023-02-23	en
pubs.organisational-group	/Virginia Tech	en
pubs.organisational-group	/Virginia Tech/Engineering	en
pubs.organisational-group	/Virginia Tech/Engineering/Computer Science	en
pubs.organisational-group	/Virginia Tech/All T&R Faculty	en
pubs.organisational-group	/Virginia Tech/Engineering/COE T&R Faculty	en
pubs.start-date	2023-02-21	en

Files

Original bundle

Name: fast23-su.pdf
Size: 955 KB
Format: Adobe Portable Document Format
Description: Published version

License bundle

Name: license.txt
Size: 1.5 KB
Format: Plain Text

Collections

All Faculty Deposits
Scholarly Works, Computer Science