CRIU-RTX: Remote Thread eXecution using Checkpoint/Restore in Userspace

dc.contributor.author: Noor Mohamed, Mohamed Husain (en)
dc.contributor.committeechair: Ravindran, Binoy (en)
dc.contributor.committeemember: Giles, Kendall Everett (en)
dc.contributor.committeemember: Wang, Xiaoguang (en)
dc.contributor.department: Electrical and Computer Engineering (en)
dc.date.accessioned: 2023-07-22T08:00:26Z (en)
dc.date.available: 2023-07-22T08:00:26Z (en)
dc.date.issued: 2023-07-21 (en)
dc.description.abstract: Scaling up application performance on single high-end machines is becoming increasingly difficult due to scalability challenges of processor interconnects, cache coherence protocols, and memory bandwidth. Significant prior work has addressed this problem by scaling out application threads across multiple nodes to exploit resources outside the single machine boundary. Prior works have also leveraged heterogeneous instruction set architecture (ISA) systems to improve application performance as well as energy efficiency, a major cost driver in datacenters, by augmenting high-end servers with power-efficient embedded boards. Existing works, however, suffer from deployability challenges due to dependencies on the operating system or programming models that require non-trivial application modifications. We introduce CRIU-RTX, a userspace framework to scale out multi-threaded applications across multiple nodes. Integrated with HetMigrate, a prior work on migrating processes across heterogeneous-ISA systems, CRIU-RTX can suspend a subset of threads in a process and resume their execution on different nodes, including, but not limited to, heterogeneous-ISA nodes. CRIU-RTX implements distributed shared memory in userspace, thereby allowing application threads to access distributed memory transparently without any operating system dependency. Our experimental evaluations show 21% to 43% performance gains while scaling out applications across x86-64 servers, and energy efficiency gains of up to 18% while scaling out across a cluster of x86-64 server and ARM64 embedded boards. Since CRIU-RTX does not depend on operating system modifications, it can be easily deployed on a diverse set of machines, including, but not limited to, ISA-different machines running the stock Linux operating system. (en)
dc.description.abstractgeneral: Gordon Moore predicted that the number of transistors on a chip would double every two years, a prediction commonly referred to as "Moore's Law". However, this law no longer holds true, leading to a shift in computer research and development. To meet the increasing demands for faster and cheaper servers, researchers began exploring alternative computer designs. Data centers have started adopting servers with diverse architectures to enhance the cost-to-performance ratio, resulting in heterogeneous environments. Distributed execution refers to the process of running computational tasks or executing software across multiple interconnected systems or nodes. Instead of relying on a single machine or processor, the workload is distributed among a network of computers, allowing for parallel processing and improved performance. Prior works in this direction have been difficult to adopt because they require customized hardware or operating systems. This thesis introduces CRIU-RTX, a userspace framework to scale out application threads without operating system dependency. We implemented a distributed shared memory system in userspace to allow application threads running in scaled-out execution to access distributed memory as if they were running on the same machine. Our evaluations of CRIU-RTX show significant improvement in performance and energy efficiency. (en)
dc.description.degree: Master of Science (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:38222 (en)
dc.identifier.uri: http://hdl.handle.net/10919/115819 (en)
dc.language.iso: en (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Heterogeneous Systems (en)
dc.subject: Distributed Execution (en)
dc.subject: Energy Efficiency (en)
dc.title: CRIU-RTX: Remote Thread eXecution using Checkpoint/Restore in Userspace (en)
dc.type: Thesis (en)
thesis.degree.discipline: Computer Engineering (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: Master of Science (en)

Files

Original bundle
Name: Noor_Mohamed_M_T_2023.pdf
Size: 1.68 MB
Format: Adobe Portable Document Format
