VTechWorks staff will be away for the winter holidays starting Tuesday, December 24, 2024, through Wednesday, January 1, 2025, and will not be replying to requests during this time. Thank you for your patience, and happy holidays!
 

f-DSM: An FPGA-Accelerated Distributed Shared Memory for Heterogeneous Instruction-Set-Architecture Hardware

TR Number

Date

2022-03-03

Journal Title

Journal ISSN

Volume Title

Publisher

Virginia Tech

Abstract

Due to the diminishing relevance of Moore's Law, traditional multi-core systems are increasingly struggling to meet the computational demands of many emerging workloads. Heterogeneous computing, which involves exploiting higher degrees of parallelism (e.g., GPUs) and application-specific specialization (e.g., FPGAs), is increasingly used to meet this demand. An important architectural trend in this space involves using instruction-set-architecture (ISA) heterogeneity. An exemplar case is emerging I/O devices that include CPU cores with ISAs (e.g., ARM, RISC-V) that differ from that of host CPUs (e.g., x86) and have physically discrete memory. Shared-memory programming of such systems requires the Dis- tributed Shared Memory (DSM) abstraction. Software DSM incurs significant OS overhead for maintaining memory coherency. Despite outperforming software predecessors, hardware DSM and cache-coherent interfaces require custom chips and lack the flexibility to experiment with different DSM consistency protocols. This thesis presents fDSM, an FPGA-accelerated DSM framework for ISA-heterogeneous hardware. fDSM implements a high-speed messaging layer to enable inter-node communication across ISA-different CPU cores and a DSM protocol processor that maintains virtual memory coherency using a multiple-reader single- writer DSM algorithm. Experimental studies reveal that fDSM outperforms prior art, including Popcorn Linux's software DSM abstraction, which uses TCP-IP and state-of-the-art Infiniband RDMA messaging layers by 2.8X and 7%, respectively. fDSM also provides reconfigurability and thereby allows implementation and experimentation of different memory consistency models.

Description

Keywords

Distributed Shared Memory, Sequential Consistency, Popcorn Linux, Field programmable gate arrays, Instruction-Set-Architecture

Citation

Collections