VTechWorks staff will be away for the winter holidays starting Tuesday, December 24, 2024, through Wednesday, January 1, 2025, and will not be replying to requests during this time. Thank you for your patience, and happy holidays!
 

f-DSM: An FPGA-Accelerated Distributed Shared Memory for Heterogeneous Instruction-Set-Architecture Hardware

dc.contributor.authorVSathish, Naarayanan Raoen
dc.contributor.committeechairRavindran, Binoyen
dc.contributor.committeechairHorta, Edson Lemosen
dc.contributor.committeememberPatterson, Cameron D.en
dc.contributor.departmentElectrical and Computer Engineeringen
dc.date.accessioned2022-03-04T09:00:13Zen
dc.date.available2022-03-04T09:00:13Zen
dc.date.issued2022-03-03en
dc.description.abstractDue to the diminishing relevance of Moore's Law, traditional multi-core systems are increasingly struggling to meet the computational demands of many emerging workloads. Heterogeneous computing, which involves exploiting higher degrees of parallelism (e.g., GPUs) and application-specific specialization (e.g., FPGAs), is increasingly used to meet this demand. An important architectural trend in this space involves using instruction-set-architecture (ISA) heterogeneity. An exemplar case is emerging I/O devices that include CPU cores with ISAs (e.g., ARM, RISC-V) that differ from that of host CPUs (e.g., x86) and have physically discrete memory. Shared-memory programming of such systems requires the Dis- tributed Shared Memory (DSM) abstraction. Software DSM incurs significant OS overhead for maintaining memory coherency. Despite outperforming software predecessors, hardware DSM and cache-coherent interfaces require custom chips and lack the flexibility to experiment with different DSM consistency protocols. This thesis presents fDSM, an FPGA-accelerated DSM framework for ISA-heterogeneous hardware. fDSM implements a high-speed messaging layer to enable inter-node communication across ISA-different CPU cores and a DSM protocol processor that maintains virtual memory coherency using a multiple-reader single- writer DSM algorithm. Experimental studies reveal that fDSM outperforms prior art, including Popcorn Linux's software DSM abstraction, which uses TCP-IP and state-of-the-art Infiniband RDMA messaging layers by 2.8X and 7%, respectively. fDSM also provides reconfigurability and thereby allows implementation and experimentation of different memory consistency models.en
dc.description.abstractgeneralMoore's Law predicts that the number of transistors in a chip will double approximately every two years. Chip vendors are increasingly observing that this law is nearing its limit when transistor sizes are shrunk to 5nm and 3nm due to power consumption and heat dissipation issues. As a result, innovation in new computing architectures has increasingly focused on heterogeneity, i.e., the use of hardware performance accelerators like graphic processors and reconfigurable logic used in confluence with a computer's CPU (host). To improve the programmability of these architectures, which usually have physically separate memory, the shared-memory programming model is usually used to provide coherent virtual memory. The shared memory model, when applied to such distributed systems, called distributed shared memory (or DSM), has been previously developed in software as well as in hardware. The former usually suffer from high latency overheads, while the latter often requires custom chips and lack programmability for implementing new memory consistency protocols. This thesis presents fDSM, a reconfigurable distributed shared memory framework that provides coherent shared memory between a host and a smart I/O device such as a SmartNIC. fDSM is implemented in FPGAs, which are increasingly available in hosts and Smart I/O devices at the commodity scale. Our prototype implementation uses ISA-heterogeneous hosts to emulate such an environment. Our experimental evaluation using applications from High- Performance Computing benchmark suites reveal that fDSM yields performance benefits over a state-of-the-art software DSM.en
dc.description.degreeMaster of Scienceen
dc.format.mediumETDen
dc.identifier.othervt_gsexam:33981en
dc.identifier.urihttp://hdl.handle.net/10919/109095en
dc.language.isoenen
dc.publisherVirginia Techen
dc.rightsIn Copyrighten
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/en
dc.subjectDistributed Shared Memoryen
dc.subjectSequential Consistencyen
dc.subjectPopcorn Linuxen
dc.subjectField programmable gate arraysen
dc.subjectInstruction-Set-Architectureen
dc.titlef-DSM: An FPGA-Accelerated Distributed Shared Memory for Heterogeneous Instruction-Set-Architecture Hardwareen
dc.typeThesisen
thesis.degree.disciplineComputer Engineeringen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
thesis.degree.levelmastersen
thesis.degree.nameMaster of Scienceen

Files

Original bundle
Now showing 1 - 2 of 2
Loading...
Thumbnail Image
Name:
VSathish_N_T_2022.pdf
Size:
3.21 MB
Format:
Adobe Portable Document Format
Loading...
Thumbnail Image
Name:
VSathish_N_T_2022_support_1.pdf
Size:
52.75 KB
Format:
Adobe Portable Document Format
Description:
Supporting documents

Collections