dc.contributor.author: Helal, Ahmed E. (en_US)
dc.contributor.author: Feng, Wu-chun (en_US)
dc.contributor.author: Jung, Changhee (en_US)
dc.contributor.author: Hanafy, Yasser Y. (en_US)
dc.date.accessioned: 2016-12-14T16:20:14Z
dc.date.available: 2016-12-14T16:20:14Z
dc.date.issued: 2016-12-13
dc.identifier.uri: http://hdl.handle.net/10919/73693
dc.description.abstract: HPC systems contain a wide variety of heterogeneous computing resources, ranging from general-purpose CPUs to specialized accelerators. Porting sequential applications to such systems for achieving high performance requires significant software and hardware expertise as well as extensive manual analysis of both the target architectures and applications to decide the best performing architecture and implementation technique for each application. To streamline this tedious process, this paper presents AutoMatch, a tool for automated matching of compute kernels to heterogeneous HPC architectures. AutoMatch analyzes the sequential application code and automatically predicts the performance of the best parallel implementation of its compute kernels on different hardware architectures. AutoMatch leverages such prediction results to identify the best device for each kernel from a set of devices including multi-core CPUs and many-core GPUs. In addition, it estimates the relative execution cost between the different architectures to drive a workload distribution scheme, which enables end users to efficiently exploit the available compute resources across multiple heterogeneous architectures. We demonstrate the efficacy of AutoMatch, using a set of open-source HPC applications and benchmarks with different parallelism profiles and memory-access patterns. The empirical evaluation shows that AutoMatch is highly accurate across five different heterogeneous architectures, identifying the best architecture for each workload in 96% of the test cases, and its workload distribution scheme has a comparable performance to a profiling-driven oracle. (en_US)
dc.language.iso: en_US (en_US)
dc.publisher: Department of Computer Science, Virginia Polytechnic Institute & State University (en_US)
dc.relation.ispartof: Computer Science Technical Reports (en_US)
dc.subject: Architecture (en_US)
dc.subject: Computer Systems (en_US)
dc.subject: High-Performance Computing (en_US)
dc.subject: Parallel and Distributed Computing (en_US)
dc.title: AutoMatch: Automated Matching of Compute Kernels to Heterogeneous HPC Architectures (en_US)
dc.type: Technical report (en_US)
dc.identifier.trnumber: TR-16-06 (en_US)
dc.type.dcmitype: Text (en_US)

