Automated Runtime Analysis and Adaptation for Scalable Heterogeneous Computing

dc.contributor.author: Helal, Ahmed Elmohamadi Mohamed [en]
dc.contributor.committeechair: Feng, Wu-chun [en]
dc.contributor.committeemember: Nazhandali, Leyla [en]
dc.contributor.committeemember: Jung, Changhee [en]
dc.contributor.committeemember: Hanafy, Yasser Y. [en]
dc.contributor.committeemember: Min, Chang Woo [en]
dc.contributor.department: Electrical and Computer Engineering [en]
dc.date.accessioned: 2020-01-30T09:00:45Z [en]
dc.date.available: 2020-01-30T09:00:45Z [en]
dc.date.issued: 2020-01-29 [en]
dc.description.abstract: In the last decade, computer hardware has undergone tectonic shifts as sequential CPU performance reached its physical limits. As a consequence, current high-performance computing (HPC) systems integrate a wide variety of compute resources with different capabilities and execution models, ranging from multi-core CPUs to many-core accelerators. While such heterogeneous systems can enable dramatic acceleration of user applications, extracting optimal performance via manual analysis and optimization is a complicated and time-consuming process. This dissertation presents graph-structured program representations to reason about the performance bottlenecks on modern HPC systems and to guide novel automation frameworks for performance analysis, performance modeling, and runtime adaptation. The proposed program representations exploit domain knowledge and capture the inherent computation and communication patterns in user applications, at multiple levels of computational granularity, via compiler analysis and dynamic instrumentation. The empirical results demonstrate that the introduced modeling frameworks accurately estimate the realizable parallel performance and scalability of a given sequential code when ported to heterogeneous HPC systems. As a result, these frameworks enable efficient workload distribution schemes that utilize all the available compute resources in a performance-proportional way. In addition, the proposed runtime adaptation frameworks significantly improve the end-to-end performance of important real-world applications that suffer from limited parallelism and fine-grained data dependencies. Specifically, compared to the state-of-the-art methods, this adaptive parallel execution achieves up to an order-of-magnitude speedup on the target HPC systems while preserving the inherent data dependencies of user applications. [en]
dc.description.abstractgeneral: Current supercomputers integrate a massive number of heterogeneous compute units with varying speed, computational throughput, memory bandwidth, and memory access latency. This trend represents a major challenge to end users, as their applications have been designed from the ground up to primarily exploit homogeneous CPUs. While heterogeneous systems can deliver speedups of several orders of magnitude compared to traditional CPU-based systems, end users need extensive software and hardware expertise as well as significant time and effort to efficiently utilize all the available compute resources. To streamline such a daunting process, this dissertation presents automated frameworks for analyzing and modeling the performance on parallel architectures and for transforming the execution of user applications at runtime. The proposed frameworks incorporate domain knowledge and adapt to the input data and the underlying hardware using novel static and dynamic analyses. The experimental results show the efficacy of the introduced frameworks across many important application domains, such as computational fluid dynamics (CFD) and computer-aided design (CAD). In particular, the adaptive execution approach on heterogeneous systems achieves up to an order-of-magnitude speedup over the optimized parallel implementations. [en]
dc.description.degree: Doctor of Philosophy [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:23625 [en]
dc.identifier.uri: http://hdl.handle.net/10919/96607 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Parallel Architectures [en]
dc.subject: Accelerators [en]
dc.subject: Heterogeneous Computing [en]
dc.subject: Performance Modeling [en]
dc.subject: Runtime Adaptation [en]
dc.subject: Scheduling [en]
dc.subject: Performance Portability [en]
dc.subject: MPI [en]
dc.subject: GPU [en]
dc.subject: LLVM [en]
dc.title: Automated Runtime Analysis and Adaptation for Scalable Heterogeneous Computing [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Engineering [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Doctor of Philosophy [en]

Files

Original bundle
Name: Helal_AE_D_2020.pdf
Size: 6.24 MB
Format: Adobe Portable Document Format