Optimizing Systems for Deep Learning Applications
Date: 2023-03-01
Publisher: Virginia Tech
Abstract
Modern systems for Machine Learning (ML) support heterogeneous workloads and resources. However, existing resource managers in these systems do not differentiate between heterogeneous GPU resources. Moreover, users are often unaware of the type and amount of GPU resources that are appropriate and sufficient for their ML jobs. In this thesis, we analyze the performance of ML training and inference jobs and identify the ML model and GPU characteristics that impact this performance. We then propose ML-based prediction models that accurately determine appropriate and sufficient resource requirements, improving both job latency and GPU utilization in the cluster.
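The general idea of predicting sufficient GPU resources from model characteristics might be sketched as follows. The features (parameter count), the profiled data points, the linear predictor, and the GPU catalog below are all illustrative assumptions for exposition, not the thesis's actual predictor or dataset.

```python
# Hypothetical sketch: predict a job's peak GPU memory from a model
# characteristic (parameter count), then recommend the smallest GPU
# type that satisfies the prediction. All numbers are made up.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Profiled jobs: (model parameters in millions, peak GPU memory in GB).
profiles = [(25, 4.1), (60, 7.9), (110, 13.2), (175, 20.5)]
a, b = fit_linear([p for p, _ in profiles], [m for _, m in profiles])

def predict_gpu_memory(params_millions):
    return a * params_millions + b

# Illustrative GPU catalog: (type, memory in GB), smallest first.
gpu_types = [("T4", 16), ("V100", 32), ("A100", 40)]

def recommend_gpu(params_millions):
    need = predict_gpu_memory(params_millions)
    for name, mem in gpu_types:
        if mem >= need:
            return name
    return gpu_types[-1][0]  # fall back to the largest GPU

print(recommend_gpu(80))   # small model fits the smallest GPU
print(recommend_gpu(300))  # large model needs the largest GPU
```

In practice a predictor like this would use richer features (batch size, layer types, FLOPs per sample, GPU architecture) and a nonlinear model trained on profiled cluster traces, but the fit-then-recommend structure is the same.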
Keywords
GPU heterogeneity, Deep Learning and Inference, Kubernetes, GPU sharing, Resource requirement prediction