Optimizing Systems for Deep Learning Applications

Date

2023-03-01

Publisher

Virginia Tech

Abstract

Modern systems for Machine Learning (ML) workloads support heterogeneous jobs and heterogeneous resources. However, existing resource managers in these systems do not differentiate between heterogeneous GPU types. Moreover, users are usually unaware of the type and amount of GPU resources that would be appropriate and sufficient for their ML jobs. In this thesis, we analyze the performance of ML training and inference jobs and identify the ML-model and GPU characteristics that impact this performance. We then propose ML-based prediction models that accurately determine appropriate and sufficient resource requirements, improving job latency and GPU utilization in the cluster.
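
To illustrate the kind of resource-requirement predictor the abstract refers to, the following is a minimal sketch, not the thesis implementation: a regression model that maps ML-model and GPU characteristics to a resource requirement. The feature names, the synthetic data, and the choice of scikit-learn's GradientBoostingRegressor are assumptions made purely for illustration.

    # Illustrative sketch only: predict a job's GPU memory requirement from
    # hypothetical ML-model and GPU features. Data and features are synthetic.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 500

    # Hypothetical job/GPU features: parameter count (millions), batch size,
    # input resolution, GPU memory capacity (GB), and GPU SM count.
    X = np.column_stack([
        rng.uniform(1, 500, n),           # model parameters (millions)
        rng.choice([8, 16, 32, 64], n),   # batch size
        rng.choice([224, 384, 512], n),   # input resolution
        rng.choice([16, 24, 32, 40], n),  # GPU memory capacity (GB)
        rng.choice([68, 84, 108], n),     # streaming multiprocessors
    ])

    # Synthetic target: GPU memory the job actually needs (GB), standing in
    # for the "sufficient resource requirement" mentioned in the abstract.
    y = 0.02 * X[:, 0] + 0.05 * X[:, 1] + 0.002 * X[:, 2] + rng.normal(0, 0.5, n)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
    print(f"R^2 on held-out jobs: {model.score(X_test, y_test):.2f}")

In practice, such a predictor would be trained on profiled training and inference runs across heterogeneous GPU types and consulted by the cluster scheduler (e.g., Kubernetes) before placing a job.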

Keywords

GPU heterogeneity, Deep Learning and Inference, Kubernetes, GPU sharing, Resource requirement prediction
