Optimizing Systems for Deep Learning Applications
Date: 2023-03-01
Publisher: Virginia Tech
Abstract
Modern systems for Machine Learning (ML) support heterogeneous workloads and resources. However, existing resource managers in these systems do not differentiate between heterogeneous GPU resources. Moreover, users are often unaware of the type and amount of GPU resources that are appropriate and sufficient for their ML jobs. In this thesis, we analyze the performance of ML training and inference jobs and identify the ML model and GPU characteristics that impact this performance. We then propose ML-based prediction models that accurately determine appropriate and sufficient resource requirements, improving both job latency and GPU utilization in the cluster.
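The general idea of predicting sufficient GPU resources from model characteristics might be sketched as follows. The features (parameter count), the profiled data points, the linear predictor, and the GPU catalog below are all illustrative assumptions for exposition, not the thesis's actual predictor or dataset.

```python
# Hypothetical sketch: predict a job's peak GPU memory from a model
# characteristic (parameter count), then recommend the smallest GPU
# type that satisfies the prediction. All numbers are made up.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Profiled jobs: (model parameters in millions, peak GPU memory in GB).
profiles = [(25, 4.1), (60, 7.9), (110, 13.2), (175, 20.5)]
a, b = fit_linear([p for p, _ in profiles], [m for _, m in profiles])

def predict_gpu_memory(params_millions):
    return a * params_millions + b

# Illustrative GPU catalog: (type, memory in GB), smallest first.
gpu_types = [("T4", 16), ("V100", 32), ("A100", 40)]

def recommend_gpu(params_millions):
    need = predict_gpu_memory(params_millions)
    for name, mem in gpu_types:
        if mem >= need:
            return name
    return gpu_types[-1][0]  # fall back to the largest GPU

print(recommend_gpu(80))   # small model fits the smallest GPU
print(recommend_gpu(300))  # large model needs the largest GPU
```

In practice a predictor like this would use richer features (batch size, layer types, FLOPs per sample, GPU architecture) and a nonlinear model trained on profiled cluster traces, but the fit-then-recommend structure is the same.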
Keywords
GPU heterogeneity, Deep Learning and Inference, Kubernetes, GPU sharing, Resource requirement prediction