Author: Albahar, Hadeel Ahmad
Date accessioned: 2023-03-02
Date available: 2023-03-02
Date issued: 2023-03-01
Identifier: vt_gsexam:36650
URI: http://hdl.handle.net/10919/114021
Abstract: Modern systems for Machine Learning (ML) workloads support heterogeneous workloads and resources. However, existing resource managers in these systems do not differentiate between heterogeneous GPU resources. Moreover, users are usually unaware of the type and amount of GPU resources that would be appropriate and sufficient to request for their ML jobs. In this thesis, we analyze the performance of ML training and inference jobs and identify the ML-model and GPU characteristics that impact this performance. We then propose ML-based prediction models that accurately determine appropriate and sufficient resource requirements, improving job latency and GPU utilization in the cluster.
Description: ETD
Language: en
Rights: In Copyright
Subjects: GPU heterogeneity; Deep Learning and Inference; Kubernetes; GPU sharing; Resource requirement prediction
Title: Optimizing Systems for Deep Learning Applications
Type: Dissertation