Towards SLO-aware Resource Scheduling for Serverless Inference Workloads

Tripathy, Abhijit

Files

Tripathy_A_T_2023.pdf (674.45 KB)

Date

2023-08-08

Authors

Tripathy, Abhijit

Publisher

Virginia Tech

Abstract

The rapid advancement of Machine Learning (ML) and Deep Learning (DL) has transformed many domains and created demand for efficient, cost-effective ML inference. Function-as-a-Service (FaaS) has emerged as a promising way to host ML inference services: its serverless computing environment streamlines development cycles, offers scalability, and simplifies infrastructure management. However, the autoscaling strategies used by popular FaaS platforms often overlook critical factors such as response time and tail latency. In addition, Python's Global Interpreter Lock (GIL) limits parallel execution under high request traffic. This thesis addresses these problems by exploring batching and autoscaling strategies for serverless inference instances. It proposes a prototype FaaS framework that provides adaptive request batching, reactive autoscaling policies, and service-level objective (SLO) monitoring, allowing serverless inference workloads to meet their SLO targets even during peak traffic. The proposed approach aims to optimize resource utilization, mitigate tail latency, and improve overall system performance.
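
As a rough illustration of how SLO monitoring and a reactive autoscaling policy can fit together, the following minimal Python sketch tracks tail latency over a sliding window and derives a replica count from it. It is not the thesis's implementation: the names `SloMonitor` and `desired_replicas`, the window size, the nearest-rank percentile, and the proportional scaling rule (similar in spirit to common ratio-based autoscalers) are all assumptions chosen for the example.

```python
import math
from collections import deque


class SloMonitor:
    """Hypothetical helper: keeps a sliding window of response times (ms)."""

    def __init__(self, slo_ms, window=100):
        self.slo_ms = slo_ms
        self.samples = deque(maxlen=window)  # oldest samples drop out automatically

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def tail_latency(self, pct=99):
        """Nearest-rank percentile of the window; 0.0 if no samples yet."""
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        # Integer ceiling of pct * n / 100, clamped to at least rank 1.
        rank = max(1, -(-pct * len(ordered) // 100))
        return ordered[rank - 1]


def desired_replicas(current, tail_ms, slo_ms, max_replicas=16):
    """Assumed reactive rule: scale replicas by the ratio tail latency / SLO."""
    if tail_ms <= 0:
        return current  # no signal yet, keep the current size
    target = math.ceil(current * tail_ms / slo_ms)
    return max(1, min(max_replicas, target))


monitor = SloMonitor(slo_ms=100)
for latency in range(1, 101):  # synthetic latencies of 1..100 ms
    monitor.record(latency)

p99 = monitor.tail_latency(99)
replicas = desired_replicas(current=4, tail_ms=150, slo_ms=monitor.slo_ms)
```

With this rule, a tail latency above the SLO target grows the replica count and a tail latency below it shrinks the count, bounded by `max_replicas`. A real policy would likely add cooldown periods and account for batching effects, which this sketch omits.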

Keywords

Machine Learning, Deep Learning, Serverless Inference, Autoscaling, Load Balancing, Response Time, Tail Latency

Persistent link

http://hdl.handle.net/10919/116005

Collections

Masters Theses
