Towards SLO-aware Resource Scheduling for Serverless Inference Workloads

dc.contributor.authorTripathy, Abhijiten
dc.contributor.committeechairButt, Alien
dc.contributor.committeememberRafique, Muhammad Mustafaen
dc.contributor.committeememberNikolopoulos, Dimitrios S.en
dc.contributor.departmentComputer Science and Applicationsen
dc.date.accessioned2023-08-09T08:00:21Zen
dc.date.available2023-08-09T08:00:21Zen
dc.date.issued2023-08-08en
dc.description.abstractThe rapid advancement of Machine Learning (ML) and Deep Learning (DL) has revolutionized various domains, necessitating efficient and cost-effective ML inference capabilities. Function-as-a-Service (FaaS) has emerged as a promising approach for hosting ML inference services, providing a serverless computing environment that streamlines development cycles and offers scalability and simplified infrastructure management. However, existing autoscaling strategies employed by popular FaaS platforms often overlook critical factors such as response time and tail latency. Additionally, Python's Global Interpreter Lock (GIL) poses challenges for parallel computing in high-request traffic scenarios. This thesis addresses the need for efficient and cost-effective Machine Learning (ML) inference capabilities by exploring batching and autoscaling strategies for Serverless Inference instances. The study proposes a prototype FaaS framework that provides adaptive request batching, reactive autoscaling policies, and SLO monitoring, thus allowing Serverless Inference workloads to meet their SLO targets even during peak traffic. The proposed approach aims to optimize resource utilization, mitigate tail latency, and improve overall system performance.en
dc.description.abstractgeneralMachine Learning (ML) and Deep Learning (DL) are advanced techniques that allow computers to learn from data and make predictions or decisions without being explicitly programmed. This has led to significant advancements in various fields. Inference refers to the process of applying a trained ML model to new data to make predictions or extract insights. In the context of ML, there is a growing need for efficient and cost-effective inference capabilities. A new approach called Function-as-a-Service (FaaS) has emerged that can address this need. FaaS is a way of abstracting the server infrastructure away from the developers. This means developers can focus on writing the ML code without worrying about managing the underlying infrastructure. FaaS offers benefits such as scalability, simplified infrastructure management, and faster development cycles. However, existing FaaS platforms face challenges in ensuring fast response times and handling high levels of incoming requests. This thesis aims to address these challenges by proposing a prototype FaaS framework. The framework incorporates adaptive request batching, reactive autoscaling policies, and Service-Level Objectives (SLOs) monitoring. Request batching allows the framework to process multiple requests together, improving efficiency. Autoscaling policies ensure the system dynamically adjusts its resources based on the incoming workload. Monitoring SLOs helps track and meet performance targets, even during peak traffic. By optimizing resource utilization, reducing delays in processing requests, and improving overall system performance, the proposed approach seeks to provide efficient and cost-effective ML inference capabilities in a serverless environment.en
dc.description.degreeMaster of Scienceen
dc.format.mediumETDen
dc.identifier.othervt_gsexam:38236en
dc.identifier.urihttp://hdl.handle.net/10919/116005en
dc.language.isoenen
dc.publisherVirginia Techen
dc.rightsCreative Commons Attribution 4.0 Internationalen
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/en
dc.subjectMachine Learningen
dc.subjectDeep Learningen
dc.subjectServerless Inferenceen
dc.subjectAutoscalingen
dc.subjectLoad Balancingen
dc.subjectResponse Timeen
dc.subjectTail Latencyen
dc.titleTowards SLO-aware Resource Scheduling for Serverless Inference Workloadsen
dc.typeThesisen
thesis.degree.disciplineComputer Science and Applicationsen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
thesis.degree.levelmastersen
thesis.degree.nameMaster of Scienceen

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Tripathy_A_T_2023.pdf
Size:
674.45 KB
Format:
Adobe Portable Document Format

Collections