Towards Workload-aware Efficient Machine Learning Systems
dc.contributor.author | Khan, Redwan Ibne Seraj | en |
dc.contributor.committeechair | Butt, Ali | en |
dc.contributor.committeemember | Cameron, Kirk W. | en |
dc.contributor.committeemember | Ji, Bo | en |
dc.contributor.committeemember | Jian, Xun | en |
dc.contributor.committeemember | Cheng, Yue | en |
dc.contributor.department | Computer Science & Applications | en |
dc.date.accessioned | 2025-03-04T09:00:13Z | en |
dc.date.available | 2025-03-04T09:00:13Z | en |
dc.date.issued | 2025-03-03 | en |
dc.description.abstract | Machine learning (ML) is transforming many aspects of our lives, driving the need for computing systems that efficiently support large-scale ML workloads. As models grow in size and complexity, existing systems struggle to adapt, limiting both performance and flexibility. Additionally, ML techniques can enhance traditional computing tasks, but current systems lack the adaptability to integrate these advancements effectively. Building systems to run machine learning workloads, and using machine learning to run workloads, both require a careful understanding of the nature of the systems and the ML models involved. In this dissertation, we design and develop a series of novel storage and scheduling solutions for ML systems by bringing attention to the unique characteristics of workloads and the underlying system. We find that by designing ML systems that are finely tuned to workload characteristics and the underlying infrastructure, we can significantly enhance application performance and maximize resource utilization. In the first part of this dissertation (Chapter 3), we analyze popular ML models and datasets, uncovering insights that inspired SHADE, a data-importance-aware caching solution for ML. The second part (Chapter 4) leverages the system characteristics of hundreds of client devices, along with the characteristics of the samples within those clients, to design novel sampling, caching, and client scheduling mechanisms that tackle data and system heterogeneity among client devices and thereby fundamentally improve the performance of federated learning on edge devices in the cloud. The third part (Chapter 5) leverages the characteristics of multi-agent LLM applications and user requests to design an efficient request scheduling mechanism that serves clients in multi-tenant environments fairly and efficiently while preventing abuse. This dissertation demonstrates that workload-aware strategies can significantly enhance the efficiency (e.g., reduced training time, increased throughput, lower latency) and flexibility (e.g., improved ease of use, deployment, and programmability) of machine learning systems. By accounting for workload dynamicity and heterogeneity, these principles can guide the design of next-generation ML systems, ensuring adaptability to emerging models and evolving hardware technologies. | en |
dc.description.abstractgeneral | Machine learning (ML) has become an integral part of our daily lives, powering applications from virtual assistants to medical diagnostics. As ML models grow larger and more complex, the systems that run them must evolve to keep pace. This dissertation explores how we can build more efficient and adaptable computing systems to support large-scale ML workloads. Traditional computing systems often struggle to accommodate the ever-changing demands of ML applications. Similarly, ML techniques can be leveraged to improve the performance of non-ML workloads, but existing systems lack the flexibility to integrate these advancements seamlessly. This research tackles both challenges: designing systems optimized for ML workloads and enhancing traditional systems using ML-driven insights. By designing intelligent, workload-aware strategies, this research demonstrates substantial improvements in the speed, efficiency, and flexibility of ML systems. These principles will help shape the next generation of computing infrastructure, ensuring that future ML models and applications can be deployed smoothly, regardless of scale or complexity. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:42574 | en |
dc.identifier.uri | https://hdl.handle.net/10919/124762 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Machine Learning | en |
dc.subject | Deep Learning | en |
dc.subject | Federated Learning | en |
dc.subject | High Performance Computing | en |
dc.subject | Cloud Computing | en |
dc.subject | Storage Systems | en |
dc.subject | Data Storage | en |
dc.subject | Data Management | en |
dc.subject | Machine Learning Systems | en |
dc.subject | Job Scheduling | en |
dc.subject | Resource Management | en |
dc.subject | MLSys | en |
dc.subject | SysML | en |
dc.subject | Efficiency | en |
dc.subject | Flexibility | en |
dc.title | Towards Workload-aware Efficient Machine Learning Systems | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Computer Science & Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |